The Developer Day | Staying Curious

TAG | utf8

Feb/10

26

Zend Framework Escaping Entities

Zend Framework has a powerful input filtering component Zend_Filter_Input. It provides an interface to define multiple filters and validators and apply them to a collection of data. By default values returned from Zend_Filter_Input are escaped for safe HTML output. An example of such functionality:

$validators = array(
    'id' => array(
        'digits',
        'allowEmpty' => true,
    ),
    'name' => array(
        'presence' => 'required',
        new Zend_Validate_StringLength(5, 255),
    ),
    'date' => array(
        'presence' => 'required',
        new Zend_Validate_Date('d/m/Y'),
    ),
);
$filters = array(
    '*'  => 'StringTrim',
);
$input = new Zend_Filter_Input($filters, $validators, $data);
if ($input->isValid()) {
    print_r($input->getEscaped());
} else {
    print_r($input->getMessages());
}

The code above validates the $data array by checking that the id key consists only of digits or is empty, that name is present and it’s min length is 5 and max length is 255 and that date is present and it’s format is d/m/Y. The StringTrim validator removes any spaces in front and at the end of all the $data values. If the $data array is valid all escaped values are outputted and if not a generated list of error messages is presented.

Recently I’ve ran into a small problem while using Zend_Input_Filter. According to the best security practices all data coming from the user should be escaped before being outputted to the screen. This also means data coming from a database. This requires to escape every single output statement in every view. In my opinion this adds unwanted verbosity, performance overhead and a possibility to easily miss that something was not escaped in one of the views. That is why if there is a possibility I would prefer to trust the database and not worry about the data incoming from it. It may not be a very good option if data in the database is most often presented in other documents than HTML. Since all values would need to be decoded which defeats the whole purpose. Though this is rarely the case.

By default Zend_Filter_Input escapes values by using the htmlentities function. Which converts all applicable characters to HTML entities. This means that for example Danish language characters Æ Ø Å would be stored as Æ Ø Å Which means that it would take 7 bytes on average to store a single letter that otherwise could be stored using 1 - 3 bytes using UTF-8 encoding support in MySQL. This could also potentially ruin collation sorting.

To overcome this issue a very similar function htmlspecialchars could be used. It escapes only a few certain characters such as > < ” without escaping all international characters. The actual problem with Zend_Filter_Input is that it does not have an escape filter that uses htmlspecialchars. To solve the issue I’ve created a copy of HtmlEntities filter which uses htmlspecialchars function instead.

The filter can then be used like this:

$input = new Zend_Filter_Input($filters, $validators, $data);
$input->setDefaultEscapeFilter(new Company_Product_Filter_HtmlSpecialChars());

Zend_Filter_Input is a great tool to ease form validation and filtering. I would be very interested to hear from you how this problem could be solved.

, , , , , , Hide

Find it!

Theme Design by devolux.org