Theory of Operation
Zend\Escaper provides methods for escaping output data, dependent on the context in which the data will be used.
Each method is based on peer-reviewed rules and is in compliance with the current OWASP recommendations.
The contexts in which Zend\Escaper should be used are the HTML Body, HTML Attribute, JavaScript, CSS, and URL/URI contexts.
Every escaper method takes the data to be escaped, makes sure it is UTF-8 encoded (or tries to convert it to UTF-8), performs the context-based escaping, converts the escaped data back to its original encoding, and returns it to the caller.
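The flow described above can be sketched with PHP built-ins alone. This is only an illustration under stated assumptions, not the library's code: the helper name `escapeInEncoding()` is hypothetical, and `iconv()` is assumed to be available for the encoding conversions.

```php
<?php
// Hypothetical helper, NOT part of Zend\Escaper: it only illustrates the
// "convert to UTF-8 -> escape -> convert back" flow with PHP built-ins.
function escapeInEncoding(string $data, string $encoding, callable $escapeUtf8): string
{
    $isUtf8Source = strtolower($encoding) === 'utf-8';

    // Step 1: obtain a UTF-8 copy of the input and verify it is valid UTF-8.
    $utf8 = $isUtf8Source ? $data : iconv($encoding, 'UTF-8', $data);
    if ($utf8 === false || ($utf8 !== '' && preg_match('/^./su', $utf8) !== 1)) {
        throw new InvalidArgumentException('Input is not valid in the declared encoding.');
    }

    // Step 2: apply the context-specific escaping to the UTF-8 string.
    $escaped = $escapeUtf8($utf8);

    // Step 3: convert the escaped result back to the caller's encoding.
    return $isUtf8Source ? $escaped : iconv('UTF-8', $encoding, $escaped);
}

// "café <b>" encoded as ISO-8859-1 (0xE9 is "é" in that encoding).
$latin1 = "caf\xE9 <b>";
$result = escapeInEncoding($latin1, 'ISO-8859-1', static function (string $s): string {
    return htmlspecialchars($s, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
});

// The markup is escaped, and the "é" is still a single ISO-8859-1 byte.
var_dump($result === "caf\xE9 &lt;b&gt;"); // bool(true)
```

Passing the escaping step as a callable keeps the encoding handling identical for every context, which mirrors the idea that only the escaping rules differ between methods.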
The actual escaping differs between methods; each has its own set of rules according to which the escaping is done. An example clearly demonstrates the difference, and how the same characters are escaped differently between contexts:
    $escaper = new Zend\Escaper\Escaper('utf-8');

    // &lt;script&gt;alert(&quot;zf2&quot;)&lt;/script&gt;
    echo $escaper->escapeHtml('<script>alert("zf2")</script>');

    // &lt;script&gt;alert&#x28;&quot;zf2&quot;&#x29;&lt;&#x2F;script&gt;
    echo $escaper->escapeHtmlAttr('<script>alert("zf2")</script>');

    // \x3Cscript\x3Ealert\x28\x22zf2\x22\x29\x3C\x2Fscript\x3E
    echo $escaper->escapeJs('<script>alert("zf2")</script>');

    // \3C script\3E alert\28 \22 zf2\22 \29 \3C \2F script\3E
    echo $escaper->escapeCss('<script>alert("zf2")</script>');

    // %3Cscript%3Ealert%28%22zf2%22%29%3C%2Fscript%3E
    echo $escaper->escapeUrl('<script>alert("zf2")</script>');
More detailed examples will be given in later chapters.
The Problem with Inconsistent Functionality
At present, programmers tend to reach for the following PHP functions for each common HTML context:
- HTML Body: htmlspecialchars() or htmlentities()
- HTML Attribute: htmlspecialchars() or htmlentities()
- JavaScript: addslashes() or json_encode()
- CSS: n/a
- URL/URI: rawurlencode() or urlencode()
In practice, these decisions appear to depend more on what PHP offers, and whether it can be interpreted as offering sufficient escaping safety, than on what is actually recommended to defend against XSS. While these functions can prevent some forms of XSS, they do not cover all use cases or risks and are therefore insufficient defenses.
Using htmlspecialchars() in a perfectly valid HTML5 unquoted attribute value, for example, is completely useless since the value can be terminated by a space (among other things) which is never escaped. Thus, in this instance, we have a conflict between a widely used HTML escaper and a modern HTML specification, with no specific function available to cover this use case. While it’s tempting to blame users, or the HTML specification authors, escaping just needs to deal with whatever HTML and browsers allow.
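The unquoted-attribute problem described above is easy to reproduce. The following is a minimal sketch; the input string and the surrounding `<input>` markup are made up for illustration.

```php
<?php
// Illustrative attacker-controlled value: it contains none of the five
// characters that htmlspecialchars() escapes (& " ' < >).
$userInput = 'x onmouseover=alert(1)';

// Unquoted attribute value: the space ends the value, so the rest of the
// input is parsed by the browser as a new attribute.
$html = '<input value=' . htmlspecialchars($userInput, ENT_QUOTES, 'UTF-8') . '>';

echo $html, "\n";
// <input value=x onmouseover=alert(1)>
```

The escaped output is byte-for-byte identical to the input, so the injected `onmouseover` attribute survives untouched.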
Inconsistencies with valid HTML, insecure default parameters, lack of character encoding awareness, and misrepresentations of what functions are capable of by some programmers - these all make escaping in PHP an unnecessarily convoluted quest.
To work around the lack of escaping methods in PHP, Zend\Escaper addresses the need to apply context-specific escaping in web applications. It implements methods that specifically target XSS and offers programmers a tool to secure their applications without misusing other inadequate methods or relying on home-grown solutions that are most likely incomplete.
Why Contextual Escaping?
To understand why multiple standardised escaping methods are needed, here are a couple of quick points (by no means a complete set!):
HTML escaping of unquoted HTML attribute values still allows XSS
This is probably the best-known way to defeat htmlspecialchars() when used on attribute values, since any space (or character interpreted as a space - there are a lot) lets you inject new attributes whose content can’t be neutralised by HTML escaping. The solution (where this is possible) is additional escaping as defined by the OWASP ESAPI codecs. The point here can be extended further - escaping only works if a programmer or designer knows what they’re doing. In many contexts, there are additional practices and gotchas that need to be carefully monitored, since escaping sometimes needs a little extra help to protect against XSS - even if that means ensuring all attribute values are properly double quoted despite this not being required for valid HTML.
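To make the idea of "additional escaping" concrete, here is a rough sketch of an attribute escaper that encodes every character outside a small safe set as a hexadecimal character reference. The function name `escapeAttrSketch()` and the exact safe set are assumptions for illustration; a real implementation such as escapeHtmlAttr() handles more cases (named entities, control characters, invalid input).

```php
<?php
// Hypothetical sketch, NOT Zend\Escaper's implementation: encode everything
// except ASCII letters, digits, and , . - _ as a hex character reference.
function escapeAttrSketch(string $utf8): string
{
    $result = preg_replace_callback(
        '/[^a-z0-9,.\-_]/iu',
        static function (array $m): string {
            // Convert the matched UTF-8 character to its Unicode code point.
            $codePoint = unpack('N', iconv('UTF-8', 'UCS-4BE', $m[0]))[1];
            return sprintf('&#x%02X;', $codePoint);
        },
        $utf8
    );

    // With the /u flag, preg_replace_callback() returns null on invalid UTF-8.
    if ($result === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }
    return $result;
}

$userInput = 'x onmouseover=alert(1)';
echo '<input value=' . escapeAttrSketch($userInput) . '>', "\n";
// <input value=x&#x20;onmouseover&#x3D;alert&#x28;1&#x29;>
```

Because the space and the equals sign are now encoded, the browser can no longer interpret the input as a second attribute, even when the value is left unquoted.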
DOM based XSS requires a defence using at least two levels of different escaping in many cases
PHP has no known anti-XSS escape functions (only those kidnapped from their original purposes)
A simple example, widely used, is when you see
addslashes() implementation. These were never designed to eliminate XSS yet PHP programmers use them as such.
json_encode() does not escape the ampersand or semi-colon characters by default. That means you can
lets you break out of strings, add new JS statements, close tags, etc. In other words, using json_encode() is insufficient and naive. The same, arguably, could be said for htmlspecialchars(), which has its own well-known limitations that make a singular reliance on it a questionable practice.
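The default behaviour of json_encode() mentioned above can be checked directly. The payload below is illustrative only; the `JSON_HEX_*` flags are standard PHP options, shown for comparison rather than as a complete defence.

```php
<?php
// Illustrative payload: an HTML entity for a double quote followed by script.
$payload = '&quot;;alert(1)';

// Default flags: the ampersand and semicolon pass through unchanged. If this
// output is placed in an HTML attribute (for example an inline event handler),
// the browser decodes &quot; to " before the JavaScript is evaluated.
$default = json_encode($payload);
echo $default, "\n";
// "&quot;;alert(1)"

// With the hex flags, the ampersand is emitted as a \u0026 escape instead.
$hardened = json_encode($payload, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);
echo $hardened, "\n";
// "\u0026quot;;alert(1)"
```

Even with these flags, json_encode() remains a serialisation function rather than a context-aware escaper.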