You can put raw XHTML content in many components, such as the Label and
CustomLayout, as well as in tooltips and notifications. In such cases, if
any of the content can originate from user input, you should make sure that
the input is properly sanitized before it is displayed. Otherwise, a
malicious user can easily mount a cross-site scripting attack by injecting
malicious JavaScript code into such components.
Malicious code can easily be injected with <script> markup or in tag
attributes as event handlers, such as onLoad. Cross-site scripting
vulnerabilities are browser dependent: they depend on the situations in
which different browsers execute scripting markup.
There is no generic way to sanitize user input, because different applications allow different kinds of input. Stripping out (X)HTML tags is fairly simple, but some applications may need to allow (X)HTML. It is therefore the responsibility of the application to sanitize the input.
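For an application that does not need to allow any (X)HTML in user input, the simplest approach is to escape the markup characters rather than to prune tags. The following is a minimal sketch in plain Java, not part of any framework API; the class name HtmlEscaper and the method escapeHtml() are hypothetical names chosen for this illustration.

```java
// Hypothetical helper for illustration; not part of any framework API.
final class HtmlEscaper {
    private HtmlEscaper() {
    }

    /**
     * Replaces the five characters that are significant in (X)HTML markup
     * with character entities, so that the input is displayed as literal text.
     */
    static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':
                    out.append("&amp;");
                    break;
                case '<':
                    out.append("&lt;");
                    break;
                case '>':
                    out.append("&gt;");
                    break;
                case '"':
                    out.append("&quot;");
                    break;
                case '\'':
                    out.append("&#39;");
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String userInput = "<script>alert('hi')</script>";
        // Prints: &lt;script&gt;alert(&#39;hi&#39;)&lt;/script&gt;
        System.out.println(escapeHtml(userInput));
    }
}
```

Escaping in this way only suits content that is meant to be shown as plain text. It is not sufficient for applications that must let some markup through.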
Character encoding can make sanitization more difficult, as malicious tags
can be encoded so that they are not recognized by a sanitizer. This can be
done, for example, with HTML character entities and with variable-width
encodings such as UTF-8 or various CJK encodings, by abusing multiple
representations of a character. Most trivially, you could input < and > as
&lt; and &gt;, respectively. The input could also be malformed, and the
sanitizer must be able to interpret it exactly as the browser would, yet
different browsers can interpret malformed HTML and variable-width
character encodings differently.
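The entity case can be illustrated with a small sketch in plain Java. It assumes a hypothetical pipeline in which a naive filter checks the input before some later step decodes character entities; the class EncodedInputDemo and its methods are illustrative names only, and the decoder deliberately handles just three entities.

```java
import java.util.Locale;

// Hypothetical demonstration; the names below are not from any library.
final class EncodedInputDemo {
    private EncodedInputDemo() {
    }

    /** Naive filter: recognizes only a literal opening script tag. */
    static boolean looksDangerous(String input) {
        return input.toLowerCase(Locale.ROOT).contains("<script");
    }

    /**
     * Decodes three basic character entities. The ampersand entity is
     * decoded last so that each entity is decoded only once.
     */
    static String decodeBasicEntities(String input) {
        return input.replace("&lt;", "<")
                    .replace("&gt;", ">")
                    .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String encoded = "&lt;script&gt;alert(1)&lt;/script&gt;";

        // The encoded form passes the naive filter.
        System.out.println(looksDangerous(encoded)); // prints false

        // After decoding, the same input contains a script tag.
        String decoded = decodeBasicEntities(encoded);
        System.out.println(decoded); // prints <script>alert(1)</script>
        System.out.println(looksDangerous(decoded)); // prints true
    }
}
```

The sketch suggests why a sanitizer should inspect input in the same decoded form that will finally be interpreted, rather than the raw form in which it arrived.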
Notice that the problem also applies to user input from a RichTextArea,
which is transmitted as XHTML from the browser to the server side and is
not sanitized. As the entire purpose of the RichTextArea component is to
allow input of formatted text, you cannot just remove all HTML tags. Many
attributes, such as style, should also pass through the sanitization.