Common Security Issues
Sanitizing User Input to Prevent Cross-Site Scripting
Offensive code can easily be injected with <script> markup or in tag attributes as events, such as onLoad.
Cross-site scripting vulnerabilities are browser dependent, depending on the situations in which different browsers execute scripting markup.
Therefore, if the content created by one user is shown to other users, the content must be sanitized. There is no generic way to sanitize user input, as different applications can allow different kinds of input. Pruning (X)HTML tags out is somewhat simple, but some applications may need to allow (X)HTML content. It is therefore the responsibility of the application to sanitize the input.
Character encoding can make sanitization more difficult, as offensive tags can be encoded so that they are not recognized by a sanitizer. This can be done, for example, with HTML character entities and with variable-width encodings such as UTF-8 or various CJK encodings, by abusing multiple representations of a character. Most trivially, you could input < and > with < and >, respectively. The input could also be malformed and the sanitizer must be able to interpret it exactly as the browser would, and different browsers can interpret malformed HTML and variable-width character encodings differently.
Notice that the problem applies also to user input from a RichTextArea is transmitted as HTML from the browser to server-side and is not sanitized. As the entire purpose of the RichTextArea component is to allow input of formatted text, you can not just remove all HTML tags. Also many attributes, such as style, should pass through the sanitization.