LoginForm problem with German characters

Hi,

I am facing a problem with the LoginForm. If a user enters German characters such as ä, ö or ü in the username or password field, they are not properly encoded.

For example, as a test I entered äöü as both the username and the password:


final LoginForm loginForm = new LoginForm();
loginForm.addListener(new LoginForm.LoginListener() {
    public void onLogin(LoginForm.LoginEvent loginEvent) {
        // Both values come back garbled when they contain umlauts
        String userName = loginEvent.getLoginParameter("username");
        String password = loginEvent.getLoginParameter("password");
    }
});

The resulting string for both userName and password is Ã¤Ã¶Ã¼ instead of äöü.

Am I missing something? Do I have to set UTF-8 on the LoginForm somehow? I am using Vaadin 6.7.5 and Firefox.

Any feedback would be appreciated,
Christian

This sounds like a bug, probably at the level of ParameterHandler. Please create a ticket.

You should be able to convert the values as follows (I'm not sure whether this is the proper or best way to do the conversion):

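import java.io.UnsupportedEncodingException;

public void onLogin(LoginEvent event) {
    try {
        // The server decoded the UTF-8 bytes as ISO-8859-1: recover the raw
        // bytes with ISO-8859-1, then decode them again as UTF-8
        String username = new String(event.getLoginParameter("username").getBytes("ISO-8859-1"), "UTF-8");
        String password = new String(event.getLoginParameter("password").getBytes("ISO-8859-1"), "UTF-8");
        getWindow().showNotification("Logged in " +
            username + " with password " + password);
    } catch (UnsupportedEncodingException e) {
        // Should not happen: every Java platform must support ISO-8859-1 and UTF-8
        e.printStackTrace();
    }
}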
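If the workaround is needed in more than one place, the same round trip can be factored into a small helper. This is a sketch, not part of Vaadin: the class and method names are hypothetical, it assumes Java 7 or later for StandardCharsets (on older JDKs, keep the "ISO-8859-1" and "UTF-8" string names used in the snippet above), and it passes null through so that a missing parameter stays missing.

```java
import java.nio.charset.StandardCharsets;

public final class LoginParameterFix {

    private LoginParameterFix() {
    }

    /**
     * Hypothetical helper: undoes UTF-8 bytes having been decoded as
     * ISO-8859-1. Re-encoding with ISO-8859-1 recovers the original bytes
     * (one byte per character), which are then decoded as UTF-8.
     * A null value is returned unchanged.
     */
    public static String fromLatin1Misdecoded(String value) {
        if (value == null) {
            return null;
        }
        byte[] rawBytes = value.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The garbled form of the three umlauts as the listener receives it,
        // written with escapes so the source file encoding does not matter
        String received = "\u00c3\u00a4\u00c3\u00b6\u00c3\u00bc";
        String fixed = fromLatin1Misdecoded(received);
        System.out.println(fixed.equals("\u00e4\u00f6\u00fc")); // true
    }
}
```

The helper is only correct while the container decodes POST data as ISO-8859-1. If the server default is later changed to UTF-8, applying it to already correct input would garble any non-ASCII characters, so the workaround should be removed at the same time.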

Thanks, Marko, for the quick reply.

I created the ticket: http://dev.vaadin.com/ticket/8394

Your workaround works perfectly, so it is no longer a blocking issue for me.

Great, thanks.

I’m not sure whether there are some fundamental issues with having UTF-8 in the LoginForm. The browser may or may not encode the input in UTF-8, but apparently the encoding is not communicated to the server in the request; at least, Tomcat assumes that any POST parameters are in ISO-8859-1. The issue is somewhat complex, as described in Character Encoding Issues in the Tomcat Wiki.

So the issue seems to depend (a) on the server defaults and (b) possibly on the browser.

These sorts of issues are probably the reason why most web services have a separate username and display name. Usernames are usually expected to be alphanumeric identifiers.

(Update: LoginForm uses POST, not GET. Uh, where did I get that idea?)

This article in the Tomcat Wiki is quite interesting.

What seems to be missing is that the POST made to the loginHandler lacks a Content-Type header specifying UTF-8.

"Most web browsers today do not specify the character set of a request, even when it is something other than ISO-8859-1. This seems to be in violation of the HTTP specification. Most web browsers appear to send a request body using the encoding of the page used to generate the POST (for instance, the <form> element came from a page with a specific encoding… it is that encoding which is used to submit the POST data for that form)."
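To make the quoted behaviour concrete, here is a sketch of how a server could choose the charset for a request body: use the charset parameter of the Content-Type header if there is one, otherwise fall back to ISO-8859-1, the default this thread attributes to Tomcat. RequestCharset and effectiveCharset are hypothetical names; real servlet containers have their own parsing code, and this only illustrates why a missing charset parameter leads to ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public final class RequestCharset {

    private RequestCharset() {
    }

    /**
     * Hypothetical helper: returns the charset named in a Content-Type
     * header's "charset" parameter, or ISO-8859-1 when there is none.
     * Charset.forName throws an unchecked exception for unknown names.
     */
    public static Charset effectiveCharset(String contentType) {
        if (contentType != null) {
            // Parameters follow the media type, separated by semicolons
            String[] parts = contentType.split(";");
            for (int i = 1; i < parts.length; i++) {
                String param = parts[i].trim();
                int eq = param.indexOf('=');
                if (eq > 0 && "charset".equals(param.substring(0, eq).trim().toLowerCase(Locale.ROOT))) {
                    String value = param.substring(eq + 1).trim();
                    // Strip optional double quotes around the value
                    if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                        value = value.substring(1, value.length() - 1);
                    }
                    return Charset.forName(value);
                }
            }
        }
        // No charset parameter: fall back to the ISO-8859-1 default
        return StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        // A typical form POST from the browser: no charset parameter
        System.out.println(effectiveCharset("application/x-www-form-urlencoded"));
        // What the request would need to carry for UTF-8 to be used
        System.out.println(effectiveCharset("application/x-www-form-urlencoded; charset=UTF-8"));
    }
}
```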

I tried playing around with the getLoginHTML() method, adding accept-charset="utf-8" to the form or declaring the charset in the page itself, to try to force the browser to send a Content-Type header with charset=UTF-8, but it didn’t work.

Maybe you could change the LoginForm.LoginListener servlet to call response.setContentType("text/html; charset=UTF-8") or response.setCharacterEncoding("UTF-8").
Would that make sense?

It wouldn’t make sense to set the desired character encoding on the response in the login listener, because the listener is exactly the place where we need the input to be decoded correctly already; the response encoding only affects what the server sends back, not how it decodes the request. Besides, the encoding is already correctly set to UTF-8 in the response header for the login page.

The problem isn’t that the browser sends the input in the wrong encoding (it sends it properly as UTF-8), but that Tomcat assumes it’s ISO-8859-1.
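A minimal sketch of that mismatch using only the JDK: encode the typed text as UTF-8, as the browser does, then decode the bytes as ISO-8859-1, as the server assumes. MojibakeDemo and misdecode are illustrative names, and StandardCharsets assumes Java 7 or later. It also shows why the workaround above is safe for plain ASCII: those characters are single bytes with the same meaning in both encodings.

```java
import java.nio.charset.StandardCharsets;

public final class MojibakeDemo {

    private MojibakeDemo() {
    }

    /** Encodes text as UTF-8 (browser side) and decodes it as ISO-8859-1 (server side). */
    public static String misdecode(String typed) {
        byte[] sentByBrowser = typed.getBytes(StandardCharsets.UTF_8);
        return new String(sentByBrowser, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String typed = "\u00e4\u00f6\u00fc"; // a-umlaut, o-umlaut, u-umlaut
        StringBuilder hex = new StringBuilder();
        for (byte b : typed.getBytes(StandardCharsets.UTF_8)) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        // Each umlaut becomes two bytes in UTF-8
        System.out.println(hex.toString().trim()); // C3 A4 C3 B6 C3 BC
        // ISO-8859-1 maps every byte to its own character: six characters instead of three
        System.out.println(misdecode(typed).length()); // 6
    }
}
```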

The login input is received in a ParameterHandler, so this issue exists with any ParameterHandler. The AbstractApplicationServlet, which calls the handlers, does not make any character encoding conversions; it simply gets the POST parameters raw from HttpServletRequest. Tomcat builds the request object, but as noted in the Tomcat doc, it assumes that the data is posted as ISO-8859-1. So, it looks like, if you want to keep your code server-independent, you could use the character encoding filter in Tomcat to change the server default.
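A filter helps because ServletRequest.setCharacterEncoding only has an effect if it is called before the first request parameter is read; the container then decodes the POST body with that charset instead of its default. The sketch below imitates just that decoding step with the JDK's URLDecoder, so it runs without a servlet container. FormBodyDecoding and parse are hypothetical names, and this is not Tomcat's actual parameter parser.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

public final class FormBodyDecoding {

    private FormBodyDecoding() {
    }

    /**
     * Parses an application/x-www-form-urlencoded body, decoding the
     * percent-escaped bytes with the given charset. Simplified: a repeated
     * key keeps only its last value.
     */
    public static Map<String, String> parse(String body, String charsetName) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        try {
            for (String pair : body.split("&")) {
                if (pair.isEmpty()) {
                    continue;
                }
                int eq = pair.indexOf('=');
                String key = eq >= 0 ? pair.substring(0, eq) : pair;
                String value = eq >= 0 ? pair.substring(eq + 1) : "";
                params.put(URLDecoder.decode(key, charsetName),
                        URLDecoder.decode(value, charsetName));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
        return params;
    }

    public static void main(String[] args) {
        // Percent-encoded UTF-8 bytes of the three umlauts, as the browser posts them
        String body = "username=%C3%A4%C3%B6%C3%BC&password=%C3%A4%C3%B6%C3%BC";

        // Container default (ISO-8859-1): each byte becomes its own character
        System.out.println(parse(body, "ISO-8859-1").get("username").length()); // 6

        // Request encoding set to UTF-8 before the parameters are read
        System.out.println(parse(body, "UTF-8").get("username").length()); // 3
    }
}
```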