Unicode RTL and LTR character handling

Hello all,

We just ran into a very strange issue in a client project.
We have a form where users enter text in a text field, and we then validate that the entered text starts with either 101 or 102.
We do this with value.startsWith("101") or value.startsWith("102").

This started failing for no apparent reason. We finally tracked it down to the user's input containing three copies of the Unicode character with the UTF-8 bytes E2 80 8E in front of the 102.
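For anyone who wants to reproduce this outside the form, here is a minimal sketch, assuming the three bytes are UTF-8. They decode to a single invisible character, U+200E, and a value prefixed with it no longer passes the check. The class and method names are made up for illustration and are not from our project.

```java
import java.nio.charset.StandardCharsets;

final class LrmRepro {
    // Decodes the bytes E2 80 8E as UTF-8; the result is one char, U+200E.
    static String decodeLrm() {
        byte[] bytes = {(byte) 0xE2, (byte) 0x80, (byte) 0x8E};
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Same prefix check as in the form validation described above.
    static boolean hasExpectedPrefix(String value) {
        return value.startsWith("101") || value.startsWith("102");
    }

    public static void main(String[] args) {
        String lrm = decodeLrm();
        String pasted = lrm.repeat(3) + "102";
        System.out.println(Integer.toHexString(lrm.codePointAt(0)));
        System.out.println(hasExpectedPrefix(pasted));
        System.out.println(hasExpectedPrefix("102"));
    }
}
```

The pasted value looks like "102" on screen but has a length of 6, which is why the prefix check fails.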

This code has worked fine since the Vaadin 7 days, and we are currently on 24.3.14.

Any idea how I can prevent these Unicode characters from being entered, or how to handle them?

I’ve not dealt with RTL / LTR problems… but normally invisible characters come from people copy-pasting things from other sources, like Excel…

First: if you can reproduce it with a standalone Vaadin Text Field, it might be worth creating an issue… not sure whether this is normal for LTR mode or not…

Secondly: I would suggest enforcing a proper whitelist/allowlist on your fields. If that’s not feasible, apply a blacklist/disallowlist accordingly, or add a converter on your fields that automatically strips such invisible characters, for example with Normalizer. This is also useful when people copy-paste things such as tabs, line breaks or nbsp characters from other sources, which might corrupt your data.
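A rough sketch of the stripping idea in plain Java, not a definitive implementation. One caveat: as far as I know, java.text.Normalizer alone does not remove U+200E, so this sketch first drops everything in the Unicode "format" category (Cf) with a regex, then applies NFKC (which turns a non-breaking space into a regular space), then strips leading and trailing whitespace. InputSanitizer and its methods are hypothetical names.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

final class InputSanitizer {
    // Unicode general category Cf: invisible format characters such as
    // the left-to-right mark and the zero width space.
    private static final Pattern FORMAT_CHARS = Pattern.compile("\\p{Cf}");

    static String sanitize(String raw) {
        if (raw == null) {
            return "";
        }
        String withoutFormatChars = FORMAT_CHARS.matcher(raw).replaceAll("");
        // NFKC maps compatibility characters (for example nbsp) to their plain forms.
        String normalized = Normalizer.normalize(withoutFormatChars, Normalizer.Form.NFKC);
        // strip() removes leading and trailing whitespace, including tabs and line breaks.
        return normalized.strip();
    }

    // The prefix check from the original question, applied to the cleaned value.
    static boolean hasExpectedPrefix(String raw) {
        String value = sanitize(raw);
        return value.startsWith("101") || value.startsWith("102");
    }
}
```

Something like sanitize could be called from a converter on the field or from a value change listener; that wiring is left out here. Be aware that Cf also covers characters such as the zero width joiner, which is meaningful in some scripts and emoji sequences, so removing the whole category is only reasonable for fields like this one that expect plain codes.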

Looks like it’s the left-to-right mark character (U+200E). Does it appear when pasting the text, or also when typing it manually?
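If it helps with tracking down where the characters come from, something like the following could be used to log which invisible format characters a submitted value contains. It is only a sketch, and InvisibleCharReport is a made-up name.

```java
import java.util.ArrayList;
import java.util.List;

final class InvisibleCharReport {
    // Returns one entry per format character (Unicode category Cf) found in
    // the value, as "U+XXXX" followed by the name the JDK reports for it.
    static List<String> formatChars(String value) {
        List<String> found = new ArrayList<>();
        value.codePoints()
                .filter(cp -> Character.getType(cp) == Character.FORMAT)
                .forEach(cp -> found.add(String.format("U+%04X %s", cp, Character.getName(cp))));
        return found;
    }
}
```

Logging the result for rejected values should show whether it is always the same character and how many copies arrive.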

I don’t yet know how the client was able to put them in…

Unfortunately, we don’t yet know how they entered it in the text field, but I think it was copy & paste.