I present a progress bar (I’ve tried both embedding it in the layout and showing it in a dialog), and for whatever reason, when my thread is done processing, the access() method does not consistently update the view. It just stays stuck on the progress bar. There are no error messages. What’s even more challenging is that if I click anywhere on the screen, everything suddenly updates, as if all the changes were queued and waiting to be applied. This also seems to only happen in the production system (reverse proxy through apache) and works 100% when running locally.
This also seems to have started only after upgrading from 24.5.5, but when I reverted to that version the problem persisted, which was even more confusing.
Any assistance in trying to figure out what is happening here would be greatly appreciated.
Can you please post a code snippet? Usually it is better to get a UI instance and provide it to the background thread instead of using UI.getCurrent().
I should’ve mentioned I did do that already. The reference to the UI is created before the thread is started. As for snippets it’s very hard because it’s quite complex code with processing, conditions, etc. and a bunch of ui.access(() -> ...) type of calls. In some cases I want to update as progress is being made and so on. It seems to work everywhere else but this one part of the code, and of course that’s the most complex and involved piece of code in the whole system. Murphy’s Law.
That being said do you, or anyone else, know what could cause a call to access to stop working intermittently? Is there anything specific I should be looking for in the code? Has anything changed? I didn’t see anything in the release notes. But again, is there anything that can cause that type of inconsistent behavior?
This is the weirdest thing in software … So if I read this right: there is a missed update for the progress bar? The server-client communication is likely websockets. My guess, based on the sudden update, is that a message is indeed missed and there is a full re-sync because of that.
I saw some updates to communication in recent Vaadin versions (I can’t find them now, but I’ll look again) to make the communication more robust: it should resend the previous message to avoid the full re-sync.
So, a lot of question marks, but potentially, if everything above matches, this could be an edge case and/or a regression in that code.
It’s more removing the progress bar since I’m using an indeterminate progress bar. It’s more that nothing gets updated in some cases, unless I click on a GUI component (TextField, etc.) and then everything that has been queued just gets updated. Otherwise the progress bar just keeps going (keeping in mind it’s indeterminate progress bar). I moved it to a dialog because you can also click on the background as a way to force an update, which is much more likely to happen by a user wondering what’s taking so long.
The code is a thread that does some processing. As it proceeds, it also runs checks. If something fails, returning is the easy case. But it’s also doing other things, like reading and updating some values. Then, when it’s all done, the progress bar should go away and be replaced by the final result. In more detail, it’s running a report, and as it progresses it updates the user with some GUI updates (not through the progress bar); when done, you get the final report, if that makes sense. Again, it works most of the time, but sometimes it gets stuck. The only way to “unstick” it is to click on a GUI component, and then everything is immediately updated. This is definitely new; the code used to work before.
And yes, like you said, a lot of question marks. And very hard to debug. Not great from the user’s side, as they think the report is stuck when it happens. Thankfully, with a dialog you can just click on the background (in modal mode) and that’s enough to trigger an update of the GUI. Not great, but it’s the only way I could work around the issue for now that lets the user force an update. If the progress bar were embedded in the UI, they wouldn’t know and would have to click on a field, which is a big ask compared with just clicking anywhere once the progress bar is in a dialog window. It’s so weird. And again, very inconsistent. Sometimes it happens a lot, and other times I can go many runs without any issues.
If you ever remember the details of that update in the release please let me know. I’ll also try to find it myself. I hadn’t seen anything but I could also have easily missed it.
Hmm. This would match the behavior when the websocket connection is lost. The user interaction uses an HTTPS request, which also resets the websocket communication, and everything starts working again. And since it works in development, it makes me think there could also be a silent websocket timeout in one of the production proxies (or actually anywhere between the user and the server).
This is my own hammer for this nail, and a workaround only, but since the UX is most important: you could use the Idle add-on or similar to get the server notified when the user becomes active again (or inactive). Based on the symptoms, that should trigger the UI re-sync without the user doing anything else.
I just realized something I forgot digging deeper into the code. Do you know if any calls to the UI to get a value within the thread (outside of an access() call) could cause an issue like this? Normally the values are read before any threads are executed just to be safe but in this case because of the complexities the components are sent in the thread to the report class and the specific child report instances each figure out what they need to read and so on. There’s some very compelling reasons to do it this way, and it’s never been an issue before, but now I’m wondering if maybe this could be causing the intermittent issue…
Adjusting the code is possible but it requires a good amount of effort. Therefore if you happen to know if you can or cannot safely read a GUI component in a thread that would be appreciated. I don’t remember seeing anything specific about this, it’s mostly about pushing rather than pulling the data from components.
UI.getCurrent() and VaadinSession.getCurrent() should return null when called outside a request thread.
I usually advise against populating thread locals; instead, use a class field in the view component to store the UI reference and create a thread-aware update method in the view.
Something like this
And I am using utility method that takes care of error cases and logging accordingly
This allows me to isolate thread use in my Presenter and be UI agnostic.
Usually, accessing any UI components (read or write) outside access() is discouraged, as updates can be missed. However, this is plain Java as usual: if they are your own components that you pass around and you know that they are thread-safe (have no side effects), reading data from them should not be a problem.
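One way to sidestep the question of reading components from a background thread is to copy the needed values on the calling thread and hand the thread an immutable snapshot. Below is a minimal plain-Java sketch of that idea, not Vaadin code: FakeField, ReportInput, runReport and startReport are hypothetical names, and the showResult callback only stands in for whatever would wrap ui.access() in a real view.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

class SnapshotBeforeThread {

    // Hypothetical stand-in for a UI field; NOT a Vaadin class.
    static final class FakeField {
        private String value;
        FakeField(String value) { this.value = value; }
        String getValue() { return value; }
        void setValue(String value) { this.value = value; }
    }

    // Immutable snapshot of everything the report needs (hypothetical).
    static final class ReportInput {
        final String customer;
        final int year;
        ReportInput(String customer, int year) {
            this.customer = customer;
            this.year = year;
        }
    }

    // Placeholder for the real, long-running report logic.
    static String runReport(ReportInput input) {
        return "Report for " + input.customer + " (" + input.year + ")";
    }

    // The field is read here, on the calling thread, before any work is
    // scheduled. The background task only ever sees the snapshot.
    static CompletableFuture<Void> startReport(FakeField customerField, int year,
            ExecutorService executor, Consumer<String> showResult) {
        ReportInput input = new ReportInput(customerField.getValue(), year);
        return CompletableFuture
                .supplyAsync(() -> runReport(input), executor)
                .thenAccept(showResult);
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        FakeField field = new FakeField("ACME");
        CompletableFuture<Void> done =
                startReport(field, 2024, executor, System.out::println);
        field.setValue("edited while the report runs"); // does not affect the snapshot
        done.join();
        executor.shutdown();
    }
}
```

With this shape, the child report classes receive plain data instead of components, so later edits in the UI cannot change what a running report sees.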
This also seems to only happen in the production system (reverse proxy through apache) and works 100% when running locally.
I think the key observations are a) it happens only in production and b) the UI is OK after the next user event. That is why I don’t think it is related to the access() method: you would see such problems more consistently, and in development too.
Overall, there should be an exception, but I’ve also seen this happen with TCP-level network issues, when the connection simply stalls and we are back to the system-level timeout error (which takes very long).
But now I also noticed that the DefaultErrorHandler ignores every SocketException unless in debug mode.
Just to avoid confusion, the ui in ui.access() is referenced from outside the thread. Access to the components is done in a thread-safe manner. That is to say, the thread only reads from some components and writes to others. No component has both read and write.
The DefaultErrorHandler comment is interesting. Not that it would resolve the issue but it may help in debugging.
That being said I’m going to create a test app that tries to read a GUI component within a thread and then calls another GUI component through the ui.access() method and see if I can’t replicate it. The code won’t read and write to the same component, it’s more to see if reading can cause some kind of blocking on the write/push. I will follow up with what I find as soon as I have the chance.
Have you enabled server push for your app? The fact that everything gets updated when you click on the screen could indicate that push is not enabled or not working (perhaps related to the reverse proxy).
Perhaps you’re calling UI.getCurrent() from a background thread: then UI.getCurrent() returns null and UI.getCurrent().access() throws a NullPointerException and perhaps the exception is not logged anywhere.
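The failure mode described above can be reproduced in plain Java. The sketch below is an analogy rather than Vaadin code: CURRENT is a hypothetical thread local standing in for a request-scoped "current UI" lookup, and the class and method names are made up for illustration. It shows that a value set on one thread is null on a worker thread, and that a task passed to ExecutorService.submit() keeps its exception inside the returned Future instead of logging it.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SwallowedBackgroundFailure {

    // Hypothetical stand-in for a request-scoped "current UI" thread local.
    static final ThreadLocal<StringBuilder> CURRENT = new ThreadLocal<>();

    // Looks up CURRENT on the worker thread, where nobody set it, so
    // CURRENT.get() is null and append() throws a NullPointerException.
    static Future<?> appendInBackground(ExecutorService executor, String text) {
        return executor.submit(() -> { CURRENT.get().append(text); });
    }

    // Captures the reference on the calling thread and passes it along.
    static Future<?> appendWithCapturedReference(ExecutorService executor, String text) {
        StringBuilder captured = CURRENT.get();
        return executor.submit(() -> { captured.append(text); });
    }

    // Returns the simple class name of the task's failure, or null on success.
    static String failureOf(Future<?> future) {
        try {
            future.get();
            return null;
        } catch (ExecutionException e) {
            return e.getCause().getClass().getSimpleName();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CURRENT.set(new StringBuilder());

        Future<?> bad = appendInBackground(executor, "lost");
        // Nothing is printed by the failing task itself; the error only
        // surfaces because we ask the Future for it.
        System.out.println("lookup on worker thread failed with: " + failureOf(bad));

        Future<?> good = appendWithCapturedReference(executor, "done");
        System.out.println("captured reference failed with: " + failureOf(good));
        System.out.println("buffer content: " + CURRENT.get());
        executor.shutdown();
    }
}
```

If the background work is started through an executor, checking the returned Future (or wrapping the task body in a try/catch that logs) is one way to rule this cause in or out.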
On a related note I’ve finally had the time to create some test code and I’m able to replicate the behavior, including locally. And consistently. I’m just in the process of cleaning up the code and once it’s done (isolate it further) I’ll submit a bug report as well as post it here.
That code works great locally, including a local deploy. But as soon as I put the app behind apache the push updates fail. However if I type anything in the TextField and then click anywhere outside an update is forced and everything updates all at once. Until I do this the page sits idle and no push updates work.
The production code is more involved, and I’ve tried to trim it down as much as possible. In production the updates happen most of the time; in the GitHub sample code the push never works. For some reason that code is able to replicate the issue 100% of the time. Like I said, I tried to trim it down to as little code as possible, which eventually got me to replicate it 100% of the time.
The only way to update the website is to use a GUI component. If it’s a ComboBox rather than a TextField then I just need to click anywhere in the ComboBox, I don’t even need to select anything, just click on the GUI component.
For my test, both the application and Apache are running on the same server. I’ve tried to simplify things to isolate them as much as possible, so I’m avoiding any connection issues between servers. I’ve also removed all non-essential configs in the test system. The app is running through a qa subdomain.
Below are the configs for just the qa subdomain. The main subdomain (www) has almost the same with the added SSLProxyEngine on and ProxyPreserveHost on settings in the test environment. Again I’m trying to pull everything out to keep it as simple as possible, so the test/qa system is very minimized.
<VirtualHost *:443>
# Include SSL certs, SSLEngine On, error log file, etc.
Include common-configs.conf
ProxyPreserveHost on
ProxyPass / http://localhost:8080/
ProxyPassReverse / http://localhost:8080/
# Adjusted for the actual values rather than the placeholders.
DocumentRoot /var/www/myDocumentRoot
ServerName qa.myDomain.com
ServerAlias qa.myDomain.com
</VirtualHost>
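For comparison, the config above forwards everything with plain http:// ProxyPass rules and has no websocket-specific handling, which fits the earlier suggestions that push may not be getting through the reverse proxy. The fragment below is a sketch of how websocket upgrade requests are commonly forwarded with Apache 2.4. It assumes that mod_proxy, mod_proxy_http, mod_proxy_wstunnel and mod_rewrite are loaded and that the application uses websocket-based push; whether it resolves the issue in this thread is not confirmed.

```apache
<VirtualHost *:443>
    # Include SSL certs, SSLEngine On, error log file, etc.
    Include common-configs.conf
    ServerName qa.myDomain.com

    ProxyPreserveHost on

    # Send requests that ask for a websocket upgrade to the backend over ws://
    # instead of http:// (requires mod_rewrite and mod_proxy_wstunnel).
    RewriteEngine on
    RewriteCond %{HTTP:Upgrade} websocket [NC]
    RewriteCond %{HTTP:Connection} upgrade [NC]
    RewriteRule ^/?(.*) "ws://localhost:8080/$1" [P,L]

    # Everything else keeps using the regular HTTP proxy rules.
    ProxyPass / http://localhost:8080/
    ProxyPassReverse / http://localhost:8080/
</VirtualHost>
```

On Apache 2.4.47 or newer, adding upgrade=websocket to the existing ProxyPass line is an alternative to the rewrite rules. If the proxy cannot be changed, switching the application's push transport away from websockets is another option to test.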