Disconnection issue

Current setup:

  • NGINX reverse proxy
  • Tomcat 8.0.14.0
  • Vaadin 7.7.5 - Push with websockets
  • JVM 1.8.0_111-b14

Users are reporting disconnections while working in the application, and so far we have been unable to reproduce the problem. Connections go through an NGINX reverse proxy server to the Tomcat server where the application resides. Both /HEARTBEAT and /PUSH requests go through when we are testing, and we are currently tearing our hair out trying to figure out what goes wrong.

Our system uses the heartbeat to keep the connection alive, and we have also implemented a locking feature that allows only one user at a time to access specific screens. Most of the issues have been experienced on these locked screens, which are usually screens where a lot of data entry is happening.

The locking routine is implemented by setting the UI to poll, adding a PollListener that sets a timestamp 30 seconds in the future, and running a background thread that checks against this timestamp. As long as the timestamp stays greater than the current time, the user keeps the lock on the page. If the UI is closed, the PollListener stops updating the timestamp and the lock is released once the timestamp is less than the current time. Since we are using Push, all of this goes over websockets.
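For readers trying to picture the locking routine described above, here is a minimal sketch of the timestamp logic only. It is not our actual code: the class name PageLease and its methods are made up for illustration, the Vaadin wiring (PollListener, background thread) is left out, and the current time is passed in explicitly so the behaviour can be checked without waiting.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical lease-style lock: the holder keeps the lock only while the
// stored timestamp is still in the future.
class PageLease {
    // Lease length taken from the post: each poll moves the timestamp 30 s ahead.
    static final Duration LEASE = Duration.ofSeconds(30);

    // Shared between the poll listener and the background checker thread.
    private final AtomicReference<Instant> deadline = new AtomicReference<>();

    PageLease(Instant now) {
        deadline.set(now.plus(LEASE));
    }

    // What the poll listener would call on every poll event.
    void renew(Instant now) {
        deadline.set(now.plus(LEASE));
    }

    // What the background thread would call: true once the timestamp is
    // less than the current time, i.e. the lock should be released.
    boolean isExpired(Instant now) {
        return now.isAfter(deadline.get());
    }

    public static void main(String[] args) {
        Instant start = Instant.now();
        PageLease lease = new PageLease(start);

        // A poll arrives after 20 s and extends the lease to start + 50 s.
        lease.renew(start.plusSeconds(20));
        System.out.println("expired at +45s: " + lease.isExpired(start.plusSeconds(45)));

        // No further polls arrive, so the lease runs out after start + 50 s.
        System.out.println("expired at +51s: " + lease.isExpired(start.plusSeconds(51)));
    }
}
```

In this sketch, any gap between polls longer than 30 seconds releases the lock, whether the UI was really closed or the poll requests simply stopped arriving.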

On Nginx the following error is frequently found:

2017/02/20 16:25:03 [error] 29513#0: *3706455 upstream timed out (110: Connection timed out) while reading response header from upstream, client: #.#.#.#, server: site.com, request: "POST /HEARTBEAT/?v-uiId=26 HTTP/1.1", upstream: "http://#.#.#.#:8080/app/HEARTBEAT/?v-uiId=26", host: "site.com", referrer: "https://site.com/"

On Tomcat the following error is found in some of these cases:

Feb 20, 2017 8:55:43 AM org.atmosphere.cpr.DefaultBroadcaster addAtmosphereResource
WARNING: Duplicate resource 051aaacd-7dd8-4f25-90b0-c5ec39e4405d. Could be caused by a dead connection not detected by your server. Replacing the old one with the fresh one

We are not sure whether either of the above errors is related to the connection issues, but we are including them just in case.

We hope someone out there has an idea what the issue could be, and if more info is needed let us know.

Attached are the actual error the users see (the ReconnectDialog) and the nginx reverse-proxy configuration. There is nothing special set up on Tomcat.

31016.txt (2.27 KB)
31017.png

Hi, I think we have this exact problem.

We use manual push.

The websocket connection is initially upgraded correctly and then dies after about 40 seconds.

If we disable the websocket upgrade, the app works by falling back to long-polling, but that is really unsatisfying.

Previously we used Apache 2 as the reverse proxy, with the same results.

So the problem, for us, is the websocket connection dying.