TestBench stale element reference

80% of the time, our TestBench tests fail randomly. It’s always a stale element reference but not always with the same test!

stale element reference: element is not attached to the page document
  (Session info: chrome=105.0.5195.52)
For documentation on this error, please visit: https://selenium.dev/exceptions/#stale_element_reference
Build info: version: '4.4.0', revision: 'e5c75ed026a'
System info: host: '6680ecc9a5bb', ip: '172.25.0.11', os.name: 'Linux', os.arch: 'amd64', os.version: '5.15.0-48-generic', java.version: '17'
Driver info: org.openqa.selenium.remote.RemoteWebDriver
Command: [e0ed1c09865d28a911a4faaa66b55d5a, isElementDisplayed {id=066b8984-24af-418a-9bf3-74d948d02d7e}]
Capabilities {acceptInsecureCerts: false, browserName: chrome, browserVersion: 105.0.5195.52, chrome: {chromedriverVersion: 105.0.5195.19 (b9c217c128c1..., userDataDir: /tmp/.com.google.Chrome.VV1vC0}, goog:chromeOptions: {debuggerAddress: localhost:35529}, networkConnectionEnabled: false, pageLoadStrategy: normal, platformName: LINUX, proxy: Proxy(), se:cdp: ws://172.17.0.2:4444/sessio..., se:cdpVersion: 105.0.5195.52, se:vnc: ws://172.17.0.2:4444/sessio..., se:vncEnabled: true, se:vncLocalAddress: ws://172.17.0.2:7900, setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 600000, pageLoad: 600000, script: 600000}, unhandledPromptBehavior: dismiss and notify, webauthn:extension:credBlob: true, webauthn:extension:largeBlob: true, webauthn:virtualAuthenticators: true}
Element: [org.openqa.selenium.remote.RemoteWebElement@96fb05fa -> unknown locator]
Session ID: e0ed1c09865d28a911a4faaa66b55d5a

As you can see in the logs we are using RemoteWebDriver and have a selenium/standalone-chrome running as a Docker container. Our application is also deployed as a Docker container.

Does anybody else have the same issue? The tests were more stable with Vaadin versions before 22.

Does the problem occur as part of a navigation?

It happened in the start.vaadin.com tests when they did something like waiting until getMenu().getPreviewView().getText() contains "hello", and a navigation (randomly) took place after getting the view but before getting the text.
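One common way to tolerate that kind of race is to repeat the whole lookup chain (get the view again, then get the text) when the reference goes stale, instead of holding on to the old element. The following is only a minimal sketch of that idea in plain Java. StaleRetry, retryOn and StaleElementException are hypothetical names made up for this example; StaleElementException is a stand-in so the sketch compiles without Selenium, and in a real test you would pass org.openqa.selenium.StaleElementReferenceException.class instead.

```java
import java.util.function.Supplier;

class StaleRetry {

    /** Stand-in for Selenium's StaleElementReferenceException (assumption, see lead-in). */
    static class StaleElementException extends RuntimeException {
        StaleElementException(String message) {
            super(message);
        }
    }

    /**
     * Runs the action and repeats it when it throws the given exception type.
     * The action should perform the complete lookup (find the element, then read
     * from it), so every attempt works on a fresh element reference.
     */
    static <T> T retryOn(Class<? extends RuntimeException> retryable,
                         int maxAttempts, Supplier<T> action) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                if (!retryable.isInstance(e)) {
                    throw e; // unrelated failure: do not hide it
                }
                last = e; // stale reference: try the whole lookup again
            }
        }
        throw last; // still stale after maxAttempts, report the last failure
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulates a lookup that hits a stale element twice before succeeding.
        String text = retryOn(StaleElementException.class, 3, () -> {
            if (++calls[0] < 3) {
                throw new StaleElementException("element is not attached to the page document");
            }
            return "hello";
        });
        System.out.println(text + " after " + calls[0] + " attempts"); // prints "hello after 3 attempts"
    }
}
```

In a TestBench test the supplier would be something like () -> getMenu().getPreviewView().getText(), so that the view is looked up again on each attempt. This only hides the symptom; it does not explain why the DOM is being recreated.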

It happens anywhere, not only during navigation.

We lost so much time during the last few months that we are thinking about replacing the tests with Playwright.

All the runs that failed did so because of this.

Are you running the server in production mode?

Is it always 1-2 tests or more?

Yes. It runs as a Docker container

Yes. Mostly 1

We have around 70 tests

Not a solution, but hopefully something to increase stability: https://maven.apache.org/surefire/maven-failsafe-plugin/examples/rerun-failing-tests.html

Thanks, let me try

Is this related to using Vite somehow?

Also, can you give a (code) example of where in a test it has failed? I would guess that the DOM is recreated for some reason. I'm not sure why that would happen, except in Vite dev mode when it finds a new frontend dependency.

But it runs in production mode.

Still, there is probably something that recreates the DOM and causes the stale reference.

Can you share the Docker container config you are using to run the app and the browser? I remember making some memory tweaks in the past.

There is no special config. What do you have in mind?