Hung threads and deadlock when locking sessions

Hi,

We have an application comprising several portlets, developed with Vaadin 7.6.7 and deployed on WebSphere Portal 8.5.

We are getting a large number of hung threads and a few deadlocks when VaadinService attempts to lock the session.

It seems to be related to https://github.com/vaadin/framework/issues/5558, but that specific issue should already be fixed in the version we're using.

I'd appreciate any ideas about what could be happening, or, if it's related to the application server or container, what we could check there.

Thank you.


The stack trace is:

[8/16/17 11:23:22:338 COT] 000000f5 SystemOut O 11:23:22.338 [Deferrable Alarm : 1] WARN c.i.w.r.component.ThreadMonitorImpl - WSVR0605W: Thread "WebContainer : 48" (0000496e) has been active for 736,513 milliseconds and may be hung. There is/are 25 thread(s) in total in the server that may be hung.
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:197)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:845)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:878)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1208)
at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:225)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:301)
at com.vaadin.server.VaadinService.lockSession(VaadinService.java:626)
at com.vaadin.server.VaadinService.findOrCreateVaadinSession(VaadinService.java:662)
at com.vaadin.server.VaadinService.findVaadinSession(VaadinService.java:527)
at com.vaadin.server.VaadinService.handleRequest(VaadinService.java:1403)
at com.vaadin.server.VaadinPortlet.handleRequest(VaadinPortlet.java:544)
at com.vaadin.server.VaadinPortlet.doDispatch(VaadinPortlet.java:614)
at javax.portlet.GenericPortlet.render(GenericPortlet.java:222)


And the information about the deadlock from a javacore is:

1LKDEADLOCK Deadlock detected !!!
NULL ---------------------
NULL
2LKDEADLOCKTHR Thread “WebContainer : 268” (0x000000000902C800)
3LKDEADLOCKWTR is waiting for:
4LKDEADLOCKOBJ java/util/concurrent/locks/ReentrantLock$NonfairSync@0x00000007EB0A99C8
3LKDEADLOCKOWN which is owned by:
2LKDEADLOCKTHR Thread “WebContainer : 274” (0x000000000904BC00)
3LKDEADLOCKWTR which is waiting for:
4LKDEADLOCKMON sys_mon_t:0x00007FB974A425B0 infl_mon_t: 0x00007FB974A42628:
4LKDEADLOCKOBJ com/ibm/ws/webcontainer/httpsession/DRSSessionData@0x00000007EB09D3F8
3LKDEADLOCKOWN which is owned by:
2LKDEADLOCKTHR Thread “WebContainer : 268” (0x000000000902C800)

3XMTHREADINFO3 Java callstack:
4XESTACKTRACE at sun/misc/Unsafe.park(Native Method)
4XESTACKTRACE at java/util/concurrent/locks/LockSupport.park(LockSupport.java:197(Compiled Code))
4XESTACKTRACE at java/util/concurrent/locks/AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:845(Compiled Code))
4XESTACKTRACE at java/util/concurrent/locks/AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:878(Compiled Code))
4XESTACKTRACE at java/util/concurrent/locks/AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1208(Compiled Code))
4XESTACKTRACE at java/util/concurrent/locks/ReentrantLock$NonfairSync.lock(ReentrantLock.java:225(Compiled Code))
4XESTACKTRACE at java/util/concurrent/locks/ReentrantLock.lock(ReentrantLock.java:301(Compiled Code))
4XESTACKTRACE at com/vaadin/server/VaadinSession.writeObject(VaadinSession.java:1432)
4XESTACKTRACE at sun/reflect/GeneratedMethodAccessor342.invoke(Bytecode PC:40)
4XESTACKTRACE at sun/reflect/DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55(Compiled Code))
4XESTACKTRACE at java/lang/reflect/Method.invoke(Method.java:618(Compiled Code))
4XESTACKTRACE at java/io/ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1060(Compiled Code))
4XESTACKTRACE at java/io/ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1515(Compiled Code))
4XESTACKTRACE at java/io/ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1451(Compiled Code))
4XESTACKTRACE at java/io/ObjectOutputStream.writeObject0(ObjectOutputStream.java:1197(Compiled Code))
4XESTACKTRACE at java/io/ObjectOutputStream.writeObject(ObjectOutputStream.java:364(Compiled Code))
4XESTACKTRACE at com/ibm/ws/session/store/mtm/MTMBuffWrapper$1.run(MTMBuffWrapper.java:210(Compiled Code))
4XESTACKTRACE at com/ibm/ws/session/store/mtm/MTMBuffWrapper$1.run(MTMBuffWrapper.java:204(Compiled Code))
4XESTACKTRACE at java/security/AccessController.doPrivileged(AccessController.java:330(Compiled Code))
4XESTACKTRACE at com/ibm/ws/session/store/mtm/MTMBuffWrapper.getBytes(MTMBuffWrapper.java:204(Compiled Code))
4XESTACKTRACE at com/ibm/ws/session/store/mtm/MTMBuffWrapper.storeObject(MTMBuffWrapper.java:126(Compiled Code))
4XESTACKTRACE at com/ibm/ws/session/store/mtm/MTMHashMap.handlePropertyHits(MTMHashMap.java:232)
4XESTACKTRACE at com/ibm/ws/session/store/mtm/MTMHashMap.persistSession(MTMHashMap.java:329)
4XESTACKTRACE at com/ibm/ws/session/store/common/BackedHashMap.updateSession(BackedHashMap.java:469)
4XESTACKTRACE at com/ibm/ws/session/store/common/BackedHashMap.put(BackedHashMap.java:543)
4XESTACKTRACE at com/ibm/ws/session/store/common/BackedSession.flush(BackedSession.java:239)
5XESTACKTRACE (entered lock: com/ibm/ws/webcontainer/httpsession/DRSSessionData@0x00000007EB09D3F8, entry count: 3)

I've seen cases of this before. It is an issue in IBM WebSphere. In some scenarios WebSphere may attempt to serialize the VaadinSession while VaadinService.lockSession is acquiring the VaadinSession lock stored in the HTTP session, which ends up in a lock race condition. The fix is in IBM's hands: they need to defer serialization until the HTTP request on the HTTP session has completed. I know this issue was reported to IBM against version 8.5.5.3, but I do not know whether there has been any progress, since those reports are not public.
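To make the failure mode concrete, below is a minimal, self-contained Java sketch (not Vaadin or WebSphere code; all names are made up for illustration) of the lock-order inversion visible in the javacore above: one thread takes the container's session monitor first and then the Vaadin session's ReentrantLock, while the other thread takes them in the opposite order, so with unlucky timing both block forever.

import java.util.concurrent.locks.ReentrantLock;

public class LockInversionSketch {

    // Stands in for the container's session object (DRSSessionData in the
    // javacore), which the container synchronizes on while persisting it.
    private static final Object httpSessionMonitor = new Object();

    // Stands in for the per-session ReentrantLock that Vaadin stores in the
    // session and acquires in VaadinService.lockSession().
    private static final ReentrantLock vaadinSessionLock = new ReentrantLock();

    public static void main(String[] args) {
        // Thread A: the container flushes the session, so it holds the
        // session monitor, and serializing the VaadinSession then tries to
        // take the Vaadin session lock.
        Thread containerFlush = new Thread(() -> {
            synchronized (httpSessionMonitor) {
                sleep(100);
                vaadinSessionLock.lock(); // blocks: thread B holds it
                try {
                    System.out.println("flush done");
                } finally {
                    vaadinSessionLock.unlock();
                }
            }
        }, "WebContainer : 268 (session flush)");

        // Thread B: request handling holds the Vaadin session lock, then
        // touches the HTTP session, which needs the session monitor.
        Thread requestHandling = new Thread(() -> {
            vaadinSessionLock.lock();
            try {
                sleep(100);
                synchronized (httpSessionMonitor) { // blocks: thread A holds it
                    System.out.println("request done");
                }
            } finally {
                vaadinSessionLock.unlock();
            }
        }, "WebContainer : 274 (request handling)");

        containerFlush.start();
        requestHandling.start();
        // With the sleeps above, each thread ends up waiting for the lock the
        // other one holds, and neither message is ever printed.
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}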

Thanks a lot, Tatu.

We are working with IBM Support on this issue. Do you know of any way to point them to that report?

No, unfortunately I do not have any further pointers, such as their internal issue number or equivalent.

Hello, we're facing the same problem in a Portal 8.0 environment running Vaadin 7.x (the most recent version).

In the stack trace attached above we can see that "Memory to Memory" (MTM) replication is enabled in the other customer's scenario, and, as you say, session serialization kicks in.

In our case we have a single-node scenario without Memory to Memory replication, and we're facing the same deadlock issue.

I'm adding below a stack trace from our environment (where no serialization happens):


at sun/misc/Unsafe.park(Native Method)
at java/util/concurrent/locks/LockSupport.park(LockSupport.java:182(Compiled Code))
at java/util/concurrent/locks/AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:822(Compiled Code))
at java/util/concurrent/locks/AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:853(Compiled Code))
at java/util/concurrent/locks/AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1189(Compiled Code))
at java/util/concurrent/locks/ReentrantLock$NonfairSync.lock(ReentrantLock.java:197(Compiled Code))
at java/util/concurrent/locks/ReentrantLock.lock(ReentrantLock.java:273(Compiled Code))
at com/vaadin/server/VaadinService.lockSession(VaadinService.java:626(Compiled Code))
at com/vaadin/server/VaadinService.findOrCreateVaadinSession(VaadinService.java:662(Compiled Code))
at com/vaadin/server/VaadinService.findVaadinSession(VaadinService.java:527(Compiled Code))
at com/vaadin/server/VaadinService.handleRequest(VaadinService.java:1403(Compiled Code))
at com/vaadin/server/VaadinPortlet.handleRequest(VaadinPortlet.java:527(Compiled Code))
at com/vaadin/server/VaadinPortlet.doDispatch(VaadinPortlet.java:597)
at javax/portlet/GenericPortlet.render(GenericPortlet.java:222)
at com/ibm/ws/portletcontainer/invoker/impl/PortletFilterChainImpl.doFilter(PortletFilterChainImpl.java:128(Compiled Code))
at com/ibm/wps/engine/el/init/AttributeCopyFilter.doFilter(AttributeCopyFilter.java:158)
at com/ibm/ws/portletcontainer/invoker/impl/PortletFilterChainImpl.doFilter(PortletFilterChainImpl.java:120(Compiled Code))
at com/ibm/wps/resourceaggregator/capabilities/filter/PortletCapabilityDependencyFilter.doFilter(PortletCapabilityDependencyFilter.java:279)
at com/ibm/ws/portletcontainer/invoker/impl/PortletFilterChainImpl.doFilter(PortletFilterChainImpl.java:120(Compiled Code))
at com/ibm/wps/resolver/iwidget/filter/IWidgetPortletFilter.doFilter(IWidgetPortletFilter.java:60)
at com/ibm/ws/portletcontainer/invoker/impl/PortletFilterChainImpl.doFilter(PortletFilterChainImpl.java:120(Compiled Code))
at com/ibm/wps/propertybroker/standard/filter/C2APortletFilter.doFilter(C2APortletFilter.java:193)
at com/ibm/ws/portletcontainer/invoker/impl/PortletFilterChainImpl.doFilter(PortletFilterChainImpl.java:120(Compiled Code))
at com/ibm/wps/pe/pc/waspc/plm/GlobalPortletLoadMonitoringFilter.doFilter(GlobalPortletLoadMonitoringFilter.java:146)
at com/ibm/ws/portletcontainer/invoker/impl/PortletFilterChainImpl.doFilter(PortletFilterChainImpl.java:120(Compiled Code))
at com/ibm/wps/pe/pc/waspc/filter/impl/GlobalPortletFilter.doFilter(GlobalPortletFilter.java:154)
at com/ibm/ws/portletcontainer/invoker/impl/PortletFilterChainImpl.doFilter(PortletFilterChainImpl.java:120(Compiled Code))
at com/ibm/wps/pcm/scoping/filter/PCMScopingFilter.doFilter(PCMScopingFilter.java:92)
at com/ibm/ws/portletcontainer/invoker/impl/PortletFilterChainImpl.doFilter(PortletFilterChainImpl.java:120(Compiled Code))
at com/ibm/ws/portletcontainer/invoker/impl/PortletServlet.doDispatch(PortletServlet.java:573(Compiled Code))

If any ideas come to mind, please share. We're actively investigating the problem on our side, since it doesn't make sense that we hit this issue without session serialization.

Hi,

Just to pile on: we are seeing the same in a WebLogic 12c server:

        "Thread-103806" #104165 daemon prio=5 os_prio=0 tid=0x00007fff9a1d3800 nid=0x658d waiting on condition [0x00007fff87bb8000]

      
           java.lang.Thread.State: WAITING (parking)
      
            at sun.misc.Unsafe.park(Native Method)
      
            - parking to wait for  <0x00000006d969f5c8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
      
            at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
      
            at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
      
            at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
      
            at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
      
            at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
      
            at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
      
            at com.vaadin.server.VaadinSession.lock(VaadinSession.java:968)

Hello to everybody facing this problem. After a deep dive into Vaadin's code and some debugging of the problem in our scenario, we found the issue.

Here’s the explanation. I will get in touch with Vaadin’s team to have the fix included in a future release.

Using the 8.6 tag of the Vaadin Framework, I'm going to explain the issue:

At this line: https://github.com/vaadin/framework/blob/8.5.2/server/src/main/java/com/vaadin/server/VaadinPortletService.java#L294
you find the getServiceName() method.

This method uses getPortletName() as the unique key identifying the portlet service.

The problem we are all facing appears when the same portlet is placed on a page multiple times, or on different pages that the user accesses concurrently (e.g. opening multiple browser tabs at the same time on portal pages where the portlet is loaded).

The problem lies in how Vaadin uses getServiceName():

Vaadin uses the portlet session to store the "lock" object used to manage concurrency, and this is where the problem is born: multiple instances of the same portlet share the same lock object, and that is why we all end up blocked on it.

We fixed this by replacing getPortletName() with getWindowId() in the getServiceName() method. This way we get an independent lock object for each portlet instance.
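For anyone who wants to try the same idea without patching the framework jar, here is a hedged sketch of it as a VaadinPortlet/VaadinPortletService override. The class names WindowIdVaadinPortlet and WindowIdPortletService are made up for illustration, and since getServiceName() takes no request parameter the window ID is read here from the current portlet request; what we actually did was change VaadinPortletService.getServiceName() directly. As far as I can tell the service name also determines the session attribute name under which the VaadinSession itself is stored, so each window ends up with its own VaadinSession as well; test carefully in your environment.

import javax.portlet.PortletRequest;

import com.vaadin.server.DeploymentConfiguration;
import com.vaadin.server.ServiceException;
import com.vaadin.server.VaadinPortlet;
import com.vaadin.server.VaadinPortletRequest;
import com.vaadin.server.VaadinPortletService;
import com.vaadin.server.VaadinRequest;
import com.vaadin.server.VaadinService;

// Hypothetical subclass names; the change we actually applied was made
// directly inside VaadinPortletService.getServiceName().
public class WindowIdVaadinPortlet extends VaadinPortlet {

    public static class WindowIdPortletService extends VaadinPortletService {

        public WindowIdPortletService(VaadinPortlet portlet,
                DeploymentConfiguration deploymentConfiguration)
                throws ServiceException {
            super(portlet, deploymentConfiguration);
        }

        @Override
        public String getServiceName() {
            // Use the portlet window ID (unique per placement of the portlet
            // on a page) instead of the portlet name, so each portlet
            // instance gets its own lock attribute in the portlet session.
            VaadinRequest request = VaadinService.getCurrentRequest();
            if (request instanceof VaadinPortletRequest) {
                PortletRequest portletRequest =
                        ((VaadinPortletRequest) request).getPortletRequest();
                return portletRequest.getWindowID();
            }
            // No portlet request bound to this thread: fall back to the
            // default behaviour (the portlet name).
            return super.getServiceName();
        }
    }

    @Override
    protected VaadinPortletService createPortletService(
            DeploymentConfiguration deploymentConfiguration)
            throws ServiceException {
        VaadinPortletService service =
                new WindowIdPortletService(this, deploymentConfiguration);
        service.init();
        return service;
    }
}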

As said, I will contact Vaadin to have this included in future releases.

We recently made a fix in Vaadin 8.9 which could remedy the problem discussed in this thread; see https://github.com/vaadin/framework/pull/11792 for details. If you upgrade to 8.9 or newer and test again, we would be interested to know whether it works now.