Improve Container API for Vaadin 7 to add better ranges/batching

Craig10 · June 27, 2012, 1:00pm

Hi all

JPAContainer seems to have a hard time working efficiently in Vaadin 6, partly because the container API doesn’t make it easy for JPAContainer to predict which ranges of data will be required and when. When I look at the call patterns I see that there’s often a “for each entity ID, for each property, do …” sort of pattern of calls. This can result in truly horrible performance - at least n+1 queries, but worse than that if there’s lazy loading going on too.

Is there any chance the container API for Vaadin 7 will be able to focus on working with ranges of entries more? Instead of:

Count number of entities matching filter
Get IDs for current view range a … b as List (primary key types)
for each ID in range:
– for each property
— ask container for entity.property

This lands up with the container doing nEntities * nProps queries, which is horrible. The CachingLocalEntityProvider helps a little, but it shouldn’t be necessary if the API gave the container the right information to start with.

It’d be really helpful if the table and container model could instead try to:

Count number of entities matching filter or allow open-ended pagination
Get entities for current view range a … b as a List
Request a Map<Entity, List> or Map<Entity,Map<String,Property>> from the container for every entity in List

At least three optimisations now become possible. First, the container knows that the display component will want to use the data in the entities in range a…b, and isn’t just being asked to get the IDs. Second, it knows which properties of these entities will actually be used, so it can ensure they’re eagerly loaded even if they’d usually be lazy. Third, it can work with efficient groups of entities, not having to fetch each entity by ID as it’s requested. The JPA provider can do a SELECT … WHERE id IN (…) or a SELECT … LIMIT … OFFSET … or whatever it wants to reduce round trips and make things lots faster than a SELECT id … LIMIT n OFFSET … followed by n separate SELECT … FROM … WHERE id = ? queries, as is currently required.
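A minimal sketch of the difference in round trips between the two call patterns described above. Nothing here is Vaadin or JPAContainer API: `CountingStore`, `idsInRange`, `property` and `propertiesInRange` are hypothetical names, and the "database" is an in-memory list that only counts how many simulated queries each pattern issues.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a database-backed store that counts round trips. */
class CountingStore {
    int queries = 0;
    private final List<Map<String, Object>> rows = new ArrayList<>();

    CountingStore(int size) {
        for (int i = 0; i < size; i++) {
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("name", "name" + i);
            row.put("city", "city" + i);
            rows.add(row);
        }
    }

    /** One simulated query: the IDs in the window [start, start + count). */
    List<Integer> idsInRange(int start, int count) {
        queries++;
        List<Integer> ids = new ArrayList<>();
        for (int i = start; i < Math.min(start + count, rows.size()); i++) {
            ids.add(i);
        }
        return ids;
    }

    /** One simulated query per call: a single property of a single entity. */
    Object property(int id, String name) {
        queries++;
        return rows.get(id).get(name);
    }

    /** One simulated query: the requested properties of every row in the window. */
    Map<Integer, Map<String, Object>> propertiesInRange(int start, int count, List<String> props) {
        queries++;
        Map<Integer, Map<String, Object>> result = new LinkedHashMap<>();
        for (int i = start; i < Math.min(start + count, rows.size()); i++) {
            Map<String, Object> values = new LinkedHashMap<>();
            for (String p : props) {
                values.put(p, rows.get(i).get(p));
            }
            result.put(i, values);
        }
        return result;
    }
}

class RangeRequestDemo {
    /** ID-oriented pattern: fetch IDs, then each property of each ID. */
    static int perPropertyQueries(CountingStore store, int start, int count, List<String> props) {
        store.queries = 0;
        for (int id : store.idsInRange(start, count)) {
            for (String p : props) {
                store.property(id, p);
            }
        }
        return store.queries;
    }

    /** Range-oriented pattern: one request names the window and the properties. */
    static int rangeQueries(CountingStore store, int start, int count, List<String> props) {
        store.queries = 0;
        store.propertiesInRange(start, count, props);
        return store.queries;
    }

    public static void main(String[] args) {
        CountingStore store = new CountingStore(100);
        List<String> props = java.util.Arrays.asList("name", "city");
        System.out.println("per-property: " + perPropertyQueries(store, 20, 20, props));
        System.out.println("range:        " + rangeQueries(store, 20, 20, props));
    }
}
```

With a window of 20 rows and 2 properties, the first pattern issues 1 + 20 * 2 simulated queries and the second issues 1. A real provider might need more than one statement for the range request, but the count no longer grows with nEntities * nProps.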

The current approach is near pessimal for JPA. Please consider improving the API so a Vaadin 7 JPAContainer can be lightning fast.

(You might also want to get involved in http://java.net/projects/jpa-spec as they’ll be discussing fetch groups and per-query/per-property lazy/eager control for JPA 2.1 soon).

John · June 27, 2012, 4:05pm

Since JPAContainer implements the Indexed interface, it sounds to me like you’re describing ticket #8028.

It was in the Vaadin 7 alpha2 milestone but apparently has been postponed for now. I too see a lot of benefits for both data and UI components of having a way of fetching a range of items at once and hope this will find its way into Vaadin 7 shortly.

Juan6 · June 27, 2012, 10:04pm

I am going to pose a potentially controversial question. What is the architectural purpose of JPA Container? Does it really make sense to put JPA logic into a data binding layer?

I created ExpressUI to rapidly develop CRUD applications, using Vaadin and JPA. However, I did not feel that I needed any add-ons. ExpressUI deals with the N+1 select problem, fetch joins, caching reference entities, etc. I use some Spring DAO patterns with some fancy query APIs built on top for managing paging, sorting, etc. ExpressUI deals with performance issues effectively.

The only thing that ExpressUI doesn’t do is support a scroll-based mechanism for paging through results. While scrolling seems kind of nifty, I am not sure it is really that usable for navigating large result sets. My priority is managing these kinds of real-world, enterprise problems. So, in ExpressUI, Vaadin tables and data sources have no awareness beyond a single page.

When I started my project, I was also intuitively wary of too much magic going on inside JPA Container. JPA and Hibernate are complex enough as it is, without adding another layer of magic. So, I have this gut feeling that maybe the Vaadin folks should focus on making the binding APIs clear and flexible enough that developers can manage JPA issues themselves. There should be a clear architectural separation between DAO logic and binding. Then again, I don’t fully understand JPA Container. The tutorial explains how to integrate with JPA Container, but it doesn’t explain what it actually does.

So, maybe I’m missing out on something that I don’t understand. I would certainly love to hear from the Vaadin folks, more from an architectural perspective.

Juan

Craig10 · June 28, 2012, 4:29am

Re whether JPAContainer is a good idea: Honestly, having dug through how it works, it isn’t that different from how you’d drive a table with JPA from the outside. There is one problem I think is pretty fundamental and insanely hard to fix with the widget-controls-and-owns-container model though:

Different appearances of the same entity in different JPAContainers within the same view are not synchronized. Each appearance of the entity is a different instance - not only exceedingly wasteful, but it means that changes to one aren’t reflected in the other. At best this is ugly, at worst it’s an easy path to op-lock errors and/or data loss.

The JPA spec requires an EntityManager to return the same instance of an entity each time that entity is accessed within a given unit of work. Among other things that’s so changes made in one place are visible elsewhere. Right now, JPAContainer does everything in a new unit of work - very inefficient, and it renders JPA unable to help match entities up across different uses of those entities, so - for example - right now changing a “Customer” entity in a master table won’t reflect the change in the same entity in the detail.
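To illustrate the identity guarantee described above, here is a minimal sketch with no JPA involved. `UnitOfWork` is a hypothetical stand-in for a persistence context, reduced to an identity map over an in-memory "database"; `Customer` and `IdentityDemo` are likewise made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal mutable entity used only for this illustration. */
class Customer {
    final int id;
    String name;

    Customer(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

/** Hypothetical stand-in for a persistence context: one instance per ID. */
class UnitOfWork {
    private final Map<Integer, String> database;
    private final Map<Integer, Customer> identityMap = new HashMap<>();

    UnitOfWork(Map<Integer, String> database) {
        this.database = database;
    }

    /** Returns the same instance for the same ID within this unit of work. */
    Customer find(int id) {
        Customer cached = identityMap.get(id);
        if (cached == null) {
            cached = new Customer(id, database.get(id));
            identityMap.put(id, cached);
        }
        return cached;
    }
}

class IdentityDemo {
    static Map<Integer, String> sampleDatabase() {
        Map<Integer, String> db = new HashMap<>();
        db.put(1, "Acme");
        return db;
    }

    public static void main(String[] args) {
        Map<Integer, String> db = sampleDatabase();

        // Shared unit of work: master and detail hold the same object.
        UnitOfWork shared = new UnitOfWork(db);
        Customer master = shared.find(1);
        Customer detail = shared.find(1);
        master.name = "Acme Ltd";
        System.out.println("shared:   " + detail.name);

        // Separate units of work: two copies that drift apart.
        Customer masterCopy = new UnitOfWork(db).find(1);
        Customer detailCopy = new UnitOfWork(db).find(1);
        masterCopy.name = "Acme Ltd";
        System.out.println("separate: " + detailCopy.name);
    }
}
```

In the shared case the edit made through the master reference is visible through the detail reference because both are the same object. In the separate case the detail copy keeps the value it was loaded with, which is the master/detail drift described in this post.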

Fixing that is much harder than just fixing JPAContainer’s API. It’d require different JPAContainers used within a view to collaborate by sharing an entity manager and fetching related entities within the same unit of work. You can’t hold the transaction open during user think-time, so you’d need a way for containers on the view to co-ordinate work when an event came in. Hard!

If the view is aware of the data “overview” and driving the widgets, rather than the widgets driving the data, this problem becomes a lot easier.

That problem is big enough that I think JPAContainer is unsuitable for any master/detail use where entities may be updated. To be safe it requires a refresh of every other data-driven widget on the page whenever an entity is changed, and that’s horrible. It’s made even worse by the ID-oriented design; you can’t pass an entity object from the master to the detail, you have to pass an ID and fetch the entity in a new unit of work, thus absolutely guaranteeing they’ll be different and out of sync.

JPAContainer’s design defeats at every level JPA’s attempt to keep entities within persistence contexts synchronized.

Other than that major defect, it’s pretty similar to how you’d do the work when driving widgets from the view, and inefficient API aside it’s perfectly fine when you’re only using one JPAContainer in a view.

All it does is:

It builds criteria queries based on filtering and sort rules.
It executes a query to find out how many items there are, total - which you’d do from the outside too if you wanted to support scroll bars / paginators that aren’t open-ended.
It then executes queries to find the IDs of the items within a certain pagination window, fetch those items, and fetch their properties.

This is pretty much exactly how you do it when you’re driving a dynamic table from the outside, using it as a dumb component. Having the container<->table link allows the container to automatically feed the table column names and types, automatically handle sort and filter requests, etc, but it’s nothing you can’t do by manually binding a table to your own event handlers etc. Just simplified and made re-usable.

The problem is that the API by which components request information is bad and very restrictive. Fix the API so the container knows enough and you’ll get results at least as good as you’re getting with ExpressUI as well as automatic scrollbars/paginators/etc.

If JPAContainer were asked for “the properties x, y, and z.a.c from all entities in the pagination window 20 … 40” it could do the job exactly as well as ExpressUI’s data binding components do. The problem is that it isn’t. It’s asked for “the IDs of all entities in the window 20 … 40” then “property x of entity 2” then “property y of entity 2” etc.

Craig10 · June 28, 2012, 4:32am

I think you were extremely wise to avoid JPAContainer. After digging into the issues I’ll be dropping it immediately; it fails the trial completely. As well as having a worst-case-performance API, it can’t be used safely more than once on a view because it breaks persistence context synchronization.

Do you ensure that view updates in ExpressUI always share the same unit of work and entities, so things remain synchronized and different references to the same entity are the same object?

Craig10 · June 28, 2012, 4:36am

Re ExpressUI, btw: Have you played with Errai? Any experiences/opinions? I know you’ll naturally favour ExpressUI; what I’m interested in is areas you found to be a problem, or ideas you think are good and want to use.

Juan6 · June 28, 2012, 9:12am

I’m not sure I completely follow all the details of your post. However, at a high level, ExpressUI takes a safe approach by aggressively synching state with the server and oftentimes reloading entities from the database.

So, changing a single field in a form causes a round trip with the server, and all form field values get refreshed. For example, in my demo, if a user changes currency, he will immediately see the USD amount with the new currency applied. Since this logic is on the server, it is necessary to synch with the server with every request and refresh all fields in the form. This ensures that derived fields are always accurate and that validation logic can be re-applied against multiple properties, any of which may have changed.

ExpressUI also aggressively reloads entities from the database and re-executes queries, to avoid situations where views and bound entities get out of sync. So, for example, when the user saves a form with entity edits and goes back to results, the current query is re-executed so that the results show the user’s edits. Re-executing the query on the database side also ensures that the query and sort criteria are applied against the newly saved entity. The newly saved entity may or may not even appear in the results, or it may appear in a different place if a sorted field was changed.

So, the overall goal is to sync the client, the server and the database as often as reasonably possible. This also ensures that the user doesn’t get stale data, modified by other users. I also have an optimistic locking strategy for dealing with concurrent modifications.

My goal was to eliminate headaches for developers writing typical intranet business CRUD applications where performance was not the highest concern. I also wanted to make sure that the developer had maximum flexibility for coding business logic, query logic, validation rules, security permissions. To achieve these goals, I had to sacrifice some performance.

However, for intranet business apps, my demo app seems pretty snappy. I’ve worked for big businesses that use Websphere on big hardware, and apps crawl with only a single user. So, I prefer to achieve performance by using lightweight technologies rather than sacrificing usability.

Errai looks worth looking at. However, I’m not sure it competes with ExpressUI. My framework is more of an end-to-end CRUD framework. I also wonder with Errai if it does anything to manage client and server states, the way Vaadin does. One of the main benefits of Vaadin over GWT is that you don’t have to worry about DTOs and transferring state back and forth with the client.

Anyway, as with any technology, it is best to understand and accept the limitations before diving in. Sometimes with Vaadin, it gives the impression that it delivers more than it actually can, especially with the add-ons. However, I wish Vaadin would mature more and become more widely adopted. I believe that the Vaadin core has the best design approach of any framework in terms of making development friendly and pleasant. If you want maximum flexibility and performance, you are probably better off using good old JSPs, JQuery and even raw SQL or maybe NoSQL.

Matti · June 28, 2012, 10:57am

Hi,

I guess I should comment on this as I have been a bit involved with the project. Can’t take “credit” for JPAContainer’s architecture, but late last autumn I was creating an enhanced version that supports JPA2 and Criteria queries. We didn’t make any larger architectural changes then.

JPAContainer’s architecture is constrained by two nasty things. As you have mentioned in this thread, the first issue is the somewhat naive Container API in Vaadin. That part of Vaadin should really be redesigned from a clean slate, but I’m afraid those changes will be left out of the next major version due to other (IMO less relevant) changes and enhancements. There are things that containers can do, and JPAContainer could do these better, but these tricks are often a bit unstable, as they rely on assumptions about how components commonly request data from containers.

Another big constraint is that we support the session-per-request pattern. By default JPAContainer always detaches entities immediately once they are fetched from the entity manager. This is mostly to support Hibernate, which has issues with long-lived sessions. With a stateful framework like Vaadin, session-per-application (aka user, aka HttpSession) would work better.

There is an API that can be used to leave entities attached, but I’m not sure if other parts of JPAContainer really take this into account (and e.g. still force refreshes to the entity although it was a “live entity”). If this is the case, this could indeed be a good place for enhancements. Would really make things smoother with JPA implementations like EclipseLink.

Without these constraints I believe we would have built JPAContainer in a different way. We would also have had much more effort to put into other enhancements.

Although it has its limitations, I think JPAContainer is a great tool for many applications. Still, definitely not suitable for all cases. There isn’t a software architecture that fits well for all cases. Creating simple CRUD apps is very simple and fast with JPAContainer, and if you have a moderate number of users (as most “business apps” have) the performance really shouldn’t be an issue.

cheers,
matti

Craig10 · July 3, 2012, 7:47am

Thanks for your comments.

It’s a real shame the container API hasn’t been improved for Vaadin 7; hopefully a point-release will be able to add an enhanced or parallel container API without breaking BC.

Looking more into JPAContainer, there’s one thing I think it can do to really help without having to break out of the current architecture: When asked to fetch an entity’s ID, it should fetch the whole entity and then return just the ID. This would allow the entity manager to cache effectively, which it can’t currently do because JPAContainer is requesting only IDs. It’d also permit query modifier delegates to more cleverly prefetch and eagerly load data.
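A minimal sketch of the "fetch the whole entity, return just the ID" idea, under loud assumptions: `EntityBackend` and `PrefetchingContainer` are hypothetical, entities are plain strings, and the backend only counts simulated queries. The method names `getItemIds` and `getItem` echo the container vocabulary used in this thread but are not the real Vaadin signatures.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical entity-loading backend that counts simulated queries. */
class EntityBackend {
    int queries = 0;

    /** One simulated query returning whole entities for a window of IDs. */
    Map<Integer, String> loadRange(int start, int count) {
        queries++;
        Map<Integer, String> result = new LinkedHashMap<>();
        for (int id = start; id < start + count; id++) {
            result.put(id, "entity-" + id);
        }
        return result;
    }

    /** One simulated query per call: a single entity by ID. */
    String loadById(int id) {
        queries++;
        return "entity-" + id;
    }
}

/** Hypothetical container that keeps the entities it loaded while answering an ID request. */
class PrefetchingContainer {
    private final EntityBackend backend;
    private final Map<Integer, String> cache = new HashMap<>();

    PrefetchingContainer(EntityBackend backend) {
        this.backend = backend;
    }

    /** Loads whole entities for the window, remembers them, and returns only their IDs. */
    List<Integer> getItemIds(int start, int count) {
        Map<Integer, String> entities = backend.loadRange(start, count);
        cache.putAll(entities);
        return new ArrayList<>(entities.keySet());
    }

    /** Served from the prefetched entities when possible; falls back to a by-ID query. */
    String getItem(int id) {
        String cached = cache.get(id);
        if (cached == null) {
            cached = backend.loadById(id);
            cache.put(id, cached);
        }
        return cached;
    }
}
```

Requesting the IDs for a window and then each item in that window costs one simulated query here instead of n+1. In a real JPA setup the cache would be the entity manager's own persistence context rather than a map in the container, which is the point made above.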

As for session per request: The problem isn’t that JPAContainer supports and uses session per request, it’s that it defines “request” too narrowly. It probably has to because of the deficiencies of the container API, but right now getting an ID list is one request, getting entity 1 is another request, getting entity 2 is another request, etc. This utterly destroys the entity manager’s ability to cache and to generally function how it’s designed.

Session per request should really be for the duration of a user interaction. That means it should open a session, fetch a list of IDs, fetch each entity, then close the session.

Matti · July 23, 2012, 11:15am

Hi, back from holidays…

You are definitely right on this one. This is exactly how e.g. HbnContainer works. This would save several small but fast queries. It is safe to expect that more than just the identifier of an item is needed.

We were a bit busy moving on to the next projects the last time we worked on this product, so this was never fixed. Luckily fetching by identifiers is commonly quite fast in databases. I don’t remember if there is a ticket for this enhancement in our trac. If there isn’t one, feel free to add it.

cheers,
matti

Craig10 · August 15, 2012, 2:27pm

Matti,

Will do. Thanks for the follow-up. I’ve filed the ticket as:

http://dev.vaadin.com/ticket/9328

as suggested.

As for JPAContainer, I’ve posted a simple patch on another new ticket (#9316) that makes it easier to write a reliable and simple QueryModifierDelegate - say, to control fetch behaviour.

http://dev.vaadin.com/ticket/9316

It’s trivial, so please consider it. I’m running a modified JPAContainer build with it at the moment, on AS7 and EclipseLink 2.4 in case it matters.

I’m going to try implementing a patch for 9328 too, because JPAContainer’s performance is giving me so much trouble with one entity I’m using it with that my options are down to fix it or replace it. It’s doing 5n+1 selects for each block of IDs, including re-fetching some shared entities over and over again.

Craig10 · August 16, 2012, 1:29am

Matti,

I just had an interesting idea I’d really appreciate your comments on.

Can’t a container return the entity itself as an identifier? The API contract for Container doesn’t care what the identifier type is; it doesn’t enforce consistent ID types so long as the container can accept whatever it produces. It should be perfectly happy getting whole entities back as IDs.

Table, at least, uses size(), getIdByIndex(), firstItemId(), lastItemId(), etc rather than the simple Container API. It won’t call getItemIds(). The whole table shouldn’t get loaded.

Is this crazy? It’d simplify things a heck of a lot - and while it might increase the memory use of containers a little, it’d also get rid of the need for a caching layer and make DB access tons more efficient.

The best solution is of course to fix the container API so it gives containers much more information about what’s needed and when, completely getting rid of the horrid getItemIds() and getItem() stuff in favour of range requests. Hopefully that can happen with a revision to Vaadin 7. In the mean time, there’s clearly a lot that can be done within the limitations currently present by working with Container.Ordered and Container.Indexed and avoiding the legacy Container API as much as possible.

The only constraint I really see with returning entities as IDs is that the user of the container may have particular expectations about the behaviour of the equals() and hashCode() methods of identifiers. There don’t seem to be any documented, though. The container itself can get the ID from an entity via PersistenceUnitUtil.getIdentifier(Object entity), or via reflective access to the @Id property, so it doesn’t have to worry.
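A minimal sketch of why equals() and hashCode() matter when an entity doubles as its own item ID. `Person` is a hypothetical entity with no JPA annotations, and basing equality on the primary key is one possible choice rather than something the Container API prescribes.

```java
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

/** Hypothetical entity whose identity as an item ID follows its primary key. */
class Person {
    private final Long id;
    private String name;

    Person(Long id, String name) {
        this.id = id;
        this.name = name;
    }

    Long getId() {
        return id;
    }

    String getName() {
        return name;
    }

    /** Two instances count as the same item ID when their primary keys match. */
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof Person)) {
            return false;
        }
        Person other = (Person) o;
        // Unsaved entities (null key) are only equal to themselves.
        return id != null && id.equals(other.id);
    }

    /** Consistent with equals(): derived from the primary key only. */
    @Override
    public int hashCode() {
        return Objects.hashCode(id);
    }
}

class EntityAsIdDemo {
    public static void main(String[] args) {
        Set<Person> itemIds = new LinkedHashSet<>();
        itemIds.add(new Person(1L, "Ann"));
        // A second load of the same row, with a changed name, maps to the same item ID.
        itemIds.add(new Person(1L, "Ann Smith"));
        System.out.println(itemIds.size());
    }
}
```

With key-based equality, two separately loaded instances of one row collapse to a single item ID in hash-based collections. With the default identity-based equals(), they would be treated as two different items, so whichever behaviour a component relies on has to be a deliberate choice.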

Thoughts?

I’m going to hack together a test and see how it works.

Craig10 · August 16, 2012, 2:07am

That reminds me: JPAContainer (or EntityProvider) also needs a QueryExecutionInterceptor (or an enhanced QueryModifierDelegate if you don’t mind breaking BC) to which each javax.persistence.Query is passed before execution, so the user can specify provider- and query- specific hints. Again, I’ll hack this into my copy and post a patch if I get anywhere. I need it so I can use eclipselink.batch or eclipselink.join-fetch hints to get some control over fetching.
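A rough sketch of what such a hook could look like. Everything here is hypothetical, including the method name `beforeExecute`; it is not the patch referred to in this thread. To keep the example runnable without a JPA provider, `HintableQuery` stands in for the single javax.persistence.Query method used, setHint(String, Object), and the hint value "c.orders" is an illustrative attribute path.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-in for the one javax.persistence.Query method used here: setHint(String, Object). */
interface HintableQuery {
    HintableQuery setHint(String hintName, Object value);
}

/** Hypothetical hook: called with each query just before it is executed. */
interface QueryExecutionInterceptor {
    void beforeExecute(HintableQuery query);
}

/** Records hints so the sketch can run without a JPA provider. */
class RecordingQuery implements HintableQuery {
    final Map<String, Object> hints = new HashMap<>();

    @Override
    public HintableQuery setHint(String hintName, Object value) {
        hints.put(hintName, value);
        return this;
    }
}

class InterceptorDemo {
    /** Hypothetical provider-side step: let the interceptor see the query before it runs. */
    static RecordingQuery prepare(QueryExecutionInterceptor interceptor) {
        RecordingQuery query = new RecordingQuery();
        if (interceptor != null) {
            interceptor.beforeExecute(query);
        }
        return query;
    }

    public static void main(String[] args) {
        // The hint name comes from the post above; the value is made up for illustration.
        QueryExecutionInterceptor batchOrders =
                q -> q.setHint("eclipselink.batch", "c.orders");
        System.out.println(prepare(batchOrders).hints);
    }
}
```

Keeping the hook as a separate single-method interface would leave an existing QueryModifierDelegate untouched, which is one way to avoid the BC break mentioned above.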

Henri2 · September 7, 2012, 6:35am

Using a table with a container that is not Container.Indexed is horribly inefficient and has always been so - there isn’t really a way to improve that much without introducing other fundamental problems. Always use an Indexed container (not necessarily IndexedContainer) with a Table.

Container.Indexed has been extended in Vaadin 7 with a method for getting a range of item IDs (#8028, to be closed soon).

It is up to the container to decide what it uses as IDs. In many cases it would be problematic to use the item itself or e.g. the underlying entity as its ID, but in some cases it is ok and e.g. BeanItemContainer uses the bean itself as its ID.