Using grid with expensive large datasets

Hello all,

We have extended AbstractBackEndDataProvider and bound it to the grid for paging through a large set of data. Some of the data properties we have are expensive to calculate, and we want to filter the data on those properties. Some filters require the calculations to be completed on the data first to see if it actually falls within the requested filter.

The problem we are having is with the sizeInBackEnd override.

For example, we have one property, “Online”, that is trivial to calculate in the stream when paging through the data. The problem comes when sizeInBackEnd() is called. Because it is called with offset = 0 and limit = Integer.MAX_VALUE, we end up calculating this across the entire dataset, which is large. This is too expensive and makes the grid unusable.

And when we don’t match the sizeInBackEnd count exactly with what fetchFromBackEnd is generating we get Index out of bounds exceptions.
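
To make the contract concrete, here is a minimal framework-free sketch (plain Java, not Vaadin API; `ConsistentBackend`, `count`, and `fetch` are hypothetical names). The count and the page are derived from the same filter, so they cannot disagree. It does not remove the cost problem, since the count still evaluates the filter over every row.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical backend: count and fetch share one filter so they always agree.
class ConsistentBackend<T> {
    private final List<T> items;

    ConsistentBackend(List<T> items) {
        this.items = items;
    }

    // Plays the role of sizeInBackEnd: number of rows matching the filter.
    int count(Predicate<T> filter) {
        return (int) items.stream().filter(filter).count();
    }

    // Plays the role of fetchFromBackEnd: one page of the same filtered rows.
    List<T> fetch(Predicate<T> filter, int offset, int limit) {
        return items.stream()
                .filter(filter)
                .skip(offset)
                .limit(limit)
                .collect(Collectors.toList());
    }
}
```

If the two methods used different filters, or the data changed between the calls, a page could come back shorter than the count promised, which matches the failure described above.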

I understand sizeInBackEnd() is asking how many records to expect, but is there a way to provide it with an estimated or max value? I can cheaply calculate these values.
So, a first question to ask: are we using AbstractBackEndDataProvider correctly? Is there a better way to stream data to the grid for large datasets? Really good examples seem pretty sparse.

Help!
Thank you,

  • Jon

For big data I use CallbackDataProvider with the JPA Criteria Builder API.

Never mind everyone. The short answer to this is, you can’t. After lots of digging, I found a couple articles:
https://vaadin.com/forum/thread/17984284/is-it-possible-to-use-lazy-loading-in-a-grid-without-the-total-item-count

(In this article there is a link to another great article from Matti Tahvonen that goes into depth why you can’t and also has a possible workaround.)

But to be honest, saying Vaadin’s Grid can lazily load data is frustrating. It’s only lazy if you are lucky enough to be able to push getting the count of the entire dataset off to another, very responsive service (say, a database). In our situation, we were not so lucky. It would have required pulling 10K+ records into memory just to initialize the grid’s paging system, and then doing it repeatedly. Waaayyyy too slow. Once we knew exactly how many records were going to be paged through, we could start lazily paging. But if something changed (say we wanted to filter the data somehow), all bets were off. Everything got invalidated and the whole process started over from the beginning.

We didn’t even try data sorting. Might be easy, might be hard. idk.

Reading Matti’s article, you start to see the complexities. I understand now why things are the way they are.

If the problem really is me not being smart enough to understand how lazy loading works, someone please chime in. I admit my ignorance when I am at fault.

If no mistake have you made, yet losing you are … a different game you should play.

Thank you everyone,

  • Jon

Hi,

my 2 cents:

Sorting: Lazy loading has no benefits regarding sorting. To have it sorted, all items must be fetched, and that’s no lazy operation. OK, if there is a database and that database does the sorting for you, then it could work. But I dislike the idea of throwing paginated sort queries at the database. Who says the database is smart enough to cache the previous result set internally instead of fetching and sorting every time?

Did you see https://vaadin.com/blog/data-binding-to-grid-gets-easier-and-more-efficient
Starting from Vaadin 17 the count is optional. I don’t know if that gets back-ported to Vaadin 14 (I think I read it somewhere) but maybe you can switch to 17?
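
To illustrate the general idea of paging without a total count, here is a minimal plain-Java sketch. It is not the Vaadin 17 API, and `UnknownSizePager` and its methods are hypothetical names. The pager keeps requesting fixed-size pages and treats a page shorter than requested as the end of the data.

```java
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical helper: pages through a source without ever asking for a total.
class UnknownSizePager<T> {
    private final BiFunction<Integer, Integer, List<T>> fetch; // (offset, limit) -> page
    private final int pageSize;
    private int offset = 0;
    private boolean exhausted = false;

    UnknownSizePager(BiFunction<Integer, Integer, List<T>> fetch, int pageSize) {
        this.fetch = fetch;
        this.pageSize = pageSize;
    }

    // Returns the next page; a page shorter than pageSize marks the end.
    List<T> nextPage() {
        if (exhausted) {
            return List.of();
        }
        List<T> page = fetch.apply(offset, pageSize);
        offset += page.size();
        if (page.size() < pageSize) {
            exhausted = true;
        }
        return page;
    }

    boolean isExhausted() {
        return exhausted;
    }

    // Rows seen so far; only a lower bound on the real size until exhausted.
    int knownSize() {
        return offset;
    }
}
```

The trade-off is that the exact size is only known once the last page has been fetched, so anything that depends on it, such as a scrollbar position, has to work from an estimate until then.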

Regards
Christian

Thanks Christian,
No, I hadn’t seen this blog about changes in v17. It’s funny, I was thinking about branching and starting a component with this feature.

This will be super handy although updating to v17 is probably a ways out.

Thanks again,
Jon

Hi Jon,

In our app we do have different queries for count vs. fetch whenever possible. For example, if the search criteria can be evaluated without joins, our persistence layer will count without joining tables, without sorting, etc. This approach does result in native queries, but it allows us to speed up count requests.

We are able to let the database do all the filtering, so nothing gets loaded in the app during count.
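
A rough sketch of what such a count/fetch split might look like, as a small Java helper that builds the two SQL strings. The table, column, and class names (`item`, `owner`, `ItemQueries`) are made up for illustration and are not from our app. It also assumes the join cannot change the number of rows, otherwise the count would not match the fetch.

```java
// Hypothetical builder: table, column, and join names are invented for illustration.
// whereClause is expected to be a trusted fragment using "?" bind placeholders.
class ItemQueries {
    // Count query: filter only, no join and no ORDER BY.
    static String countSql(String whereClause) {
        return "SELECT COUNT(*) FROM item i WHERE " + whereClause;
    }

    // Fetch query: same filter, plus the join, the sort, and the page window.
    static String fetchSql(String whereClause, String orderBy, int offset, int limit) {
        return "SELECT i.*, o.name FROM item i"
                + " LEFT JOIN owner o ON o.id = i.owner_id"
                + " WHERE " + whereClause
                + " ORDER BY " + orderBy
                + " LIMIT " + limit + " OFFSET " + offset;
    }
}
```

Both queries take the same `whereClause`, which is what keeps the count consistent with the pages the grid later requests.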

As Christian pointed out, V17 supposedly will have improvements to avoid the count, but I haven’t looked into it yet.