A Grid DataProvider that fetches in backend but retains the ability to sort in memory

I was wondering if anyone had some ideas on how to solve a problem.

I wrote a DataProvider that fetches from a Cayenne-based backend. It extends AbstractBackEndDataProvider and implements BackEndDataProvider and ConfigurableFilterDataProvider.

It runs great but I’m bashing my head in trying to get it to have the ability to sort both in backend and in memory.

The backend is easy, but the in memory one I simply can’t seem to fix because the results are paginated.

My first attempt was to sort at the end of the implementation for:

public Stream<T> fetch(Query<T, Q> query)

Any request for a backended sorted content would pass the sortOrders to Cayenne and return an already sorted stream.

Instead, requests for in-memory sorted content would void the backend sort and perform a comparator based sort operation after getting the results back from Cayenne.

I seemed like it was working until I tried sorting in-memory with a list of more than 50 lines. The problem is pagination.

The backend data provider needs to get results paginated so fetch get’s called multiple times. Sorting there means you only sort the batch of 50 and not the full stream.

Backtracking from the fetch I found out that it is the Grid DataComunicator that is responsible for the call. So I wrote a very convoluted subclass of Grid that overrides constructors to set its own DataComunicator that allows me to intercept the paginated stream after it has been joined.

This is the overriden constructor:

	public static class CayenneDataCommunicatorBuilder<T, U extends ArrayUpdater>
 			extends DataCommunicatorBuilder<T,U>
 			implements Serializable {
 		protected DataCommunicator<T> build(Element element,
 		                                    CompositeDataGenerator<T> dataGenerator, U arrayUpdater,
 		                                    SerializableSupplier<ValueProvider<T, String>> uniqueKeyProviderSupplier) {
 			return new DataCommunicator<>(
 					dataGenerator, arrayUpdater, data -> element
 					.callJsFunction("$connector.updateFlatData", data),
 					element.getNode()) {
 
 				@Override
 				protected Stream<T> fetchFromProvider(int offset, int limit) {
 					return super.fetchFromProvider(offset, limit);
                                     // <---- SORT HERE
 				}
 			};
 		}
 	}
 
 	public CayenneGrid(Class<T> beanType) {
 		super(DEFAULT_PAGE_NUMBER, null,new CayenneDataCommunicatorBuilder<>());
 		configureBeanType(beanType, false);
 	}

Now this works albeit I honestly don’t think I should be catching the stream in the DataComunicator. The sorting should be in the DataProvider.

Then I tried it with a TreeGrid and I hit a wall because TreeGrid does not expose any constructor that allows me to pass a custom DataComunicator.

I just hope I’m doing something incredibly stupid and there’s a way simpler way of doing this that I haven’t seen.

Should I just disable pagination on a case-by-case basis so that when I fetch with in-memory sorting I get all the results in one go?

Any ideas?

You might consider implementing an AbstractBackEndDataProvider and sorting the data in the fetchFromBackEnd() method. I have never had to customize the DataCommunicator itself. AbstractBackEndDataProvider has support for sorting the data.

protected Stream<Entity> fetchFromBackEnd( Query<Entity, EntityDataFilter> query ) {
	List<Entity> items = dataService.fetch( query );
	Stream<Entity> stream = items.stream();
	if ( query.getSortOrders().size() > 0 ) {
		stream = stream.sorted( GridHelper.createSorter( query.getSortOrders(), fieldComparator ) );
	}
	return stream;
}

Hi Ken,

My implementation does exactly that. But fetch and fetchFromBackEnd get called multiple times when a paged request is being run.

Let’s say I have 160 records in a table. With a standard page of 50 this means my code will be called 4 times. Once with offset 0, then 50, 100 and 150.

If I sort in fetchFromBackEnd (or fetch), I only sort those 50 records running through the code at the time and not the whole batch of 160.

The result is a grid showing the first 50 records sorted alphabetically, then the next 50 records starting again from A and ending with Z. Then again another batch of 50 and finally the last 10 records, also sorted alphabetically.

I would say this is not solvabled. As you wrote in your last comment you will have more entries in the backend then in memory.

I don’t understand why you want to mix where the sort happens.

In that case you have no choice but to sort where you have the entire batch loaded from the DB, and that is likely on the back end. Its a common problem of efficient loading from the DB vs. less traffic to the UI, so your grid page size is smaller than your DB batch size. You could pass sort criteria in the request from the UI and then the BE can sort the entire batch and then let you scroll that batch from the grid.

Well, I can think of a few cases.

Let’s say I have a table with some generic stuff like firstName, lastName, dateOfBirth and some other individual related details.

Sorting by those in the backend is intuitive and efficient. But let’s say my java bean adds some simple and not so simple derived calculations like using the data to calculate a fiscal or tax code or that the bean has a to many relationship with some other entity of sorts like ownedPets and we want to just show the count of those pets in the grid.

I’d like to have a simple and fast way to put up a grid with one single DataProvider that can sort both backend and in-memory so that my table can just show firstName, lastName, dateOfBirth but also fiscalCode, ownedPets.

When you click on a column to sort a property that is in the backend, the setup runs the sort by passing the ORM the sorting part. If you click on a column without a direct match with the properties, the grid just goes back to in-memory and works flawlessly even though the logic behind it is totally different.

Let’s take for instance the simple case of showing a column with the number of petsOwned. It can most certainly be done on the backend as well with an inner join or by adding procedures or derived columns on the database. But we aren’t always given full power on the model side of things and sometimes we are sharing the model with other apps. Regardless, having to hard code a join for a column defeats the decoupling we had put in place by adopting an ORM like Cayenne in the first place.

And all of this would be simply solved if the DataProvider interface had an additional method so that the DataComunicator could run through it the final assembled batch of pages. It makes sense that the DataProvider, being in charge of the data, would have the final say on what it reports. Passing the reference for the stream with a default implementation that just returns it right away would have zero impact performance wise.

But all those cases don’t make sense when you have a lot of data. Then you must use the database for sorting.

You have two options:

  1. Store the calculated values in the database
  2. Implement the calculation in a database function

A lot of data means? Hundreds of thousands, sure.

But 200? Or 100? With default paging even just 60 records break the sorting and sorting 60 records in memory happens in a blink. If the same database had some tables with a lot of data and other with just very little (a very common scenario) it would be useful to have a repeatable flexible setup instead of having to use two different DataProviders.

Plus there’s the middle case of tables with a medium amount of data where sorting in-memory is still acceptable with a slight performance hit. In those cases I could open the gird sorted by lastName and have that sort efficient in the backend and only if the user really needs to sort by fiscalCode, rely on the more impacting in-memory. Best of both worlds.

If you only have a few hundred, you can load them all and sort in memory.

You don’t need to implement the DataProvider as in your example.

The Grid has the setItems method that is overloaded.

For a ListDataProvider it’s just

grid.setItems(list);

For the BackendDateProvider there are two options one with count and one without:

grid.setItems(query -> <fetch from backend>);

grid.setItems(query -> <fetch from backend>, query -> <count from backend>); 

That’s it.

Sure but the items are not all just sitting there. There’s a lot of glue code that I can skip with a nice data provider that is already setup to talk to Cayenne.

For instance, my code to create a grid with a list of composers from the romantic era looks like:

new CayenneGrid<>(Composer.dataProvider());
grid.addPropertyColumn(Composer.NAME).setSortable(true);
grid.addPropertyColumn(Composer.COMPOSITIONS)
	.setComparator(Comparator.comparingInt(c -> c.getCompositions().size()));

This includes the name column sorted in the backend and the number of compositions column sorted in-memory. Sorting by compositions is unlikely useful and probably rarely used, but there for conveniency. The default would present a fast grid that is sorting in the backend and does not hit performance unless really needed.

The CayenneDataProvider took care of setting up the context, the configuration any fixed filters a query cache provider to avoid fetching the same data multiple times, plus it operates on properties auto-generated from beans with compile time checking for typos ecc. It’s just nice not to have to repeat the setup to manually grab the items. Don’t get me wrong, it can obviously be done with the setItems. I just like the idea of having one DataProvider-based scenario that can deal with twenty lines and a million in the same way.

What if you move the code to the repository?

Then you’ll have methods that you can use from either setItems method.