Vaadin Timeline - scaling?

We are using RRD to collect monitoring data from thousands of devices around our network. Bandwidth data is collected every minute and is stored for 13 months (~575k data points). I am attempting to write an application that will let users select a subset of the data files and graph them. They may look at the last 4 days of data or the entire year’s worth on a graph. I am using rrdTool to generate the graph files on the fly, and I’m starting to work on some Vaadin extensions to be able to manipulate the resulting images. But I see that Vaadin has a Timeline object which might let me create the graphs and then scroll/zoom through the data. My question is - how well does it scale? I would need to be able to create multiple graphs on a page (I assume that each graph would be a different Timeline object), and each graph might contain up to a dozen or more lines, where each line (dataset) could have up to 575k data points. That’s a lot of data to work with. We’ve tried a number of graphing packages and they all work really well until we start feeding them a million data points, and that’s where things start to break down.

Anyone have any experience with the Vaadin Timeline object and very large data sets? I’d be interested in hearing about performance or size limitations that you may have run into.

Thanks,

nbc

As a follow-on to this question, today I wrote a small program that builds a Timeline graph and it works very well… But when I try to load 200k or 300k data points, it can take more than 5 minutes and my Tomcat/Apache proxy seems to time out on me. There are 2 parts to loading data into the graph - one is pulling the RRD data from my web service and the other is creating the items in the indexed container that is used by the Timeline object. I think the bulk of the time is spent creating the items and storing the time and value property values (but I will have to verify that). Is this going to be a problem using this type of object? Large amounts of data taking a long time to load? Once the graph is created, it seems that the Timeline manipulates the graph on the screen very quickly - but 5+ minutes to load it is not going to fly…

So another question - are there other indexed containers that would perform better on a linear set of inserts and still work well with the Timeline object?

Thanks,

nbc

The amount of data points you mentioned should not be a problem for the Timeline. It will lazily load the data points which are displayed from the container so it should be quite fast (as you noted in your followup). But of course the more graphs you display simultaneously in the Timeline the more data has to be sent to the client and the slower the browsing of the timeline will be.

The bottleneck with the Timeline is usually the container and how fast it can load the data from the database. By using a lazy container you can optimize the initial loading time of the Timeline, but then again browsing the Timeline will be slower because of the real-time fetching of the items in the container. Also, the Timeline fetches ranges of data points with a certain interval between the points, depending on how many points are being displayed, and as far as I know there are not any container implementations which would do this in an efficient manner yet, so even fetching the points lazily can be slow. Making such a container would probably speed the Timeline up considerably, but it’s not a trivial task. My hope is that some day this will get included in the Indexed interface which the Timeline could use.
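To make the "ranges with an interval" idea concrete, here is a minimal sketch in plain Java. It is not Timeline or container code; `RangeSampler` and `sampleIndexes` are hypothetical names, and the assumption is simply that a range of item indexes is thinned out to at most a fixed number of evenly spaced points.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Vaadin: thins an index range to a bounded number of points. */
class RangeSampler {

    /** Returns at most maxPoints evenly spaced indexes from [startIndex, endIndex]. */
    static List<Integer> sampleIndexes(int startIndex, int endIndex, int maxPoints) {
        List<Integer> picked = new ArrayList<>();
        if (maxPoints <= 0 || endIndex < startIndex) {
            return picked;
        }
        int count = endIndex - startIndex + 1;
        // Ceiling division, so the result never holds more than maxPoints indexes.
        int step = Math.max(1, (count + maxPoints - 1) / maxPoints);
        for (int i = startIndex; i <= endIndex; i += step) {
            picked.add(i);
        }
        return picked;
    }

    public static void main(String[] args) {
        // One year of one-minute samples, drawn with roughly 1000 points.
        List<Integer> idx = sampleIndexes(0, 525_599, 1000);
        System.out.println(idx.size() + " points, step " + (idx.get(1) - idx.get(0)));
    }
}
```

A container that could answer this kind of request directly, without touching the skipped items, is what would make lazy fetching cheap.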

If I understood correctly from your followup you are loading all the points in one blob from the webservice and then populating an IndexedContainer you then give to the Timeline? My guess is this will consume a lot of memory and take time with that amount of points. That is most likely the reason your Tomcat instance dies on you. One solution I can think of is to wrap the webservice in a container and lazily query the webservice for the points so you do not have to do all the work at once.
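As a rough illustration of querying lazily, the sketch below caches fixed-size pages and only calls the backing service when a page is first touched. It is plain Java, not a Vaadin Container; `LazyPointCache` and the `pageFetcher` function are hypothetical stand-ins for whatever the real web service client looks like.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

/** Hypothetical page cache: fetches points from a slow source only when they are first needed. */
class LazyPointCache {
    private final int pageSize;
    private final IntFunction<double[]> pageFetcher; // stand-in for the web service call
    private final Map<Integer, double[]> pages = new HashMap<>();
    private int fetchCount = 0;

    LazyPointCache(int pageSize, IntFunction<double[]> pageFetcher) {
        this.pageSize = pageSize;
        this.pageFetcher = pageFetcher;
    }

    /** Returns the value at a global index, loading its page on first access. */
    double valueAt(int index) {
        int pageNo = index / pageSize;
        double[] page = pages.get(pageNo);
        if (page == null) {
            page = pageFetcher.apply(pageNo);
            pages.put(pageNo, page);
            fetchCount++;
        }
        return page[index % pageSize];
    }

    int fetchCount() {
        return fetchCount;
    }

    public static void main(String[] args) {
        // Fake "web service": page n holds the values n*1000 .. n*1000+999.
        LazyPointCache cache = new LazyPointCache(1000, pageNo -> {
            double[] page = new double[1000];
            for (int i = 0; i < page.length; i++) {
                page[i] = pageNo * 1000 + i;
            }
            return page;
        });
        System.out.println(cache.valueAt(2500) + " after " + cache.fetchCount() + " fetch(es)");
    }
}
```

The trade-off is the one described above: the initial load is cheap, but every newly visited region costs a round trip.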

One final tip, the browser part of the Timeline (the bottom graph) is heavy if the underlying container does not properly support loading data ranges with intervals. Since it shows the whole graph it will have to query the whole data set for points and if the container at that point starts to load all the points and not just the points queried it will take time. In this case you can disable the bottom browsing bar which should help. Of course the down side is navigating the Timeline will be harder.

John,

Thanks for the information. I’m not sure lazy loading will be possible for me, since users can select a timeframe they want to see - so if someone chooses 6 months or a year, I’ll have to somehow get that data into the graph right away. RRDTool generates a graph on a year’s worth of data in about 3 seconds. I don’t expect that rate of speed, but 5+ minutes just won’t do… There is also the issue of people loading a graph with 5 or 6 lines in it - and if each line has a year’s worth of data - I’ll retire before the graph gets displayed :)

I broke down one set of data and did some measurements. I loaded 10 sets of 28000 (approx) data points from my web service, and then added those to the indexed container. The web service load time for each of the datasets was 3-4 seconds. However, the times for adding to the container grow dramatically. The data looks like this:

Container Load Times

Batch  Start                Stop                 Duration
CI[0]  08/04/2011 08:29:48  08/04/2011 08:29:53    5 seconds
CI[1]  08/04/2011 08:29:58  08/04/2011 08:30:10   12 seconds
CI[2]  08/04/2011 08:30:14  08/04/2011 08:30:34   20 seconds
CI[3]  08/04/2011 08:30:38  08/04/2011 08:31:08   30 seconds
CI[4]  08/04/2011 08:31:12  08/04/2011 08:31:54   42 seconds
CI[5]  08/04/2011 08:31:58  08/04/2011 08:32:47   49 seconds
CI[6]  08/04/2011 08:32:52  08/04/2011 08:33:48   56 seconds
CI[7]  08/04/2011 08:33:51  08/04/2011 08:34:58   67 seconds
CI[8]  08/04/2011 08:35:03  08/04/2011 08:36:39   96 seconds
CI[9]  08/04/2011 08:36:42  08/04/2011 08:38:25  103 seconds

It took 5 seconds to load the first 28000 data points into the container, but it took 103 seconds to load the final 27000 data points. I’m assuming the indexed container is doing some kind of linear search to insert the data - perhaps adding the data points in reverse would be faster?
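For what it is worth, the linear-search guess can be checked against the numbers with a small cost model. The sketch below is plain Java and only counts comparisons under the assumption that every insert first scans all items already stored; that assumption is a guess, not something confirmed about IndexedContainer. `InsertCostModel` is a hypothetical name.

```java
/** Hypothetical cost model: counts comparisons if each insert scans everything already stored. */
class InsertCostModel {

    /** Comparisons needed to add batchSize items to a store that already holds alreadyStored items. */
    static long linearScanComparisons(long alreadyStored, long batchSize) {
        long total = 0;
        for (long i = 0; i < batchSize; i++) {
            total += alreadyStored + i; // one full scan of the current contents per insert
        }
        return total;
    }

    public static void main(String[] args) {
        long batchSize = 28_000;
        long first = linearScanComparisons(0, batchSize);
        for (int batch = 0; batch < 10; batch++) {
            long cost = linearScanComparisons(batch * batchSize, batchSize);
            // Print each batch's cost relative to the first batch.
            System.out.printf("CI[%d]: %.1fx the first batch%n", batch, (double) cost / first);
        }
    }
}
```

Under this model the cost of batch b is about 2b+1 times the cost of the first batch, so the tenth batch comes out near 19 times the first. The measured ratio is 103/5, or roughly 20 times, which is at least consistent with per-insert work that grows with the container size.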

I’d really like to be able to use the timeline object, but I’m not sure I can live with the performance as it stands now - I’ll do a bit more investigation… Is the indexed container my only choice or are there alternatives that might be faster that the timeline can process? I’m relatively new to the Vaadin environment, so I’m not sure where to be looking…

Thanks again,

nbc

Can’t tell you how disappointed I am at the moment with respect to the Timeline object. It has just about every single feature I want, except it takes forever to load the data (for large datasets - more than 20000 or 30000 data points - mine may run from as few as 1000 points to as many as 300k or 400k data points - per line with 5 or 10 lines possibly on a graph). I’m using an indexed container as the data source, and if anyone has any suggestions for how to speed up loading data I would be most grateful - either using that container or some other container to feed the Timeline.
I would really like to use it if I can…

I can provide the code I’m using to load the data, but I just realized I didn’t send a copy home and I can’t access it at the moment. But I’m basically using the same code that comes with the Timeline documentation - adding items one at a time to the indexed container… Let me know if you want to see the code in detail if that will help.

thanks,

nbc

Hi,

I’ve never used the Timeline, so I can’t offer any advice on that point - but I have used and created my own Containers, and it really isn’t that tricky.

I would place a significant amount of money that the time spent - over and above your webservices returning the data - is spent in actually building the container itself - i.e. it’s got nothing to do with the TimeLine component at all.

If you’ve followed the simple demo (I’ve just looked at the add-on page), it uses the IndexedContainer - which is ultimately wrapping a Hashtable. So, apart from anything else, all the manipulations on it are synchronized, and you’ll have all the cost of resizing the table.

If you can show the loading code, I’ll try to point you in a more efficient direction; in the meantime, I’ll see if I can knock together something helpful.

Cheers,

Charles

Hi Charles - you are absolutely correct - the time is spent loading the container - once that is done, manipulating the timeline itself is remarkably fast - I’m actually very impressed with the Timeline object - it provides almost all the capabilities I need in the graph object… The code to load the table comes right out of the documentation - I get an array of data from the web service and then add items into the indexed container - like this:

// Create the container:

Container.Indexed container = new IndexedContainer();
container.addContainerProperty(Timeline.PropertyId.TIMESTAMP, Date.class, null);
container.addContainerProperty(Timeline.PropertyId.VALUE, Double.class, 0.0);

// Invoke the web service - get RRDData rrdData - an array of data objects…

for (int i = 0; i < rrdData.length; i++) {
    rrd = rrdData[i];
    // The timestamp doubles as the item id
    item = container.addItem(rrd.getRDate());
    item.getItemProperty(Timeline.PropertyId.TIMESTAMP).setValue(new Date(rrd.getRDate()));
    item.getItemProperty(Timeline.PropertyId.VALUE).setValue(rrd.getRValue());
}

Nothing fancy - As I said, I’m relatively new to Vaadin - I have not had a chance to dig into the underlying code - In this case, the time to add records to the indexed container appears to go up (a lot) as the number of records increases. If you can show me how to speed this up, that would be great. I don’t mind the idea of implementing my own container if necessary - no idea how to go about it at the moment, but point me in the right direction and I’ll give it a try…

Thanks,

nbc

Hi,

In a nutshell: I’ve done some experiments, creating a Container that implements Indexed (which is what I think you need for TimeLine).
IndexedContainer is slow : it’s really not intended for this scale of data.

I’ve written a little test program (please excuse the mess, it’s throwaway code) and put it on GitHub for you to see

https://github.com/canthony/vaadin-container-investigations

It’s spookily close to your sample code you just posted (I started this a few hours ago) - I create a list of 525,600 RRDRecords representing 1 year (at 1 minute intervals) of updates. I then try and create three kinds of Containers

  1. A SimpleHashMapContainer - a very simple and naive implementation of Container & Container.Indexed, using a HashMap as opposed to a Hashtable.
    You’ll see that not all of the methods have been implemented - this is one of the slight downsides of Vaadin’s container model; a lot of the methods are actually not necessary to implement! It’s difficult to know which ones to implement and which not. I’ve done those that I think TimeLine is likely to use.
  2. A BeanItemContainer
  3. An IndexedContainer.
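To show the general idea behind the first option, here is a minimal sketch of a store that keeps a HashMap for lookup by id and an ArrayList for lookup by position. It is not the SimpleHashMapContainer from the repository and it does not implement any Vaadin interface; `SimpleIndexedStore` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical store: HashMap for id lookup, ArrayList for positional lookup. */
class SimpleIndexedStore<ID, V> {
    private final List<ID> order = new ArrayList<>();
    private final Map<ID, V> items = new HashMap<>();

    /** Appends an item; returns false if the id is already present. */
    boolean add(ID id, V value) {
        if (items.containsKey(id)) { // hash lookup, no scan over existing items
            return false;
        }
        items.put(id, value);
        order.add(id);
        return true;
    }

    ID idByIndex(int index) {
        return order.get(index);
    }

    V get(ID id) {
        return items.get(id);
    }

    int size() {
        return order.size();
    }

    public static void main(String[] args) {
        SimpleIndexedStore<Long, Double> store = new SimpleIndexedStore<>();
        long start = System.currentTimeMillis();
        // One year of one-minute samples, keyed by a millisecond timestamp.
        for (long minute = 0; minute < 525_600; minute++) {
            store.add(minute * 60_000L, (double) minute);
        }
        System.out.println(store.size() + " items in " + (System.currentTimeMillis() - start) + "ms");
    }
}
```

Both the duplicate check and the append are constant time on average, so the cost of an insert does not depend on how many items are already stored.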

As you can see from the results below, IndexedContainer is significantly slower, taking about 28 minutes to load!

Created 525600 records in 283ms
SimpleHashMapContainer has 525600 items, taking 2352ms
BeanContainer has 525600 items, taking 3853ms
IndexedContainer has 525600 items, taking 1695102ms
(Each run was separate and on its own, I’ve just shown them together for clarity)

I recommend for simplicity, you try using BeanItemContainer, as it’s already in the codebase and you already have a RRDRecord bean! I think you’ll have to tell the TimeLine which bean properties to use

Something like the following should work :

RRDRecord[] records = new RRDRecord[] {};

BeanItemContainer<RRDRecord> container =
    new BeanItemContainer<RRDRecord>(
        RRDRecord.class,
        Arrays.asList(records)
    );
Timeline timeline = new Timeline();
timeline.setGraphTimestampPropertyId("rDate");
timeline.setGraphValuePropertyId("rValue");
timeline.addGraphDataSource(container);

HTH,

Cheers,

Charles

Lifesaver !!! I’ll need to do some testing with this, but if I can make it work, this will be a tremendous help. Thanks very much - I will experiment with this today and let you know the results either later today or early next week. Much obliged,

nbc

Charles,

Using the BeanItemContainer reduces the load time of the container below the time it takes for the web service to provide the data - which is great. But now it seems that I don’t understand how Vaadin defines properties. I can load the container, but the Timeline is not getting the date/value properties set correctly. Originally, I was using the timeline defaults, but now I have to set the values from the container.

My data class (RawRRDData) looks like this:

public class RawRRDData implements java.io.Serializable {
    private java.lang.String errMsg;

    private long RTime;

    private double RValue;

    public long getRTime() { return RTime; }
    public double getRValue() { return RValue; }
}

and the BeanItemContainer looks like:
BeanItemContainer container = new BeanItemContainer(RawRRDData.class, Arrays.asList(webservice data array));

So when I add the container to the timeline, it looks like:

Timeline timeline = new Timeline("testing");
timeline.addGraphDataSource(container);
timeline.setGraphTimestampProperty(container, <what goes here to define the timestamp property?>);

The timestamp property should be the RawRRDData.RTime field, but I have not been able to figure out how to define it.
Tried things like “RTime” and “RawRRDData.RTime” but the timeline throws an exception before drawing the graph.

So the question is a fairly general one - what constitutes a ‘Property’ in a Vaadin container, and how do I specify it in a context like this?

I believe that once I can get the Timeline to display my graph again, your data setup solution will probably be fast enough for my needs. Again, thanks very much for the help,

nbc

In short - I suspect the BeanItemContainer is getting confused by the capitalisation of the property name.

I’ll quickly try something with your data class…

Thanks - Just FYI,

The field names RTime and RValue are generated automatically by the web service client - the class on the web service side uses rTime and rValue…

nbc

Ahh - I also see that your member variables/fields are “weirdly” capitalized too; the BeanItemContainer is expecting objects that match the JavaBeans specification.

I also note that your timestamp is a long - isn’t the TimeLine expecting a Date property?

It might be easiest to create a wrapping bean for each RawRRDData, and use a collection of those for the BeanItemContainer.

OK - I’ve pushed some more code to the github repository. Look at CreateBeanItemContainer, and RRDBean - RawRRDData is the data as you receive it from the webservice.

I’m then turning it into a bean object (and making sure that the time attribute is a Date, which I think the TimeLine component is expecting).

As I said - BeanItemContainer wants JavaBeans - which, as you may know, are simply Java classes with getters/setters following some rules to do with capitalization (actually, a JavaBean should also have a no-args constructor, and some other things - but they aren’t relevant here)
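The capitalization rules can be inspected with the JDK's own `java.beans.Introspector`. The sketch below assumes that BeanItemContainer derives its property ids through this standard introspection, which this thread does not confirm; the two nested classes and `PropertyNameDemo` are hypothetical and exist only to compare getter spellings.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical demo: lists the JavaBeans property names the JDK derives from getter names. */
class PropertyNameDemo {

    /** Getters as generated by the web service client in this thread. */
    public static class UpperCaseGetters {
        public long getRTime() { return 0L; }
        public double getRValue() { return 0.0; }
    }

    /** Getters spelled with a lower-case letter right after "get". */
    public static class LowerCaseGetters {
        public long getrTime() { return 0L; }
        public double getrValue() { return 0.0; }
    }

    /** Returns the sorted property names of a bean class, ignoring Object's own "class" property. */
    static List<String> propertyNames(Class<?> beanClass) {
        try {
            List<String> names = new ArrayList<>();
            for (PropertyDescriptor pd
                    : Introspector.getBeanInfo(beanClass, Object.class).getPropertyDescriptors()) {
                names.add(pd.getName());
            }
            Collections.sort(names);
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(propertyNames(UpperCaseGetters.class)); // [RTime, RValue]
        System.out.println(propertyNames(LowerCaseGetters.class)); // [rTime, rValue]
    }
}
```

The names come from the getter methods, not from the private fields, and a name that starts with two capitals is left unchanged. If the assumption holds, the id to hand to the Timeline has to match the getter spelling exactly.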

Hope that moves you a little further on.

Have a good weekend,

Charles.

Thought I would just pitch in with a couple more things which might come up when using the BeanItemContainer with the Timeline; Charles pretty much covered everything else.

  • The timestamp property must be of type java.util.Date (or an extension of it)
  • The value property must be of type Float, Double or Integer (currently primitive types are not supported)

There is now also an example at
http://dev.vaadin.com/browser/addons/Timeline/tests/com/vaadin/addon/timeline/tests/BeanItemContainerWithTimelineTest.java?rev=20139
of coupling the Timeline with the BeanItemContainer.
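The two type requirements above can be met with a small wrapper bean. The sketch below is plain Java with hypothetical names (`RrdBeanDemo`, `TimelinePoint`, `wrap`) and is not the RRDBean from the GitHub repository; it assumes the raw timestamps are epoch milliseconds, as the earlier `new Date(rrd.getRDate())` call suggests.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

/** Hypothetical demo: converts raw long/double samples into beans with Date and Double properties. */
class RrdBeanDemo {

    /** Bean with a java.util.Date timestamp and a boxed Double value. */
    public static class TimelinePoint {
        private Date time;
        private Double value;

        public TimelinePoint() {
        }

        public Date getTime() { return time; }
        public void setTime(Date time) { this.time = time; }
        public Double getValue() { return value; }
        public void setValue(Double value) { this.value = value; }
    }

    /** Wraps parallel arrays of epoch-millisecond timestamps (assumed unit) and values. */
    static List<TimelinePoint> wrap(long[] epochMillis, double[] values) {
        List<TimelinePoint> points = new ArrayList<>(epochMillis.length);
        for (int i = 0; i < epochMillis.length; i++) {
            TimelinePoint point = new TimelinePoint();
            point.setTime(new Date(epochMillis[i]));
            point.setValue(values[i]); // autoboxed from double to Double
            points.add(point);
        }
        return points;
    }

    public static void main(String[] args) {
        List<TimelinePoint> points = wrap(new long[] {0L, 60_000L}, new double[] {1.5, 2.5});
        System.out.println(points.get(1).getTime().getTime() + " -> " + points.get(1).getValue());
    }
}
```

With getters spelled this way the property names would be "time" and "value"; a list like this could then be handed to a BeanItemContainer in place of the raw web service objects.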

That did it - I had to change the ‘rValue’ data type from ‘double’ to ‘Double’ as well - the Timeline complained that it was not a numeric value. So the class being passed to the BeanItemContainer now looks like:

class tlRRDData{
private Date rTime;
private Double rValue;

}

with the appropriate getters and setters, and the Timeline takes “rTime” and “rValue” as the corresponding graph properties. And the speed is markedly improved.

Thanks for all the help - this should make life a bit easier in the next couple of weeks…

And for the record, if I sounded like I was bashing the Timeline object earlier, I really wasn’t - I was pretty sure right from the start that the problem was loading the container data - I just didn’t know where to turn to improve that part of the program… Thanks again Charles…

nbc

Excellent - glad it’s all working!

Cheers,

Charles.

Hi all, has this improvement been introduced in Vaadin 7, or does it have to be managed by the user?

br,

m.d.