Timeline questions - data sampling

I am using the Timeline add-on (with Vaadin 6.x). My users are asking me some questions that I am currently unable to answer. I suspect these need to be directed to the author of the Timeline add-on, but I did not see an easy way to do that. So that is my first question: is there a standard way to get in touch with a specific add-on author? I don’t want to waste everyone’s time with posts like this if it isn’t necessary… Thanks…

Specifically, I need to know the following:

  1. How does the timeline object do its data sampling (downsizing)? If I load 150k data points and my screen resolution is 1280x900, how many data points will the timeline try to display? Or does it matter what the display resolution is? My program actually allows me to place 1, 2, 3 or 4 different Timeline objects on a row in my browser window. I’m assuming the data sampling will be different if I have one graph spanning the screen as opposed to 4 graphs squeezed in side by side - is that correct?

  2. Someone here tried loading 4 graphs with 90 days of data from one of my systems into a single Timeline object (~128k data points for each line in the chart) and I was told that the system spiked to 100% CPU utilization. Not sure what caused the spike - trying to figure out if it is the Timeline object or something else - is the data sampling/downsizing a CPU intensive process?

  3. Is there an easy way for me to figure out how much memory it takes to graph 100 data points, 1000 data points, 1 million data points? I need to be able to size my Tomcat system so I can tell people they can graph x lines of data with y datapoints in each line before we blow Tomcat out of the water…

Thanks very much,

nbc

Hi Neil,

I’m not the original author of the add-on, but I’m one of the guys in charge of maintaining it.

  1. The downsizing is currently done using a naïve method:
    N = number of data points in the container
    P = width of the Timeline component in pixels
    Then we calculate the distance between visible points in the container as
    D = N / P
    When drawing, we read every Dth data point from the container and send it to the client-side Timeline widget. This sometimes causes weird artifacts, as can be seen in this ticket, which also proposes a solution.

If you have four graphs side-by-side, each of them will calculate the value for D as above and sample the values according to this.

This can cause some issues depending on the container implementation. E.g. you might run into performance problems if your container always loads 1,000,000 rows from a database when it would only have needed, say, every 1000th row, for a grand total of 1000 rows.
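In plain Java terms, the sampling loop looks roughly like the sketch below. This is not the actual add-on code, just an illustration, and the "value" property id is a placeholder for whatever numeric property your graph uses. With the numbers from your question (150 000 points in a component roughly 1 280 px wide), D would be about 117, so only around every 117th point is sent to the client. Note that only every Dth index is touched, which is why a container with cheap indexed access matters:

```java
import com.vaadin.data.Container;
import com.vaadin.data.Item;
import java.util.ArrayList;
import java.util.List;

public class NaiveSampler {

    /**
     * Picks roughly one point per pixel column by reading every Dth item
     * from an indexed container (D = N / P). "value" is a placeholder
     * property id, not the add-on's real one.
     */
    public static List<Number> sample(Container.Indexed container, int widthInPixels) {
        int n = container.size();                       // N: points in the container
        int d = Math.max(1, n / widthInPixels);         // D: sampling interval
        List<Number> sampled = new ArrayList<Number>();
        for (int i = 0; i < n; i += d) {
            Object itemId = container.getIdByIndex(i);  // only every Dth row is touched
            Item item = container.getItem(itemId);
            sampled.add((Number) item.getItemProperty("value").getValue());
        }
        return sampled;
    }
}
```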

  2. I’m assuming that the 100% spike was on the client computer. Unfortunately, rendering Timelines is quite CPU intensive due to the way it was implemented. Of course, it depends on how many Timeline components are rendered at a time and how large they are. Larger sizes mean more CPU time. The sampling should not affect this, as it is all done on the server and shouldn’t be that resource intensive (depending on the underlying container implementation, as explained above).

  3. Sure, I’d go for running a memory profiler on the Tomcat process. This will plot memory usage over time, and you can then compare the effects of different data sizes. Personally, I like to use Oracle’s VisualVM, which is bundled with the JDK. If you’re not familiar with the tool, Google knows of good guides for using it. As a bonus, you also get CPU usage plotted over time, so you can inspect the impact of larger data sets on CPU time as well.
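If you want a quick ballpark figure without a profiler, you can also measure the heap delta around populating a container of a given size. The sketch below is only an illustration (the "timestamp"/"value" property ids are placeholders, and heap deltas are approximate because of GC timing), but it lets you compare 100 vs. 1 000 vs. 1 000 000 points on your own hardware:

```java
import com.vaadin.data.util.IndexedContainer;
import java.util.Date;

public class MemoryEstimate {

    public static void main(String[] args) {
        for (int points : new int[] { 100, 1000, 1000000 }) {
            Runtime rt = Runtime.getRuntime();
            System.gc();                                    // best effort only
            long before = rt.totalMemory() - rt.freeMemory();

            IndexedContainer container = buildContainer(points);

            System.gc();
            long after = rt.totalMemory() - rt.freeMemory();
            System.out.printf("%d points (%d items) -> ~%d KB%n",
                    points, container.size(), (after - before) / 1024);
        }
    }

    /** Fills an in-memory container with dummy timestamp/value pairs. */
    static IndexedContainer buildContainer(int points) {
        IndexedContainer container = new IndexedContainer();
        container.addContainerProperty("timestamp", Date.class, null);  // placeholder property ids
        container.addContainerProperty("value", Float.class, null);
        for (int i = 0; i < points; i++) {
            Object id = container.addItem();
            container.getContainerProperty(id, "timestamp").setValue(new Date(i * 1000L));
            container.getContainerProperty(id, "value").setValue((float) i);
        }
        return container;
    }
}
```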

HTH,
/Jonatan

Hi Neal,

When it comes to the commercial Vaadin add-ons, the best place to get an answer quickly is to use the Pro services and file a Support Request. That guarantees that someone will get you the answer you need and track down the original author if needed.

That said, the second best place to ask is right here on the forums. Most of the add-on authors I know read the forums and will answer if they can and have the time. The forum is a community effort, so it may take a while to get an answer and the answers may vary in quality, but you usually do get one.

When it comes to non-commercial add-ons, the add-on page usually lists an author web page, forum post, or email address where you can ask questions. This is of course entirely up to the author to provide, and not all authors want to be contacted. In that case, leaving an add-on review might work.

But let me try to answer your questions.

The Timeline uses the pixel width of the component to determine the maximum number of points to load for a graph. For instance, if the Timeline is 400 px wide, then a maximum of 400 points will be loaded into the browser. If you have two graphs visible, then a maximum of 400 * 2 points will be loaded, and so on. The Timeline also caches these points, so if you look at the same time interval at the same zoom level again, the Timeline will not load anything and will just show the cached points.

The way the Timeline decides which points should be shown (and loaded from the container data source) when subsampling occurs is a pretty simple algorithm. For instance, if the graph is showing the whole time range of what you have in the data source (say 1M points), then the Timeline will calculate a ratio D = (number of points in the container) / (pixel width), then fetch every Dth point from the container and send it to the browser. This is not optimal in that data peaks may get lost if they fall between the sampled points. You will notice this if you zoom in and out: peaks suddenly become visible when you zoom in and are hidden again when you zoom out. There is an open ticket #9116 about improving this. However, the Timeline never loads all of your points, no matter how many there are.
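One common way to avoid losing peaks when downsampling, not necessarily what ticket #9116 proposes, just an illustration of the general idea, is to split the data into roughly one bucket per pixel and keep the minimum and maximum of each bucket instead of a single point:

```java
import java.util.ArrayList;
import java.util.List;

public class MinMaxSampler {

    /**
     * Splits the series into roughly one bucket per pixel column and keeps
     * the min and max of each bucket, so peaks between sampled points survive.
     * (Order within a bucket is not preserved in this simple sketch.)
     */
    public static List<Float> sample(List<Float> values, int widthInPixels) {
        int bucketSize = Math.max(1, values.size() / widthInPixels);
        List<Float> sampled = new ArrayList<Float>();
        for (int start = 0; start < values.size(); start += bucketSize) {
            int end = Math.min(start + bucketSize, values.size());
            float min = values.get(start);
            float max = values.get(start);
            for (int i = start + 1; i < end; i++) {
                min = Math.min(min, values.get(i));
                max = Math.max(max, values.get(i));
            }
            sampled.add(min);
            sampled.add(max);
        }
        return sampled;
    }
}
```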

There are two reasons I can think of that might cause your issues.

a) Fetching the data from the data source. The Timeline queries the container quite intensively for the required data, and if the container is connected directly to a database (using a JPAContainer or SQLContainer), the database<->server communication might slow down the loading.

b) Rendering is a CPU intensive operation. This is most likely the cause of the 100% CPU spikes.

First of all, if you are using IE6-IE8, which do not support the HTML5 canvas, VML will be used instead, which is VERY slow and CPU intensive. Depending on how many of those 128k points actually got drawn, I can see how things might get slow with many graphs. VML is simply an old technology that is slow, and there is not much that can be done about that.

Even in the most recent browsers, drawing on the HTML5 canvas is a CPU-intensive process. On my workstation I rarely see anything like 100% CPU, but I have seen it on some older laptops running older versions of Firefox. I also remember that on some operating systems hardware-accelerated rendering of the HTML5 canvas was not supported in older browsers; I think that is still the case with IE9 (not sure), and Firefox on Linux also suffered from this some time ago.

There are a lot of optimizations the Timeline could do on this front that have not been done yet. Some of them include double buffering the rendering, using animation frames, and a lot of other “tricks” currently used by many HTML5 games. However, these cannot all be done across all the supported browsers, so they were left undone for now. Maybe some day someone will look into that.

Some common optimization tips I can give are limiting the number of simultaneous graphs in the Timeline, disabling graph shadows, and using as few alpha (transparent) colors as possible. These are the usual culprits.

The Timeline does not store the points in the Timeline object itself; it always fetches them from the container data source. So a good place to focus is on how big your containers grow. If you are loading everything from the database into, say, a BeanItemContainer, it might become quite big when you are talking about millions of points. If you are using a lazy-loading container such as the SQLContainer or JPAContainer, it will not consume as much memory. Since each graph maps to one container, you should be able to deduce the memory limits by looking at the container sizes quite easily.

Edit: Jonatan apparently beat me to it :)

Thanks for the info guys - I really appreciate it. I think I understand better how the object works. I’ll try to arrange for some monitoring to look at our memory usage. I don’t think most people are using IE, but I will verify that as well… I still owe you a couple of screen shots for my other posting about the difference between the legend value at the top of the graph and the actual data point showing on the screen - I have not forgotten about that but I have not had time to get the data for you…

Much obliged,

nbc