How to improve the performance of your web application

Improving a web application’s performance becomes more important as the number of concurrent users grows. However, there are several reasons to enhance the performance of your web application even if you do not have a large number of concurrent users. Here are a few of the benefits of doing this:

Improve the end-user experience by reducing response times.
Make your application more future-proof.
Save hosting costs.
Reduce your application’s CO2 footprint.

This topic itself is so broad that you could write a book on it. In this blog post, we take a brief look at a couple of common approaches to performance enhancement. First, we discuss load balancing, then a few different caching approaches, and finally we provide a few examples of other performance-optimization techniques.

Using load balancing

The purpose of load balancing is to distribute the load between multiple application server instances. Load balancing can be done on a hardware and software level, and there are several different load balancing strategies. For Vaadin, we usually recommend using load balancing with sticky sessions. The reason is that session replication is generally not easy to apply to Vaadin applications due to the server-side session architecture. For more information on why you generally should use sticky sessions, please refer to this blog post about load balancing.
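To illustrate the idea of sticky sessions, here is a minimal sketch in plain Java. It assigns each new session to a server in round-robin order and then keeps routing that session to the same server. The class name StickyRouter, the server names, and the session IDs are hypothetical. Real load balancers typically achieve the same effect with a routing cookie or similar mechanism.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: round-robin for new sessions, sticky afterwards. */
final class StickyRouter {
    private final List<String> servers;
    private final Map<String, String> serverBySession = new ConcurrentHashMap<>();
    private final AtomicInteger nextServer = new AtomicInteger();

    StickyRouter(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("at least one server is required");
        }
        this.servers = List.copyOf(servers);
    }

    /** Returns the server for this session; the first call picks one, later calls reuse it. */
    String route(String sessionId) {
        return serverBySession.computeIfAbsent(sessionId, id -> {
            int index = Math.floorMod(nextServer.getAndIncrement(), servers.size());
            return servers.get(index);
        });
    }
}
```

Because every request of a session lands on the same instance, the server-side session state never has to be replicated between instances.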

Using network cache

Another technique that is widely used to improve the latency and throughput of an application is caching. A cache is a temporary data store where information that has already been fetched is kept for faster access in the future. The best candidates for caching are static resources, or those that change infrequently, such as pictures, JavaScript (JS) files, and Cascading Style Sheets (CSS) stylesheets. Another good candidate is the result of an operation that is expensive to calculate.

Between client and server, various caches are available, from browser and application caches to content delivery networks (CDNs) and reverse proxies in front of an application server. Caches can be divided into two groups based on the allowed access level: shared (available to multiple users) and private (available only to a single user).

HTTP caching

To specify which resources are cacheable and for how long, developers define values for HTTP cache headers. The Cache-Control header dictates how content can be cached. Commonly used directives are public, private, no-cache, and no-store. Without this header, you have little control over caching: caches fall back on older headers such as Expires or on their own heuristics.

A header with the public directive allows a resource to be stored at any caching level. Consequently, this is not a viable option for sensitive content, as access to it should be limited. For such content, use either the private value, which dictates that storage is allowed only in a user’s private cache, or the no-store value, which prohibits caching completely.

To specify how long a previously downloaded resource is considered fresh (not stale and still applicable for re-use), the max-age value is used. It is given in seconds. After this time has passed, the resource has to be fetched or revalidated again.
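As a small illustration, the following Java sketch builds Cache-Control values for the three cases discussed above. The class and method names are hypothetical, and the durations are example values only.

```java
import java.time.Duration;

/** Hypothetical helper that builds Cache-Control header values. */
final class CacheHeaders {

    private CacheHeaders() {}

    /** Static, non-sensitive resources: any cache (browser, CDN, proxy) may store them. */
    static String forPublicResource(Duration maxAge) {
        return "public, max-age=" + maxAge.getSeconds();
    }

    /** Per-user content: only the user's own private cache may store it. */
    static String forPrivateResource(Duration maxAge) {
        return "private, max-age=" + maxAge.getSeconds();
    }

    /** Sensitive content that must not be stored in any cache. */
    static String forSensitiveResource() {
        return "no-store";
    }

    public static void main(String[] args) {
        System.out.println("Cache-Control: " + forPublicResource(Duration.ofDays(7)));
        System.out.println("Cache-Control: " + forPrivateResource(Duration.ofMinutes(5)));
        System.out.println("Cache-Control: " + forSensitiveResource());
    }
}
```

In a servlet-based application, a value like this could be applied with response.setHeader("Cache-Control", value), for example in a filter that matches static resource paths.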

A cache miss occurs when the requested data is not present in a cache. In this case, the request is forwarded to the next cache in the chain (one that is closer to the application server) or to the application server itself.

Reverse proxy server caching

A reverse proxy cache is a cache server that sits in front of the application server. All requests pass through the proxy server, but the user (in the browser) is unaware of the proxy’s existence. Reverse proxies are commonly used for caching and also to improve the security (e.g. to filter out malicious requests) and reliability of the application.

When a web application in the browser requests a resource, such as a CSS file or an image, from the web server, the reverse proxy first checks if the resource is already in the cache and either returns it directly or requests it from the server. This takes load off the main application server and enables faster resource loading, making the web application snappier. There are various reverse proxy cache implementations available, such as NGINX, Apache, and Varnish.
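The lookup logic described above can be sketched in a few lines of plain Java. This is only an illustration of the hit-or-forward decision, not how NGINX, Apache, or Varnish are implemented. The class name is hypothetical, the origin server is represented by a simple function, and expiry is left out for brevity.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical in-memory sketch of a reverse proxy cache lookup. */
final class ReverseProxyCacheSketch {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> origin; // stands in for the application server
    private final AtomicInteger originRequests = new AtomicInteger();

    ReverseProxyCacheSketch(Function<String, String> origin) {
        this.origin = origin;
    }

    /** Returns the cached body on a hit; on a miss, asks the origin and stores the result. */
    String get(String path) {
        String cached = cache.get(path);
        if (cached != null) {
            return cached; // cache hit: the application server is not contacted
        }
        originRequests.incrementAndGet(); // cache miss: forward to the application server
        String body = origin.apply(path);
        cache.put(path, body);
        return body;
    }

    /** Number of requests that actually reached the origin. */
    int originRequests() {
        return originRequests.get();
    }
}
```

Requesting the same path twice reaches the origin only once, which is where the reduced load on the application server comes from.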

Using database cache

It is not uncommon for the worst bottlenecks in your application to be in the database. Performance problems caused by the database usually manifest when going to production, especially if the database is no longer on the same physical server as the application. There are several things that might make your database integration slow. These are a few common ones:

Querying too much data at the same time, especially when the resultant entity has a lot of fields.
Doing queries in a loop or too frequently.
A poorly indexed database or poorly constructed slow queries.

Also, the higher the latency and the slower the connection between the database server and the application, the bigger the effect these problems will have on the scalability of your application. In addition to cache solutions, there are other remedies available for these cases. For example, if you need only part of the data available for the queried entity, you can use Projections, Entity Graphs, or custom JPA queries to fetch only the essential data.

There are several different caching strategies available, such as cache-aside, read-through, write-through, and write-back. When the application does more reading than writing, it is our experience that you should usually use either the cache-aside or the read-through strategy. As a low-hanging-fruit example, you could use Google Guava’s LoadingCache to cache the results of frequently used queries that are not updated very often.
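To show the read-through idea without any dependencies, here is a minimal sketch that uses only the JDK’s ConcurrentHashMap rather than Guava. The class name QueryCache, the loader function, and the time-to-live are assumptions made for illustration. A production cache would also need a size limit, which this sketch omits.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical read-through cache with a fixed time-to-live per entry. */
final class QueryCache<K, V> {

    private static final class Entry<V> {
        final V value;
        final long expiresAtNanos;

        Entry(V value, long expiresAtNanos) {
            this.value = value;
            this.expiresAtNanos = expiresAtNanos;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // e.g. a method that runs the database query
    private final long ttlNanos;

    QueryCache(Function<K, V> loader, Duration ttl) {
        this.loader = loader;
        this.ttlNanos = ttl.toNanos();
    }

    /** Returns the cached value, or loads and stores it if it is missing or expired. */
    V get(K key) {
        long now = System.nanoTime();
        Entry<V> entry = entries.compute(key, (k, old) -> {
            if (old != null && now - old.expiresAtNanos < 0) {
                return old; // still fresh
            }
            return new Entry<V>(loader.apply(k), now + ttlNanos);
        });
        return entry.value;
    }

    /** Call this after writing to the database so that stale data is not served. */
    void invalidate(K key) {
        entries.remove(key);
    }
}
```

The caller only ever asks the cache, and the cache decides whether the loader (the query) has to run. Invalidating on writes keeps the cached results from drifting too far from the database.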

Optimizing the application

In addition to having a cache strategy, it usually makes sense to investigate performance bottlenecks in the application code too. We usually use profiling tools, such as JProfiler, XRebel, or VisualVM, for this purpose. If the problem only appears in a restricted production environment where you are not allowed to connect profiling tools, one common solution is to add timestamped logging in the places where bottlenecks are suspected. Common performance bottlenecks that we have encountered when optimizing Vaadin applications are described in the sections below.
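A minimal sketch of such timing logs, using only java.util.logging from the JDK, could look like this. The helper name Timed and the labels are hypothetical. The default java.util.logging format already prints a timestamp with each record.

```java
import java.util.function.Supplier;
import java.util.logging.Logger;

/** Hypothetical helper that logs how long a suspected bottleneck takes. */
final class Timed {
    private static final Logger LOG = Logger.getLogger(Timed.class.getName());

    private Timed() {}

    /** Runs the work, logs the elapsed time under the given label, and returns the result. */
    static <T> T call(String label, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            LOG.info(label + " took " + elapsedMs + " ms");
        }
    }
}
```

Wrapping a suspicious call, for example Timed.call("load customers", () -> service.findAll()) with a hypothetical service, is usually enough to confirm or rule out a suspected hot spot.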

Storing less data in the session

Since Vaadin Flow applications run with a server-side session containing the state of the application, it is tempting to store all the data there. This is usually not a problem until you go into production and the number of concurrent users increases significantly. In one case we worked on, uploaded documents were stored in users’ sessions, and memory usage exploded when the app was brought to production. To prevent this, you should clear uploaded data from memory once it is stored in a file or database. Another common way to bloat the session is to use in-memory DataProviders, e.g. to fetch all ComboBox or Grid values from the database and put them in a ListDataProvider. To save memory, you should use lazy-loading (backend) DataProviders and, if necessary, caches to speed up loading.

We recommend measuring the session size of the application already in the development phase, and especially before going to production. We usually follow the approach detailed below. This approach assumes that you have a scalability test available and that you can attach a profiler, e.g. Java VisualVM, to the application’s process.

1. Run the scalability test with e.g. 10 users to “warm up” the application.
2. In the profiler: garbage collect to minimum and take note of the heap size a.
3. Run the scalability test with e.g. 200 users.
4. In the profiler: garbage collect to minimum and take note of the heap size b.
5. Estimated size of a session s = (b - a) / 200
6. Repeat steps 3 - 5 with e.g. 500 users and verify s.
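The calculation in the steps above can be sketched as follows. The heap sizes in the example are made-up numbers, not measurements, and the class name is hypothetical.

```java
/** Hypothetical helper for the session-size estimate s = (b - a) / users. */
final class SessionSizeEstimate {

    private SessionSizeEstimate() {}

    /**
     * @param baselineHeapBytes heap size a, measured after warm-up and garbage collection
     * @param loadedHeapBytes   heap size b, measured after the load test and garbage collection
     * @param users             number of users in the load test
     */
    static long bytesPerSession(long baselineHeapBytes, long loadedHeapBytes, int users) {
        if (users <= 0) {
            throw new IllegalArgumentException("users must be positive");
        }
        return (loadedHeapBytes - baselineHeapBytes) / users;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        // Example values only: a = 400 MiB, b = 600 MiB, 200 users.
        long s = bytesPerSession(400 * mib, 600 * mib, 200);
        System.out.println("Estimated session size: " + s + " bytes");
    }
}
```

If the estimate from the second run with more users differs noticeably from the first, the session size probably depends on what the users do in the test, and the larger value is the safer one to plan with.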

Avoiding running out of memory

This is related to the previous section. There could be multiple reasons for the application to run out of memory. First, the memory (heap) reserved for the application might be too low for the number of concurrent users. Second, it is possible that the application has a memory leak, which will eventually result in it running out of memory. Finally, it is possible that there are just too many concurrent users in the system.

If not enough memory (Java heap) is reserved, the application might end up triggering garbage collection frequently, which in the worst case freezes the application for several seconds. You can use the following formula to estimate the minimum amount of memory needed:

MemoryNeeded = IdleMem + (SessionLiveTime / 60min) * HourlyUsers * SessionSize

Here, IdleMem is the memory usage when there are no users in the application, SessionLiveTime is the session timeout in minutes (usually 30 min) plus the approximate time a user spends using the app, HourlyUsers is the approximate number of unique visitors in one hour, and SessionSize is the estimated session size (see the previous section). This is obviously only an estimate, and you should still add a margin, e.g. an extra 25%, to it.
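As a worked example, the formula can be written as a small Java method. All input numbers below are illustrative assumptions, not recommendations, and the class name is hypothetical.

```java
/** Hypothetical helper for the memory estimate described above. */
final class MemoryEstimate {

    private MemoryEstimate() {}

    /** MemoryNeeded = IdleMem + (SessionLiveTime / 60min) * HourlyUsers * SessionSize, in MB. */
    static double minimumMb(double idleMemMb, double sessionLiveTimeMin,
                            double hourlyUsers, double sessionSizeMb) {
        return idleMemMb + (sessionLiveTimeMin / 60.0) * hourlyUsers * sessionSizeMb;
    }

    /** Adds a safety margin, e.g. 0.25 for an extra 25%. */
    static double withMargin(double memoryMb, double marginFraction) {
        return memoryMb * (1.0 + marginFraction);
    }

    public static void main(String[] args) {
        // Example values only: 500 MB idle, 30 min timeout + 15 min usage,
        // 1000 unique users per hour, 1 MB per session.
        double minimum = minimumMb(500, 30 + 15, 1000, 1.0);
        System.out.println("Minimum: " + minimum + " MB");
        System.out.println("With 25% margin: " + withMargin(minimum, 0.25) + " MB");
    }
}
```

With these example inputs, the estimate is 500 + 0.75 * 1000 * 1 = 1250 MB, or about 1563 MB with the 25% margin.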

Needing to restart the server after e.g. a few days because the application has run out of memory is a common symptom of a memory leak. To find out where the leak is, we recommend using an advanced profiler, such as JProfiler, which allows you to find the garbage collection roots of the objects in memory.

If a single server needs tens of gigabytes of memory to support all your users, then it probably makes sense to use clustering. Vaadin applications in production usually use 8 - 16 GB of memory per server instance.

Using native SQL instead of running queries in a loop

Sometimes, to make implementation easier, it might be tempting to avoid complex native database queries and implement the logic in Java instead. This might lead to a situation where the results of a primary query are looped through and a secondary query is executed for each iteration. This is usually a bad idea and will lead to performance problems in production, especially when the database is on another physical server, because every query adds a network round trip.
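The difference can be illustrated with a small sketch that counts round trips. The FakeDatabase class below is a hypothetical in-memory stand-in for a remote database, and the customer and order data are invented. In a real application, the single-query variant would typically be one SQL statement with a JOIN or an aggregate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

final class QueryLoopSketch {

    /** Hypothetical stand-in for a remote database; every method call is one round trip. */
    static final class FakeDatabase {
        private final Map<Integer, List<String>> ordersByCustomer =
                Map.of(1, List.of("A-1", "A-2"), 2, List.of("L-1"));
        int roundTrips = 0;

        /** Primary query: the IDs of all customers. */
        List<Integer> findCustomerIds() {
            roundTrips++;
            return new ArrayList<>(new TreeSet<>(ordersByCustomer.keySet()));
        }

        /** Secondary query: the orders of one customer. */
        List<String> findOrders(int customerId) {
            roundTrips++;
            return ordersByCustomer.getOrDefault(customerId, List.of());
        }

        /** Stands in for one joined query that returns all orders grouped by customer. */
        Map<Integer, List<String>> findOrdersForAllCustomers() {
            roundTrips++;
            return ordersByCustomer;
        }
    }

    /** Anti-pattern: one query for the customers plus one query per customer. */
    static int countOrdersInLoop(FakeDatabase db) {
        int total = 0;
        for (int customerId : db.findCustomerIds()) {
            total += db.findOrders(customerId).size();
        }
        return total;
    }

    /** Preferred: a single query that lets the database do the joining. */
    static int countOrdersWithSingleQuery(FakeDatabase db) {
        int total = 0;
        for (List<String> orders : db.findOrdersForAllCustomers().values()) {
            total += orders.size();
        }
        return total;
    }
}
```

Both methods return the same result, but the loop variant needs one round trip per customer on top of the primary query, so its cost grows with the size of the result set.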

Summary

We have discussed a few common performance optimization techniques for your web application, starting with load balancing and moving on to different caching strategies and other optimization techniques. This only scratches the surface, and many of these subjects could fill a whole book.