A minimal zero-downtime deployment using nginx & Spring Boot

Starting Java web applications is not an instantaneous process. It is not rare to see tens of seconds for some applications to boot. Although you can get an almost instant startup time if you leave out Hibernate and run Quarkus on GraalVM, that’s not the reality we live in.


During development, tools like JRebel and Spring Boot Devtools help reduce the development cycle, but how do you tackle this issue on a live website powered by your Spring Boot application? How can you deploy great new features or critical bug fixes without causing nasty interruptions for your end users?

Traditionally, zero-downtime deployment has been achieved using clustering and high-availability setups. These are expensive to set up and maintain, especially if you haven’t yet moved over to tools like Kubernetes. For smaller-scale deployments, traditional Tomcat has provided a great parallel deployment feature that gracefully moves your users to the new version when their sessions expire.

The project that uncovered a new approach

Over the holidays, I was working on a hobby project for my local sports club. Tomcat’s parallel deployment might have been just enough, but deploying old-school WAR files to Tomcat when using Spring Boot in 2022 didn’t feel quite right, especially as I have been a fan of hosting Spring Boot apps as handy Unix services for years. I also wanted to save some pennies for my sports club, so I didn’t want to go for a full-blown Kubernetes-based cloud deployment either. But I still wanted a decent UX for end users and the freedom to deploy updates anytime I want.

Here’s how I did it. Feel free to reach out to me on our Discord if you have any questions.

Application architecture and requirements

My Spring Boot application essentially provides a CMS for our sports club’s website, as well as accommodating events and blog posts in various categories. I modernised a Perl app that was more than two decades old and used flat files for everything, turning it into a “slightly” more modern system based on Spring Boot.

The data is now stored in an SQL database using Spring Data JPA. Dynamic website fragments are generated using traditional Spring MVC and Thymeleaf templates. And our active club members are using a Vaadin-based administration UI to edit the content, without the need for a separate webmaster.


Writing the first version of the application was mostly a nice experience. I will admit I lost my nerve a couple of times while fiddling with the Thymeleaf templates. For a while, I considered also rendering those parts as Vaadin components, but this approach is a bit safer for web crawler indexing. Otherwise, I have to admit that a web developer’s life is much better today than it was in the last millennium.

Deploying the first iteration was almost trivial. I installed the Spring Boot application as a systemd service on the existing virtual server and modified the nginx configuration to proxy the dynamic parts to it. Fully static content and some old PHP scripts work as before, and users never notice that a portion of the site is now powered by Spring Boot. The beauty of this approach is that I don’t need to change everything at once while gradually modernizing the website using Spring Boot and Vaadin.

Update challenges without zero-downtime updates

The problems began after the initial launch. Of course, there were a bunch of bugs and missing features. Every time I deployed an update, the front page of our website showed a “503 Service Unavailable” for a while, as the Spring Boot server was restarting and Hibernate was doing its magic. In the worst-case scenario, somebody doing updates with the admin UI could lose their work while I was fixing a bug in a Thymeleaf template.

My initial thought was to beta test our upcoming Clustering tooling. This new tooling will help Vaadin users implement professional Kubernetes-based hosting that supports zero-downtime updates and fluent session migrations for end users. But that would have required full renewal of the hosting setup, and I decided to try something more lightweight for this use case.

My absolute minimum requirements:

Absolutely no interruptions for website visitors during an update
Admin users must be able to save their work before the system rolls over into the new version
No Kubernetes setup or cloud hosting, to keep the system simple
Keep the resource usage down (a cheap virtual server should be able to handle it all)
Ability to roll back in case I accidentally ship something catastrophic (we’ve all been there).

The recipe for trivial zero-downtime deployment

After considering a handful of possibilities, I came up with the following modified blue-green deployment setup:

I have two systemd services configured for the application, listening on different ports. Normally I have just one running, but during the updates, I have both the old and new application running for a while. I call them blue (port 8080) and green (port 8081), even though this is not academically a blue-green deployment.
Once the new version has successfully started, I notify the admin UI users about the upcoming restart of the application and give them a bit of time to save their work.
Change nginx (working as a front proxy) to point to the new version.
Notify the admin users that they are now on the new version and can continue their work.
Shut down the old version to free resources and port for the next (++) version.

Prepare the systemd services

This part is all done by the book. My server is running Ubuntu, so I used systemd unit files. The only special thing is that I have two service declarations, and the second one overrides the port to be 8081 instead of the Spring Boot default, 8080. Here is my /etc/systemd/system/pr-web-green.service file with the custom port configuration:

[Unit]
Description=PR website helpers (green)
After=syslog.target

[Service]
User=paimionrasti
ExecStart=/home/paimionrasti/pr-web/pr-web.jar --server.port=8081 
SuccessExitStatus=143 

[Install] 
WantedBy=multi-user.target

The other one is similar, but without the port declaration.

Make sure you also configure your Spring Boot build to generate “executable jar files”. For my Maven build, the relevant configuration is here:


<plugin>
   <groupId>org.springframework.boot</groupId>
   <artifactId>spring-boot-maven-plugin</artifactId>
   <configuration>
       <executable>true</executable>
   </configuration>
</plugin>

Prepare configuration for nginx

In my nginx site configuration (/etc/nginx/sites-available/paimionrasti.fi.conf), I already had a configuration like this:

location /dyn/ {
  proxy_pass http://localhost:8080/dyn/;
}

This is fine for the blue service, but we need another for green. To switch between the current and the next version, I duplicated the site configuration into /etc/nginx/sites-available/paimionrasti.fi-green.conf and changed the port in the proxy_pass configuration to 8081.

Now all I need to do to switch between the deployments is to relink the active one from the sites-available directory to sites-enabled and tell nginx to reload the configuration.
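As a rough sketch of that relinking step (not the exact commands used on the server), the symlink swap could be wrapped in a small shell function. The function name switch_site and its two arguments are hypothetical; the paths in the usage comment are the ones mentioned above.

```shell
#!/bin/bash
# Hypothetical helper: point the enabled-site symlink at the given config.
# $1 = config to activate (in sites-available), $2 = symlink in sites-enabled.
switch_site() {
  local source=$1 target=$2
  # Refuse to link to a config that does not exist.
  [[ -f "$source" ]] || { echo "missing config: $source" >&2; return 1; }
  # -s symbolic link, -f replace an existing link, -n do not follow the old link
  ln -sfn "$source" "$target"
}

# Usage on the server (needs root), kept as a comment so the sketch has no side effects:
#   switch_site /etc/nginx/sites-available/paimionrasti.fi-green.conf \
#               /etc/nginx/sites-enabled/paimionrasti.fi.conf \
#     && nginx -t && systemctl reload nginx
```

Running nginx -t before the reload is an optional extra: it validates the configuration, so a typo in the copied file does not reach the live proxy.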

Notifications for active users

As I’m using Vaadin for the admin part, building a notification system for active users is pretty much as simple as it can get. As with all my Vaadin apps, I use a WebSocket-based communication channel, which enables me to push UI changes to all users in real time. To enable that, add an @Push annotation to your AppShellConfigurator class, typically Application in Spring Boot apps.

Next we need to collect the active UIs. I’m doing that directly in my UserService class by making it implement UIInitListener. Vaadin’s Spring Boot integration then automatically notifies that class about new UIs and I can keep track of them (and clean up when people leave). Here is the relevant code snippet:

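// Used to detect UIs that are opened right after a restart
Instant startupTime = Instant.now();

// Active UIs and the users behind them
private ConcurrentMap<UI, User> activeUIs = new ConcurrentHashMap<>();

@Override
public void uiInit(UIInitEvent event) {
   authenticatedUser.get().ifPresent(user -> {
       final UI ui = event.getUI();
       activeUIs.put(ui, user);
       // UIs created within 70 seconds of startup most likely belong to users who waited for the upgrade
       if (Instant.now().toEpochMilli() - startupTime.toEpochMilli() < 70000) {
           ui.access(() -> {
               Notification.show("Now running on a new version!");
           });
       }
       // Stop tracking the UI when it is closed
       ui.addDetachListener(e -> {
           activeUIs.remove(ui);
       });
   });
}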

I’m only interested in logged-in users, and I use the authentication information for other things as well, so the registration code could be much simpler in your case, but you should get the point. You could use a plain set for the active UIs and skip the authentication check. In the code snippet, I also notify UIs that are created just after the server starts, as they most likely belong to active users waiting for the new version to come up.

In the same service class, I also provide a method I can use from my webhook to notify all users about the upcoming restart and to report the number of active users (I don’t want the grace period if there happen to be no active users).

public String notifyActiveUsersAboutDowntime() {
  tellAllActiveUIs("System is going down for maintenance in about 1 minute! Save your work and wait for the application to reload");
  return "" + activeUIs.size();
}

public void tellAllActiveUIs(String s) {
  activeUIs.forEach((ui, user) -> {
    // Build a notification that stays open for two minutes or until closed
    Notification notification = new Notification();
    notification.setPosition(Notification.Position.MIDDLE);
    notification.setDuration(2 * 60000);
    notification.add(
      new Paragraph(s),
      new Button("Close", e -> {
        notification.close();
      }));

    // Open it in the context of the target UI so it gets pushed to that user
    ui.access(() -> {
      notification.open();
    });
  });
}

The final missing piece of the notification system is a webhook I can call from my script that orchestrates the version upgrade. This is easily implemented using a Spring REST controller:


@RestController
public class AdminController {

   @Autowired
   private UserService userService;

   // Notifies active admin users and returns the number of active UIs
   @GetMapping("/sheduledowntime")
   @PreAuthorize("hasIpAddress('127.0.0.1')")
   String sheduledowntime() {
       return userService.notifyActiveUsersAboutDowntime();
   }
}

I only need to call that API via a local script, so I close the access to all but localhost-originated requests using the @PreAuthorize rule (Spring Security is already in use for authentication and authorization for the Vaadin UI part).

Script to orchestrate the version upgrade

All the preparations are now done. I could manually perform all the small steps required to roll out a new version to production, but I really want to automate it. Below, you can see the script I execute on the server once the new JAR has been uploaded. To automate the upload and the execution of the update script, I have another trivial shell script on my local machine, where I cut new releases: it uploads the JAR file using scp and executes the update script using ssh. If I ever get another volunteer to maintain our club’s website, I’ll set up a CI system that does the same.

blue-green-deploy.sh

#!/bin/bash

blue=pr-web.service
green=pr-web-green.service
nginxtarget=/etc/nginx/sites-enabled/paimionrasti.fi.conf

# if 8080 (blue) doesn't respond, we'll deploy there, otherwise we expect green (8081) next
if ! /usr/bin/nc -z localhost 8080
then
  echo "Starting blue (8080) stopping green (8081)"
  current=$green
  next=$blue
  nginxsource=/etc/nginx/sites-available/paimionrasti.fi.conf
  port=8080
  prevport=8081
else
  echo "Starting green (8081) stopping blue (8080)"
  current=$blue
  next=$green
  nginxsource=/etc/nginx/sites-available/paimionrasti.fi-green.conf
  port=8081
  prevport=8080
fi

#start the next service
systemctl start $next

# wait until Spring Boot (eh, Hibernate) starts; give up after 60 seconds
echo "Waiting for new SB process to start"
i=0
while [[ $i -lt 62 ]]; do
  echo -en "\rWaiting for new SB process to start $i "
  ((i++))
  if [[ $i -gt 60 ]]; then
    echo "The new process did not start properly"
    exit 1
  fi
  # the port opens only once the application has finished starting
  if /usr/bin/nc -z localhost $port; then
    echo "ready"
    break
  fi
  sleep 1
done

# Notify current users of upcoming maintenance
openwindows=$(curl -s "http://localhost:$prevport/dyn/sheduledowntime")

echo "Active users: $openwindows"

# Give active users a bit of time to save their work; skip the wait if there are none
if [[ $openwindows -ne "0" ]]; then
  echo "Giving 60 secs for active users to prepare for downtime..."
  sleep 60
fi

# switch the front proxy (nginx here) to use the next service
ln -fs $nginxsource $nginxtarget

systemctl reload nginx

#shut down the obsoleted spring boot process
systemctl stop $current

# set defaults for reboot
systemctl disable $current
systemctl enable $next

echo "Version upgrade complete!"
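The local release script mentioned before the listing is not shown in this article. As a rough sketch of what such a helper could look like, the function below uploads the JAR with scp and then runs the update script over ssh. Everything named here is an assumption for illustration: the release function, the DEPLOY_HOST, REMOTE_JAR and REMOTE_SCRIPT variables with their default values, and the RUN hook used for dry runs. The remote user is also assumed to have the rights needed for the systemctl calls.

```shell
#!/bin/bash
# Hypothetical local release helper: upload the JAR, then run the update script remotely.
# All defaults below are placeholders, not the real server details.
DEPLOY_HOST=${DEPLOY_HOST:-deploy@example.org}
REMOTE_JAR=${REMOTE_JAR:-pr-web/pr-web.jar}
REMOTE_SCRIPT=${REMOTE_SCRIPT:-./blue-green-deploy.sh}
RUN=${RUN:-}   # set RUN=echo to print the commands instead of executing them

release() {
  local jar=$1
  # Fail early if the build output is missing.
  [[ -f "$jar" ]] || { echo "no such JAR: $jar" >&2; return 1; }
  # Copy the new build to the server; only trigger the upgrade if the upload succeeded.
  $RUN scp "$jar" "$DEPLOY_HOST:$REMOTE_JAR" &&
  $RUN ssh "$DEPLOY_HOST" "$REMOTE_SCRIPT"
}

# Example (dry run): RUN=echo; release target/pr-web.jar
```

Chaining the two commands with && means a failed upload never triggers an upgrade with a stale or partial JAR.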

Lessons learned and improvement tips for larger-scale deployments

For this simple use case, I’m pretty happy with how the system is functioning in its current state. Going forward, I will need to carefully plan database changes. Apart from that, I can deploy new versions pretty much whenever I want. Ideally, the engineer in me would like to improve this setup further. There are numerous things one could do to improve this approach, each with different pros and cons:

Provide users with the ability to notify when they are ready for an upgrade. This would allow the system to update only when all users are ready, but to do it right away when they are.
Make the script more interactive. For example, allow smoke testing on the new production version before rolling it out to users.
Run the application on multiple servers. Instead of using a simple proxy_pass configuration in nginx, one could move over to upstream configuration and make nginx work as an actual load balancer. Or switch over to HAProxy, which has more flexibility in the OSS version. Set up sticky sessions and guide new sessions to new versions (otherwise we’d need a perfect session replication and/or REST API versioning, which would be another pain in the butt).
Move the whole deployment setup to be container based and orchestrated by a Kubernetes cluster, and maybe move to a cloud provider.
Build an admin UI to monitor the number of users on different versions and guide the preferred amount of traffic to different versions.
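The smoke-testing idea from the list above could start out as small as the following sketch, called after the new service has started and before nginx is switched. The smoke_test function is hypothetical, and it assumes that a plain GET to the /dyn/ path on the new instance answers with HTTP 200 when the application is healthy; pick whatever URL is meaningful for your application.

```shell
#!/bin/bash
# Hypothetical smoke test: succeed only if the new instance answers with HTTP 200.
# $1 = port of the freshly started service, $2 = path to probe (assumed default: /dyn/).
smoke_test() {
  local port=$1 path=${2:-/dyn/} status
  # -s silent, -o discard the body, -w print only the status code, --max-time bound the wait
  status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "http://localhost:$port$path")
  [[ "$status" == "200" ]]
}

# Possible use in the upgrade script, before relinking the nginx configuration:
#   smoke_test "$port" || { echo "Smoke test failed, keeping the old version"; exit 1; }
```

Because the check runs against the new port directly, visitors keep hitting the old version through nginx until the test passes.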

I hope my recipe helps those of you who are stuck on a legacy hosting setup. And it would be great to hear how you have solved any similar problems! Please reach out to me on our Discord or on Twitter.