
Calling ChatGPT and OpenAI APIs in Spring Boot with Java

By Marcus Hellberg · On Jul 28, 2023 5:09:28 PM

This article will guide you through integrating OpenAI APIs, such as ChatGPT, into your Spring Boot application. We will cover moderation, embedding, and chat completion requests. It is part of our Building an AI chatbot in Java series.

Prerequisites

This tutorial assumes you already have a Spring Boot application. The example project we'll work on is a Spring Boot application with a React front-end created with Hilla. If you need to set up a new app, follow the instructions provided in the Hilla documentation.

To handle streaming responses, add the following dependency to the pom.xml file:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-webflux</artifactId>
</dependency>

Obtaining an OpenAI API Key

To use OpenAI APIs, you need an API key. If you do not have one, you can create one on the OpenAI platform page. Save the key as an environment variable, OPENAI_API_KEY. Depending on your needs, you can do this in your IDE or system-wide.
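On macOS or Linux, for example, you could export the variable in your shell before launching the app (the key value below is a placeholder, not a real key):

```shell
# Placeholder value; substitute the key you created on the OpenAI platform page
export OPENAI_API_KEY="sk-your-key-here"

# Confirm the variable is set without printing the key itself
echo "key length: ${#OPENAI_API_KEY}"
```

On Windows, set the variable through the system environment settings or your IDE's run configuration instead.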

Building a Service Class for OpenAI Calls

All the code to interact with OpenAI will be contained in a Spring service class that we can use across our application. Create a new class, OpenAIService.java:

@Service
public class OpenAIService {

    private static final String OPENAI_API_URL = "https://api.openai.com";

    @Value("${openai.api.key}")
    private String OPENAI_API_KEY;

}

We inject the API key into our service class with the @Value annotation. Spring resolves the ${openai.api.key} placeholder against system properties and environment variables, so the OPENAI_API_KEY variable we set earlier is picked up automatically.
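If you prefer an explicit mapping, you could also bind the property yourself in application.properties (this is optional; the snippet assumes the OPENAI_API_KEY environment variable is set):

```properties
# Optional: explicitly bind the property to the environment variable
openai.api.key=${OPENAI_API_KEY}
```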

Setting up Spring WebClient for OpenAI API Calls

Next, we will utilize Spring WebClient to make REST service calls to the OpenAI API:

private WebClient webClient;

@PostConstruct
void init() {
    var client = HttpClient.create().responseTimeout(Duration.ofSeconds(45));
    this.webClient = WebClient.builder()
            .clientConnector(new ReactorClientHttpConnector(client))
            .baseUrl(OPENAI_API_URL)
            .defaultHeader("Content-Type", MediaType.APPLICATION_JSON_VALUE)
            .defaultHeader("Authorization", "Bearer " + OPENAI_API_KEY)
            .build();
}

Moderation with ChatGPT

The first type of request we will handle is moderation requests, which are used to screen messages for content that violates OpenAI's usage policies:

public Mono<Boolean> moderate(List<ChatCompletionMessage> messages) {
    return Flux.fromIterable(messages)
            .flatMap(this::sendModerationRequest)
            .collectList()
            .map(moderationResponses -> {
                boolean hasFlaggedContent = moderationResponses.stream()
                        .anyMatch(response -> response.getResults().get(0).isFlagged());
                return !hasFlaggedContent;
            });
}

@RegisterReflectionForBinding({ModerationRequest.class, ModerationResponse.class})
private Mono<ModerationResponse> sendModerationRequest(ChatCompletionMessage message) {
    return webClient.post()
            .uri("/v1/moderations")
            .bodyValue(new ModerationRequest(message.getContent()))
            .retrieve()
            .bodyToMono(ModerationResponse.class);
}

These methods moderate the entire chat history by sending each ChatCompletionMessage to the moderation API. If any message is flagged, moderate() returns false. You can find all the Java classes used in these requests and responses in the project's GitHub repository.
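To make the shape of those classes concrete, here is a rough, illustrative sketch of the moderation DTOs. The field names follow the moderation API's JSON (`input`, `results`, `flagged`), but these are simplified stand-ins, not the repository's actual classes:

```java
import java.util.List;

// Illustrative sketches of the moderation request/response DTOs.
public class ModerationDtos {

    // Serializes to the request body: {"input": "..."}
    public static class ModerationRequest {
        private final String input;

        public ModerationRequest(String input) { this.input = input; }

        public String getInput() { return input; }
    }

    // One entry in the "results" array; category details omitted here
    public static class ModerationResult {
        private boolean flagged;

        public boolean isFlagged() { return flagged; }

        public void setFlagged(boolean flagged) { this.flagged = flagged; }
    }

    // Deserialized from the response body: {"results": [...]}
    public static class ModerationResponse {
        private List<ModerationResult> results;

        public List<ModerationResult> getResults() { return results; }

        public void setResults(List<ModerationResult> results) { this.results = results; }
    }
}
```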

Generating Embedding Vectors

The second type of request we'll handle is to retrieve an embedding vector for a given text. This embedding vector is used for similarity searches. Here is how to implement it:

public Mono<List<Double>> createEmbedding(String text) {
    Map<String, Object> body = Map.of(
            "model", "text-embedding-ada-002",
            "input", text
    );

    return webClient.post()
            .uri("/v1/embeddings")
            .bodyValue(body)
            .retrieve()
            .bodyToMono(EmbeddingResponse.class)
            .map(EmbeddingResponse::getEmbedding);
}

This method calls the embeddings API and returns a vector of double values that we can use to perform a similarity search in a vector database.
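As a quick illustration of how such a vector might be compared, here is a minimal cosine-similarity helper. This is our own sketch for demonstration purposes, not part of the article's service class; a vector database would perform this kind of comparison for you at scale:

```java
import java.util.List;

public class VectorSimilarity {

    // Cosine similarity of two equal-length vectors: dot(a, b) / (|a| * |b|).
    // Ranges from -1 to 1; values near 1 mean the texts are semantically similar.
    public static double cosineSimilarity(List<Double> a, List<Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.size(); i++) {
            dot += a.get(i) * b.get(i);
            normA += a.get(i) * a.get(i);
            normB += b.get(i) * b.get(i);
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```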

Streaming ChatGPT Responses

The final request type is to call the chat completion API, aka ChatGPT. To handle potentially long response times, we will stream the API response, displaying the answer as it is generated:

public Flux<String> generateCompletionStream(List<ChatCompletionMessage> messages) {
    return webClient
            .post()
            .uri("/v1/chat/completions")
            .bodyValue(Map.of(
                    "model", "gpt-3.5-turbo",
                    "messages", messages,
                    "stream", true
            ))
            .retrieve()
            .bodyToFlux(ChatCompletionChunkResponse.class)
            .onErrorResume(error -> {
                // The stream terminates with a `[DONE]` message, which causes a
                // serialization error. Ignore it and return an empty stream instead.
                if (error.getMessage().contains("JsonToken.START_ARRAY")) {
                    return Flux.empty();
                }
                // If the error is not caused by the `[DONE]` message, propagate it
                return Flux.error(error);
            })
            .filter(response -> {
                var content = response.getChoices().get(0).getDelta().getContent();
                return content != null && !content.equals("\n\n");
            })
            .map(response -> response.getChoices().get(0).getDelta().getContent());
}

The source code for the completed application is available on GitHub.

What's Next?

In the next part of our Building an AI chatbot in Java series, we'll discuss Using a Pinecone vector database with Spring Boot.

Marcus Hellberg
Marcus is the VP of Developer Relations at Vaadin. His daily work includes everything from writing blogs and tech demos to attending events and giving presentations on all things Vaadin and web-related. You can reach out to him on Twitter @marcushellberg.