
Leveraging AI to Accelerate Software Engineering


Over the past few months, my team and I have been making extensive use of Artificial Intelligence (AI) – specifically Cursor IDE – in our day-to-day work. It started as a small experiment, but it quickly evolved into a deeper exploration of how Large Language Models (LLMs) influence our software development process in an enterprise environment.

We are a development team that is strongly committed to software quality, since it is what enables us to deliver business value at a sustainable pace. Extreme Programming (XP) practices, such as Test-Driven Development (TDD) and pair programming, help us keep quality high and also counteract the natural tendency of software to degrade over time.

When we started using AI – with no prior hands-on experience – we were naturally worried about how it would impact the quality of the software that we delivered. AI might help us deliver value more quickly, but that speed should never come at the expense of software quality, as this would compromise sustainability.

Despite our concerns, we did not conduct empirical assessments to measure code quality. Instead, we relied on regular communication with stakeholders to learn about their perception of how the team’s delivery pace was evolving. We also paid close attention to the code, carefully monitoring whether it was becoming more manageable or more challenging to work with.

I will discuss how to keep – even increase – quality in the age of AI in an upcoming post. In this post, I will relax the keep-quality-high restriction. I will describe four situations where quality is less critical, or where you can expect the quality of the generated code to match the quality of your current code. In these four specific cases, LLMs can offer a significant increase in productivity with minimal concern.

Before we dive in, let’s look at a running example that will help me illustrate the key insights in each section of this post.

A running example

Let’s say that your team is building a WhatsApp chatbot for an insurance company. Whenever a user of the insurance company’s website provides their contact details, they are classified as a potential customer and the bot initiates a conversation. It typically begins with a series of profiling questions that are designed to learn more about the user’s needs and preferences.

Some of the profiling questions that the bot may ask are:

  • What type of insurance policy are you looking for?

  • Are you currently insured with any company?

  • What type of coverage do you have?

Other questions may be more specific, tailored to a particular type of insurance policy:

  • Do you own a home?

  • How old is your home?

  • What type of property do you live in?

The information gathered through these questions helps the bot suggest the most relevant insurance options to users, sometimes adapting the recommendations to better suit individual needs.
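
To make the example a bit more concrete, the profiling flow could be pictured as a mapping from policy types to their follow-up questions. This is only an illustrative sketch – none of these names come from the actual system:

```python
# Hypothetical sketch of the profiling flow; names are illustrative.
GENERAL_QUESTIONS = [
    "What type of insurance policy are you looking for?",
    "Are you currently insured with any company?",
    "What type of coverage do you have?",
]

POLICY_SPECIFIC_QUESTIONS = {
    "home": [
        "Do you own a home?",
        "How old is your home?",
        "What type of property do you live in?",
    ],
}

def profiling_questions(policy_type: str) -> list[str]:
    """Return the general questions followed by any policy-specific ones."""
    return GENERAL_QUESTIONS + POLICY_SPECIFIC_QUESTIONS.get(policy_type, [])
```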

1. Throwaway code

One of the services of the WhatsApp chatbot calculates the policy price given a specific user profile. This service is deployed as an AWS Lambda and its logs are available through AWS CloudWatch.

Suppose that you need to download the CloudWatch logs of the past two months and prepare a statistical report. This is a one-off task that has been assigned to you to meet the needs of a particular stakeholder. While you could perform the task manually, the volume of data makes it impractical. Automating the process via a script is a far more efficient solution.

The key observation here is that the script will have no long-term use; therefore, code quality is not a primary concern. This is the perfect opportunity to leverage LLMs for code generation. If you describe the task in natural language and in sufficient detail, you will have the script in a matter of seconds.

Keep in mind that the script may not fully meet your needs on the first try, but it will provide a solid starting point. You can then tweak the script manually, or you can also refine your prompt by adding more specific details to guide the LLM in the right direction.
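
As an illustration, the core of the kind of throwaway script an LLM might produce could look like the sketch below. It assumes the log events have already been exported as one JSON object per line, and that each relevant event carries a `duration_ms` field – both assumptions are mine, not details of the real system:

```python
import json
import statistics

def summarize_latencies(log_lines):
    """Build a quick statistical report from exported log events.

    Assumes one JSON object per line; events carrying a
    (hypothetical) `duration_ms` field contribute to the stats.
    """
    durations = []
    for line in log_lines:
        event = json.loads(line)
        if "duration_ms" in event:
            durations.append(event["duration_ms"])
    return {
        "count": len(durations),
        "mean_ms": statistics.mean(durations),
        "p95_ms": statistics.quantiles(durations, n=20)[18],
    }
```

Because the script is disposable, there is no need for error handling, tests or configuration – exactly the kind of corners you would not cut in long-lived code.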

2. Repetitive tasks

Suppose that a user sends a video or a text message in a WhatsApp conversation. To respond to actions like these, the WhatsApp chatbot is implemented as an event-driven system. Whenever a user action occurs, the system is notified by Meta (or any other intermediary) through a WhatsApp Message Received event. In this event-driven context, most new features – at least, those triggered by incoming WhatsApp messages – are added to the system through a new event handler that invokes the appropriate use-case class.

For example, let’s say that you need to add the following feature: if the user types “stop”, you must stop sending marketing messages about new insurance offers. To implement this feature, you will create a new event handler that invokes a stop-marketing-messages use case whenever a “stop” message is received.

This new event handler may be the tenth, eleventh, or n-th one that you implement. Since they are all similar, if you ask an LLM to do it, the LLM will have plenty of code to reference and base its decisions on, making its output highly likely to meet your needs, both in terms of value and code quality. You don’t have to build the event handler (and its tests) from scratch – just ask the LLM to do it and the LLM will generate the code almost instantly.
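
A minimal sketch of what such a handler might look like follows. The class and method names (`WhatsAppMessageReceived`, `StopMarketingMessages`, `handle`) are hypothetical, chosen only to illustrate the event-handler-plus-use-case shape described above:

```python
from dataclasses import dataclass

@dataclass
class WhatsAppMessageReceived:
    """Event delivered by Meta (or an intermediary) on each incoming message."""
    user_id: str
    text: str

class StopMarketingMessages:
    """Use case: opt the user out of marketing messages."""
    def __init__(self, preferences: dict):
        self.preferences = preferences

    def execute(self, user_id: str) -> None:
        self.preferences[user_id] = {"marketing_opt_in": False}

class StopMessageHandler:
    """Event handler: routes 'stop' messages to the use case."""
    def __init__(self, use_case: StopMarketingMessages):
        self.use_case = use_case

    def handle(self, event: WhatsAppMessageReceived) -> None:
        if event.text.strip().lower() == "stop":
            self.use_case.execute(event.user_id)
```

Since the codebase already contains many handlers with this shape, the LLM has strong local conventions to imitate – which is precisely why its output tends to match the quality of the surrounding code.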

3. Experimental context

Let’s say that the insurance company needs to learn about insurance claims or accidents that the user may have had in the last three years. Your team may obtain this information via a new profiling question in the WhatsApp chatbot. However, this approach raises a few concerns.

Many users may be unwilling to answer and could abandon the conversation, resulting in the loss of potential customers. Furthermore, whether users answer or not may be affected by the timing of the question. For example, a user may ignore the question if it appears at the start of the conversation, but they may respond if the question is asked later. In any case, it is essential to understand users’ behaviour to offer the best possible experience and reap maximum benefit.

Your team decides to run an A/B test. In one variant, the new question is placed at the beginning of the conversation, while in the other, the question is introduced later in the series. The goal is to determine whether the order of the questions affects user responses.

In this scenario, you know that the code you add will be removed – or, at least, modified to retain only one variant – once the test concludes.

An LLM can save you a significant amount of time, both when building the experiment and when cleaning it up. If the prompts that you used to implement the A/B test are available – for example, because you stored them as a plan – then you can simply ask the LLM to remove one variant, or the experiment altogether.
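
One common way to implement the variant split – not necessarily what the team in the example would do – is a deterministic hash-based assignment, so that the same user always sees the same variant:

```python
import hashlib

def assign_variant(user_id: str) -> str:
    """Deterministically assign a user to variant 'A' or 'B'."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return "A" if digest[0] % 2 == 0 else "B"

def conversation_questions(user_id: str, base_questions: list[str]) -> list[str]:
    """Variant A asks the new claims question first; variant B asks it last."""
    claims_question = (
        "Have you had any insurance claims or accidents in the last three years?"
    )
    if assign_variant(user_id) == "A":
        return [claims_question] + base_questions
    return base_questions + [claims_question]
```

Keeping the experiment confined to a couple of small functions like these also makes the eventual cleanup – whether done by hand or by the LLM – much easier.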

4. Navigating legacy code

Imagine that you are a new member of the team and discover that the WhatsApp chatbot does not initiate conversations with all users who submit their contact information on the company’s website. In order to understand the conditions that trigger new conversations, you examine the codebase. However, you soon realize that the task will be challenging: the logic is scattered across multiple files and the code is difficult to follow.

You can ask an LLM to interpret the code for you and explain the conditions in plain English. Understanding natural language is easier than understanding code, especially if the code was not developed with readability in mind.

Note that the questions you pose to the LLM may be difficult to answer – they may require extensive reasoning and analysis of large fragments of code – but simpler questions can be just as valuable. For example, asking the LLM to explain an individual function, which typically yields a response within seconds, can be extremely helpful when navigating legacy code.
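
To make the idea concrete, here is a contrived example of the kind of condition you might ask the LLM to explain. The comment is the sort of plain-English summary you would hope to get back:

```python
def should_start_conversation(user: dict) -> bool:
    # Legacy-style logic, hard to follow at a glance. In plain English:
    # start a conversation only for users who left a phone number,
    # have not opted out, and were not contacted in the last 30 days.
    if not user.get("phone"):
        return False
    if user.get("opted_out", False):
        return False
    days = user.get("days_since_last_contact")
    return days is None or days > 30
```

In a real legacy codebase, logic like this would be scattered across several files rather than gathered in one function, which is exactly why having the LLM collect and summarize it is so valuable.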

Conclusions

This post outlines four scenarios that give software engineers unique opportunities to leverage LLMs and accelerate their development process. In these scenarios, the usual concern about LLMs – the quality of AI-generated code – is far less critical, which enables substantial productivity gains.

In a subsequent post, I will step away from this “happy-path for AI” and explore a more realistic and challenging case: introducing new long-lived functionality into an existing codebase. I will discuss the key challenges involved and strategies that we can use to address them.