We’re continuously investing in our capabilities to find new solutions to social problems through AI. Laterite Labs is a series of blogs in which we share tips and tricks from our experiences working with Large Language Models (LLMs) in development research. We’ve previously written about some of the ways in which LLMs could impact the future of development research. Here we present a simple example of how to use large language models to categorize open answers.

An example of a simple research task that an LLM can do

In one of our recent education projects we were asked to find out how many teachers are available to teach STEM subjects. The task was straightforward, and all the information we needed was in a national database of 100,000 teachers, including the subjects they specialize in. However, the ‘subjects’ variable was not stored as a categorical variable with a discrete number of options, but rather as open text. Instead of having one unique value for everyone teaching “Maths”, we had that value alongside “Mathematics”, “Math”, “Mahts”, “Numeracy”, and so on. Classifying answers as STEM or non-STEM subjects is not difficult, but when you’re working with over 100,000 responses and more than 10,000 unique values, the task becomes time-consuming and error-prone.
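The first step in a task like this is reducing the 100,000 responses to the unique values that actually need classifying. As a minimal sketch (the data and field names here are illustrative, not from the actual project), this can be done with a few lines of Python:

```python
def unique_subjects(raw_answers):
    """Normalize free-text subject answers and return the unique values."""
    # Trim whitespace and lowercase so "Maths " and "maths" collapse together;
    # skip empty entries. Sorting keeps the output stable for review.
    normalized = (a.strip().lower() for a in raw_answers if a and a.strip())
    return sorted(set(normalized))

# Illustrative answers, including a duplicate and a typo
answers = ["Maths", "Mathematics ", "math", "Mahts", "Numeracy", "History", "maths"]
print(unique_subjects(answers))
# ['history', 'mahts', 'math', 'mathematics', 'maths', 'numeracy']
```

Even simple normalization like this shrinks the list the LLM has to classify; misspellings such as “Mahts” still survive, which is exactly the part the LLM helps with.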

Simple but time-consuming tasks like this are great opportunities to increase efficiency with the use of LLMs.

Using LLMs for classifying answers: a process of prompting and reviewing

First, we created an Excel file with the unique answers on subjects in one column, making sure not to include any personal identifiers. When working with LLM chatbots, it’s important to safeguard data privacy by avoiding sharing sensitive information. Then, we uploaded the Excel sheet into ChatGPT (running GPT-4, the LLM we used for this task) and prompted it to flag answers related to STEM subjects. Afterwards, we downloaded the output file it produced.

The first output was not great. When we reviewed the generated answer, we noticed that ChatGPT mistakenly flagged “Social Science” and “Humanitarian Science” as STEM (possibly because they included the word “Science”).

In the next prompt, we specified that subjects related to social sciences shouldn’t be marked as STEM. We received a new answer and reviewed it. The first error was fixed, but now ICT was missing from the STEM category. We wrote a new prompt clarifying that our definition of STEM should also include ICT-related topics. We reviewed again and again, until, through this iterative process of reviewing and prompting, we reached a data set with no evident mistakes.
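Part of this review can itself be automated. Once you have caught an error by hand, you can encode it as a rule and run it against every subsequent output. The sketch below is illustrative, not the project’s actual code: the rule lists and data are hypothetical, built from the two errors described above.

```python
# Rules derived from errors caught during manual review (illustrative):
NOT_STEM = {"social science", "humanitarian science"}  # contain "Science" but aren't STEM
MUST_BE_STEM = {"ict", "computer science"}             # our STEM definition includes ICT

def suspicious_rows(classified):
    """Return (subject, reason) pairs that contradict the rules above.

    `classified` is a list of (subject, is_stem) pairs, e.g. read from
    the output file the LLM produced.
    """
    issues = []
    for subject, is_stem in classified:
        s = subject.strip().lower()
        if is_stem and s in NOT_STEM:
            issues.append((subject, "flagged as STEM but is a social science"))
        if not is_stem and s in MUST_BE_STEM:
            issues.append((subject, "missing from STEM but should be included"))
    return issues

# Illustrative LLM output containing both kinds of mistake
output = [("Mathematics", True), ("Social Science", True), ("ICT", False)]
for subject, reason in suspicious_rows(output):
    print(f"{subject}: {reason}")
```

A check like this does not replace human review, but it ensures that an error fixed in one iteration does not silently reappear in the next.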

ChatGPT allowed us to complete, in under two hours, a task that could have taken an entire day of work, saving us a considerable amount of time. However, this does not mean that LLMs provide immediate solutions. We still had to invest time throughout the process, but our role became that of a reviewer. Reviewing is essential because, as this example shows, ChatGPT can make mistakes.

How to use LLMs efficiently

Before using ChatGPT as an assistant, you should have a good understanding of how to do the task yourself. In this case, we knew how we would categorize all unique answers into STEM and non-STEM. Understanding the task allows you to give precise prompts and to evaluate the quality of the answer.

You also need sufficient information to be able to review the output. In this case, we knew the definition of STEM and which subjects should fall under that category. With this knowledge we were able to find mistakes and correct them.

The key takeaway from this example is that LLMs are a great time-saving tool when dealing with simple but time-consuming tasks, but they don’t replace the researcher. What they do is allow the researcher to move into the role of a reviewer and steer the iterative process through which an AI-generated output is refined. The time saved in this process allows us to spend more time on complex tasks that involve a higher level of critical thinking.

This blog post was written by Ilse Peeters Salazar, Research Analyst based in our Amsterdam office.