USING LARGE LANGUAGE MODELS FOR SHORT TEXT TOPIC MODELING: MODEL CHOICE AND SAMPLE SIZE

Lille My
Nov 3, 2024

Imagine you're trying to organize thousands of comments about a product into main themes or topics. Traditionally, researchers had to either read through everything manually (time-consuming and expensive) or use older topic-modeling programs that often missed important context. A new preprint posted on OSF reports that large language models such as GPT-4, Claude, and Gemini can do this job effectively, especially with short pieces of text.

The researchers conducted two studies:

Study 1: Chatbot Perception Study

They collected responses from 199 people about what makes chatbots seem human-like
They compared three different ways of analyzing the responses:
Human analysis (a research assistant reading everything)
Traditional computer analysis (using a method called LDA)
Modern AI analysis (using GPT-4 and Claude)

Result: The AI systems matched human analysis 90% of the time, while the traditional computer method only achieved 60% accuracy.

Study 2: Vaccine Hesitancy Study

They analyzed 10,000 tweets about COVID-19 vaccine concerns
They tested if AI could identify the main topics using different sample sizes
They compared three different AI systems

Result: AI performed just as well with only 5% of the data as it did with 100%, achieving 90% accuracy.
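To make the sample-size idea concrete, here is a minimal sketch of drawing reproducible random subsamples of different sizes from a corpus. The function name `draw_sample`, the placeholder corpus, the seed, and the fractions other than 5% and 100% are assumptions for illustration and are not taken from the preprint.

```python
import random


def draw_sample(texts, fraction, seed=42):
    """Return a simple random sample containing `fraction` of `texts`."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, round(len(texts) * fraction))
    rng = random.Random(seed)  # fixed seed so the same sample can be redrawn
    return rng.sample(texts, k)


# Placeholder corpus standing in for 10,000 tweets (not real data)
tweets = [f"tweet {i}" for i in range(10_000)]

# 5% and 100% come from the post; 25% and 50% are hypothetical extra steps
samples = {frac: draw_sample(tweets, frac) for frac in (0.05, 0.25, 0.50, 1.00)}
for frac, sample in samples.items():
    print(f"{frac:.0%}: {len(sample)} texts")
```

With 10,000 texts, the 5% sample contains 500 of them. Running the same topic prompt on each sample and comparing the resulting topics is one way to check whether a small sample is enough for your own data.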

Practical Guidelines for Using AI in Topic Analysis

If you're interested in using AI for analyzing text data, here's a step-by-step guide:

Prepare Your Data
Collect your text data in a clean format
Remove any sensitive or identifying information
Choose Your AI Tool
For small to medium projects: GPT-4o or Claude 3.5 Sonnet
For large projects (over 100,000 words): Gemini 1.5 Pro
Consider using multiple AI tools for cross-validation
Sample Size Strategy
Start with a small sample (around 5-10% of your data)
If your dataset is very large, you might not need to analyze everything
Use random sampling to ensure representation
Writing Effective Prompts
Be specific in your instructions
Example prompt: "You are a qualitative researcher. Read this text and identify 10 main topics. Each topic should contain a name and definition. Only return the topic."
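Keep the temperature setting at its default. The default differs between tools and between chat interfaces and APIs (the range cited here is 0.25-0.5), so check your tool's documentation rather than assuming a value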
Validation Process
Compare results from different AI tools
Have a human expert review the AI-identified topics
Look for consistency in the topics identified
Quality Control
Double-check unusual or unexpected topics
Verify that the AI hasn't missed any obvious themes
Keep track of any patterns the AI consistently misses
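The prompting and validation steps above can be sketched in a few lines. This is only an illustration under stated assumptions: `build_prompt`, `normalize`, and `compare_topics` are hypothetical helper names, the example texts and topic names are made up, and the sketch deliberately stops short of calling any AI service, so you would paste or send the prompt to your chosen tool yourself.

```python
import re

# Instruction text taken from the example prompt in the guide above
PROMPT_TEMPLATE = (
    "You are a qualitative researcher. Read this text and identify "
    "{n_topics} main topics. Each topic should contain a name and "
    "definition. Only return the topic.\n\n{texts}"
)


def build_prompt(texts, n_topics=10):
    """Place the sampled texts, one per line, under the instruction."""
    body = "\n".join(f"- {t.strip()}" for t in texts if t.strip())
    return PROMPT_TEMPLATE.format(n_topics=n_topics, texts=body)


def normalize(name):
    """Lowercase a topic name and drop punctuation so near-identical names match."""
    return re.sub(r"[^a-z0-9 ]+", "", name.lower()).strip()


def compare_topics(topics_a, topics_b):
    """Split two lists of topic names into shared and tool-specific sets."""
    a = {normalize(t) for t in topics_a}
    b = {normalize(t) for t in topics_b}
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


# Made-up inputs for illustration only
prompt = build_prompt(["Too many side effects", "I don't trust the trial data"])
result = compare_topics(
    ["Side Effects", "Distrust of Trials"],      # topics from tool A
    ["side effects!", "Government Mandates"],    # topics from tool B
)
print(sorted(result["shared"]))
print(sorted(result["only_a"]), sorted(result["only_b"]))
```

Matching on normalized names is a crude first pass: two tools can describe the same theme with different words, so the topics listed under `only_a` and `only_b` are the ones a human expert should look at first.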

Important Considerations

Human Oversight
Don't rely solely on AI - use it as a helpful assistant
Have subject matter experts review the results
Be prepared to adjust topics based on human insight
Limitations
AI might miss cultural nuances
Some topics might be combined or oversimplified
AI can't replace human understanding of context
Cost-Efficiency
Using AI can be more cost-effective than hiring multiple human coders
Small samples can give reliable results, saving processing time and costs
Consider the trade-off between different AI services' costs and capabilities

The Future of Text Analysis

This research suggests that AI can make text analysis faster and more accessible while maintaining high accuracy. However, the best results come from combining AI efficiency with human expertise and oversight.

For researchers, businesses, and organizations dealing with large amounts of text data, this approach offers a practical way to understand themes and patterns in their data without getting overwhelmed by the volume of information.

Full paper: USING LARGE LANGUAGE MODELS FOR SHORT TEXT TOPIC MODELING: MODEL CHOICE AND SAMPLE SIZE (Generative AI for Research Initiative)
