Lille My

USING LARGE LANGUAGE MODELS FOR SHORT TEXT TOPIC MODELING: MODEL CHOICE AND SAMPLE SIZE



Imagine you're trying to organize thousands of comments about a product into main themes or topics. Traditionally, researchers either had to read through everything manually (time-consuming and expensive) or use older topic-modeling programs that often missed important context. A new preprint on OSF reports that modern large language models such as GPT-4, Claude, and Gemini can do this job effectively, especially with short pieces of text.


The researchers conducted two interesting experiments:

Study 1: Chatbot Perception Study

  1. They collected responses from 199 people about what makes chatbots seem human-like

  2. They compared three different ways of analyzing the responses:

    Human analysis (a research assistant reading everything)

    Traditional computer analysis (using a method called latent Dirichlet allocation, or LDA)

    Modern AI analysis (using GPT-4 and Claude)

Result: The AI systems matched the human analysis 90% of the time, while the traditional method (LDA) achieved only 60% accuracy.


Study 2: Vaccine Hesitancy Study

  1. They analyzed 10,000 tweets about COVID-19 vaccine concerns

  2. They tested if AI could identify the main topics using different sample sizes

  3. They compared three different AI systems

Result: AI performed just as well with only 5% of the data as it did with 100%, achieving 90% accuracy.
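
To make the sampling idea concrete, here is a minimal Python sketch of drawing a 5% random sample from a list of short texts. The function name `sample_texts`, the fixed seed, and the toy data are illustrative assumptions, not details taken from the preprint.

```python
import random


def sample_texts(texts, fraction=0.05, seed=42):
    """Return a simple random sample covering `fraction` of `texts`."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Always keep at least one text so tiny datasets still produce a sample.
    k = max(1, round(len(texts) * fraction))
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    return rng.sample(texts, k)


# Toy stand-in for a real dataset of 10,000 tweets (hypothetical data).
tweets = [f"tweet {i}" for i in range(10_000)]
subset = sample_texts(tweets, fraction=0.05)
print(len(subset))  # 500
```

Fixing the seed means the same subset can be re-drawn later, which helps when the same sample has to be sent to several AI tools for comparison.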


Practical Guidelines for Using AI in Topic Analysis

If you're interested in using AI for analyzing text data, here's a step-by-step guide:

  1. Prepare Your Data

    Collect your text data in a clean format

    Remove any sensitive or identifying information

  2. Choose Your AI Tool

    For small to medium projects: GPT-4o or Claude 3.5 Sonnet

    For large projects (over 100,000 words): Gemini 1.5 Pro

    Consider using multiple AI tools for cross-validation

  3. Sample Size Strategy

    Start with a small sample (around 5-10% of your data)

    If your dataset is very large, you might not need to analyze everything

    Use random sampling so the sample is representative of the full dataset

  4. Writing Effective Prompts

    Be specific in your instructions

    Example prompt: "You are a qualitative researcher. Read this text and identify 10 main topics. Each topic should contain a name and definition. Only return the topic."

    Keep the temperature at the tool's default setting (the default value varies by provider)

  5. Validation Process

    Compare results from different AI tools

    Have a human expert review the AI-identified topics

    Look for consistency in the topics identified

  6. Quality Control

    Double-check unusual or unexpected topics

    Verify that the AI hasn't missed any obvious themes

    Keep track of any patterns the AI consistently misses
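
The prompt-writing and cross-tool comparison steps above can be sketched in Python. This is a rough illustration under stated assumptions: `build_prompt`, `name_overlap`, and `shared_topics` are hypothetical helper names, the topic lists are made up, and word overlap between topic names is only a crude first-pass check, not the comparison method used in the preprint. No AI service is called here; the prompt string would be pasted or sent to whichever tool you use.

```python
import re

# Template based on the example prompt above; {n_topics} and {texts} are filled in.
PROMPT_TEMPLATE = (
    "You are a qualitative researcher. Read this text and identify "
    "{n_topics} main topics. Each topic should contain a name and "
    "definition. Only return the topic.\n\n{texts}"
)


def build_prompt(texts, n_topics=10):
    """Fill the example prompt with one short text per line."""
    return PROMPT_TEMPLATE.format(n_topics=n_topics, texts="\n".join(texts))


def _words(name):
    # Lowercase word set, so "Side Effects" and "side-effects" compare equal.
    return set(re.findall(r"[a-z0-9]+", name.lower()))


def name_overlap(a, b):
    """Jaccard overlap between the word sets of two topic names."""
    wa, wb = _words(a), _words(b)
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def shared_topics(topics_a, topics_b, threshold=0.5):
    """Pair each topic in A with its best match in B, if similar enough."""
    pairs = []
    for a in topics_a:
        best = max(topics_b, key=lambda b: name_overlap(a, b), default=None)
        if best is not None and name_overlap(a, best) >= threshold:
            pairs.append((a, best))
    return pairs


# Hypothetical topic names, as if returned by two different tools.
tool_a = ["Vaccine side effects", "Distrust of government", "Speed of development"]
tool_b = ["Side effects", "Government distrust", "Religious objections"]
print(shared_topics(tool_a, tool_b))
# [('Vaccine side effects', 'Side effects'),
#  ('Distrust of government', 'Government distrust')]
```

Topics left unpaired (here "Speed of development" and "Religious objections") are the ones worth handing to a human reviewer first, since two tools can describe the same theme with entirely different words.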


Important Considerations

  1. Human Oversight

    Don't rely solely on AI - use it as a helpful assistant

    Have subject matter experts review the results

    Be prepared to adjust topics based on human insight

  2. Limitations

    AI might miss cultural nuances

    Some topics might be combined or oversimplified

    AI can't replace human understanding of context

  3. Cost-Efficiency

    Using AI can be more cost-effective than hiring multiple human coders

    Small samples can give reliable results, saving processing time and costs

    Consider the trade-off between different AI services' costs and capabilities
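
As a back-of-the-envelope illustration of the cost point, the sketch below estimates the input cost of sending a full dataset versus a 5% sample. Everything numeric here is an assumption: the price of 2.50 per million tokens is a placeholder rather than a real quote from any provider, and `tokens_per_word=1.33` is a common rule of thumb for English text that varies by tokenizer. The function name `estimate_cost` is hypothetical.

```python
def estimate_cost(texts, price_per_million_tokens, tokens_per_word=1.33):
    """Rough input-cost estimate for sending `texts` to a model once."""
    n_words = sum(len(t.split()) for t in texts)
    n_tokens = n_words * tokens_per_word  # heuristic, not a real tokenizer
    return n_tokens / 1_000_000 * price_per_million_tokens


# Hypothetical dataset: 10,000 identical 8-word comments.
comments = ["the chatbot felt human because it used humor"] * 10_000

full = estimate_cost(comments, price_per_million_tokens=2.50)
sample = estimate_cost(comments[:500], price_per_million_tokens=2.50)
print(f"full dataset: {full:.4f}, 5% sample: {sample:.4f}")
```

The estimate ignores output tokens and any repeated runs, so treat it as a lower bound and check each provider's current pricing page before budgeting.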


The Future of Text Analysis

This research suggests that AI can revolutionize how we analyze text data, making it faster and more accessible while maintaining high accuracy. However, the best results come from combining AI efficiency with human expertise and oversight.


For researchers, businesses, and organizations dealing with large amounts of text data, this approach offers a practical way to understand themes and patterns in their data without getting overwhelmed by the volume of information.


