Topic Modeling Slave Narratives

A classroom-friendly guide to BERTopic, local models, topic labels, and interpretation

The North American Slave Narratives collection, hosted by Documenting the American South at the University of North Carolina, preserves over 290 first-person accounts of enslaved life in the United States. Topic modeling lets us search for recurring themes across the collection without reading every document from start to finish, and then ask whether those themes shift across the decades in which the narratives were published.

This site introduces that workflow and is designed for students who may have little or no coding experience.

Start Here

The Simple Explanation and Hands-On Exercise both use a 10% sample of the collection — 29 documents drawn at random from 294.

Simple Explanation

Learn the project in plain language: data, chunks, preprocessing, BERTopic, local LLM steps, topic labels, outliers, and human interpretation. Includes all visualizations and result tables for the 10% sample.

Open the simple explanation

Hands-On Exercise

Download the original dataset, install the tools, run the scripts on your own computer, and interpret the outputs.

Open the hands-on exercise

More Analyses

The More Analyses tab in the navigation bar contains two additional runs for comparison:

  • Second 10% Sample: a different 29-document sample, drawn with a different random seed and sharing no documents with the first sample. Compare topics across the two runs to see which themes are stable and which differ.
  • Full Collection: the complete pipeline run on all 294 documents, producing 439 topics. This run shows which themes appear only in specific documents and which recur across the entire collection.
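
For readers curious how two non-overlapping 10% samples might be drawn, here is a minimal Python sketch. It is an illustration only, not the project's actual sampling script: the document ids (1 to 294), the seed values, and the helper name `draw_sample` are all placeholders, and it assumes the second sample is drawn only from documents the first sample did not use.

```python
import random


def draw_sample(doc_ids, k, seed, exclude=()):
    """Draw k ids at random, skipping any id listed in `exclude`.

    Hypothetical helper for illustration; not part of the project's scripts.
    """
    excluded = set(exclude)
    pool = [d for d in doc_ids if d not in excluded]
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return sorted(rng.sample(pool, k))


# Placeholder ids standing in for the 294 documents.
doc_ids = list(range(1, 295))

# Seeds 1 and 2 are arbitrary example values.
first = draw_sample(doc_ids, k=29, seed=1)
second = draw_sample(doc_ids, k=29, seed=2, exclude=first)

overlap = set(first) & set(second)
print(len(first), len(second), len(overlap))  # prints "29 29 0"
```

Running the same seed again reproduces the same sample, which is what makes a run repeatable. Excluding the first sample from the pool guarantees zero overlap. Two independent draws of 29 from 294 would usually share a few documents.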

Slides

The Slides tab contains presentation slides covering the same content as the Simple Explanation.