Second 10% Sample Results
A different random sample — seed 9, 29 documents, 1,752 chunks
About This Sample
This page presents results from a second 10% sample of the collection, drawn with a different random seed. Seed 9 was chosen because its sample has zero overlap with the first sample (seed 42): all 29 documents are different. Zero overlap is not guaranteed for every seed. Because the script draws 29 documents at random from 294, two independently drawn samples can share documents; seed 9 happens to avoid this entirely. Comparing the two runs shows how topic modeling results vary across samples and which themes are stable enough to appear in both.
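How unusual zero overlap is can be estimated directly. The sketch below assumes that each sample is a simple random draw of 29 documents without replacement from the 294-document collection, and that two seeds behave like independent draws. Under those assumptions the number of shared documents follows a hypergeometric distribution. The function names are illustrative and are not part of the course script.

```python
from math import comb


def prob_no_overlap(population: int, sample_size: int) -> float:
    """Probability that two independent samples of the same size share no documents."""
    # The second sample must fall entirely within the documents the first one missed.
    return comb(population - sample_size, sample_size) / comb(population, sample_size)


def expected_overlap(population: int, sample_size: int) -> float:
    """Expected number of documents shared by two independent samples."""
    # Each of the sample_size documents in one sample is in the other
    # with probability sample_size / population.
    return sample_size * sample_size / population


if __name__ == "__main__":
    print(f"P(zero overlap): {prob_no_overlap(294, 29):.3f}")
    print(f"Expected shared documents: {expected_overlap(294, 29):.2f}")
```

Under these assumptions two samples share about 2.9 documents on average (29 × 29 / 294), so a seed with no shared documents is the exception rather than the rule.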
To reproduce this run yourself, use the same script and command from the Hands-On Exercise and add --seed 9. No changes to the script code are needed, because --seed is a command-line flag. For example:
python -u scripts/run_bertopic_sample.py \
--seed 9 \
--output-dir outputs/my_run_seed9 \
--embedding-backend ollama \
--ollama-embedding-model nomic-embed-text \
--representation-backend ctfidf \
--clustering sensitive \
--label-backend ollama \
--ollama-model llama3.1:latest
Sample fraction: 10%
Random seed: 9
Documents: 29
Chunks: 1,752
Topics found: 60
Outliers: ~25%
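To try further seeds, the command shown earlier on this page can be assembled programmatically. This is a minimal sketch that reuses only the flags and values from that command. The helper name build_command, the extra seeds, and the output directory naming are assumptions made for illustration.

```python
import shlex


def build_command(seed: int, output_dir: str) -> list[str]:
    """Return the sample-run command from this page for a given seed and output directory."""
    return [
        "python", "-u", "scripts/run_bertopic_sample.py",
        "--seed", str(seed),
        "--output-dir", output_dir,
        "--embedding-backend", "ollama",
        "--ollama-embedding-model", "nomic-embed-text",
        "--representation-backend", "ctfidf",
        "--clustering", "sensitive",
        "--label-backend", "ollama",
        "--ollama-model", "llama3.1:latest",
    ]


if __name__ == "__main__":
    # Seeds other than 9 are arbitrary examples; the directory names are hypothetical.
    for seed in (9, 10, 11):
        command = build_command(seed, f"outputs/my_run_seed{seed}")
        # Print the command for copy-and-paste. To execute it instead,
        # pass the list to subprocess.run(command, check=True).
        print(shlex.join(command))
```

Keeping one output directory per seed avoids one run overwriting another run's CSV files.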
All CSV files and HTML visualizations for this run are in the repository under:
outputs/bertopic_sample2_nomic_sensitive_lemmatized/
topic_review_table.csv
topic_labels_llm.csv
topic_assignments.csv
topic_info.csv
sample_documents.csv
visualizations/ ← 4 BERTopic charts (open in browser)
metadata_visualizations/ ← 5 metadata charts (open in browser)
The 29 sampled documents are completely different from the first sample (zero overlap):
fpn-burton-burton.xml
fpn-hortonlife-horton.xml
fpn-lane-lane.xml
fpn-mason-mason.xml
fpn-veney-veney.xml
neh-barrett-barrett.xml
neh-branham-branham.xml
neh-brinch-brinch.xml
neh-brown55-brown55.xml
neh-brownww-brown.xml
neh-clarkes-clarkes.xml
neh-delaney-delaney.xml
neh-dsmith-dsmith.xml
neh-edwardsc-edwards.xml
neh-hayden-hayden.xml
neh-henderson-henderson.xml
neh-jacksonc-jackson.xml
neh-leehf-leehf.xml
neh-mallory-mallory.xml
neh-mott-mott.xml
neh-mott26-mott26.xml
neh-nell-nell.xml
neh-parker-parker.xml
neh-pickard-pickard.xml
neh-rudd-rudd.xml
neh-slaveryillus-slaveryillus.xml
neh-story-story.xml
neh-webster-webster.xml
neh-wilkerson-wilkerson.xml
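The zero-overlap claim can be checked with a set intersection over the two document lists. The sketch below is illustrative only: the column name "filename" is an assumption about sample_documents.csv and should be adjusted to the actual header, and the second set in the demonstration contains made-up names.

```python
import csv
from pathlib import Path


def read_document_names(csv_path: str, column: str = "filename") -> set[str]:
    """Read one column of a CSV file into a set of document names.

    The default column name is an assumption; check the header of
    sample_documents.csv before relying on it.
    """
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        return {row[column].strip() for row in csv.DictReader(handle)}


def shared_documents(first: set[str], second: set[str]) -> list[str]:
    """Return the documents present in both samples, sorted for stable output."""
    return sorted(first & second)


if __name__ == "__main__":
    # Two names from the list above, compared with a hypothetical second set.
    this_sample = {"fpn-burton-burton.xml", "neh-nell-nell.xml"}
    other_sample = {"example-a.xml", "example-b.xml"}  # made-up names
    overlap = shared_documents(this_sample, other_sample)
    print(f"{len(overlap)} shared documents: {overlap}")
```

With real data, each set would come from read_document_names applied to one run's sample_documents.csv, and an empty result confirms zero overlap.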
Topic Review Table (Top Topics)
The table below shows the 15 largest topics by chunk count. Download the full CSV files for all 60 topics.
| Topic | Label | Chunks |
|---|---|---|
| 0 | Farm, Bond, Scott, Cotton | 73 |
| 1 | Longing for Family and Freedom | 71 |
| 2 | Struggling for Freedom | 61 |
| 3 | Escape by Sea | 51 |
| 4 | Expressions of Sympathy and Gratitude | 47 |
| 5 | Family Relationships and Guidance | 46 |
| 6 | American Identity and Nationalism | 37 |
| 7 | The Struggle for Equal Rights and Representation | 36 |
| 8 | Fighting for Freedom and Citizenship | 35 |
| 9 | Confrontations with Slave Catchers | 33 |
| 10 | Ministerial Training and Early Ministry | 30 |
| 11 | Whipping and Plantation Punishment | 29 |
| 12 | Traveling to Work or Deliveries | 27 |
| 13 | Effective Classroom Management | 26 |
| 14 | Faith in Salvation and Heaven | 26 |
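A quick calculation shows how much of the sample these 15 topics account for. The chunk counts below are copied from the table, and the total and outlier rate come from the run summary on this page. Because the outlier rate is only reported as roughly 25%, the second figure is an approximation.

```python
# Chunk counts for topics 0-14, copied from the table above.
top15_chunks = [73, 71, 61, 51, 47, 46, 37, 36, 35, 33, 30, 29, 27, 26, 26]
total_chunks = 1752    # chunks in this sample
outlier_share = 0.25   # approximate, reported as "~25%"

top15_total = sum(top15_chunks)
top15_share = top15_total / total_chunks

# Chunks assigned to any topic, estimated from the approximate outlier rate.
approx_assigned = total_chunks * (1 - outlier_share)
top15_share_of_assigned = top15_total / approx_assigned

print(f"Top 15 topics: {top15_total} chunks ({top15_share:.1%} of all chunks)")
print(f"Share of non-outlier chunks (approx.): {top15_share_of_assigned:.1%}")
```

The 15 largest topics hold 628 of the 1,752 chunks, a little over a third of the sample, so the remaining 45 topics and the outliers together cover most of the text.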
Top Topics in This Sample
The labels below are the 15 largest topics in this sample. Compare them against the first sample and consider: which topics appear in both? Which appear only here? Download the full CSV for all 60 topics.
- Farm, Bond, Scott, Cotton
- Longing for Family and Freedom
- Struggling for Freedom
- Escape by Sea
- Expressions of Sympathy and Gratitude
- Family Relationships and Guidance
- American Identity and Nationalism
- The Struggle for Equal Rights and Representation
- Fighting for Freedom and Citizenship
- Confrontations with Slave Catchers
- Ministerial Training and Early Ministry
- Whipping and Plantation Punishment
- Traveling to Work or Deliveries
- Effective Classroom Management
- Faith in Salvation and Heaven
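One way to make a first pass at the comparison is to match labels by string similarity. The sketch below uses Python's standard difflib module; the threshold of 0.6 is an arbitrary starting point, and the first-sample labels in the demonstration are made-up placeholders, not results from the seed 42 run. String similarity only catches labels with similar wording, so topics that share a theme but are phrased differently still need to be compared by reading.

```python
from difflib import SequenceMatcher


def best_matches(labels_a, labels_b, threshold=0.6):
    """Map each label in labels_a to its most similar label in labels_b.

    Only pairs whose similarity ratio reaches the threshold are kept.
    Returns {label_a: (label_b, ratio)}.
    """
    matches = {}
    for a in labels_a:
        scored = [
            (SequenceMatcher(None, a.lower(), b.lower()).ratio(), b)
            for b in labels_b
        ]
        if not scored:
            continue
        score, b = max(scored)
        if score >= threshold:
            matches[a] = (b, round(score, 2))
    return matches


if __name__ == "__main__":
    this_sample = ["Escape by Sea", "Struggling for Freedom", "Faith in Salvation and Heaven"]
    # Hypothetical labels standing in for the first sample's output.
    first_sample = ["Escape by Water", "Struggle for Freedom", "Market Prices"]
    for label, (match, score) in best_matches(this_sample, first_sample).items():
        print(f"{label!r} ~ {match!r} (ratio {score})")
```

Labels from this sample that find no match above the threshold are candidates for themes that appear only here.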
Note that similar labels, such as multiple “Whipping and Plantation Punishment” topics, can appear more than once in the same run (the full CSV lists all 60 topics). These are distinct clusters that the model kept separate because their passages differ in vocabulary or context, even though they share a broad theme. This is expected behavior, not an error. Inspecting the top words and representative passages in topic_assignments.csv will show what distinguishes them.
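Repeated labels can be located by grouping topic IDs under each label. The sketch below works on plain (topic ID, label) pairs, which could be read from the run's label CSV. In the demonstration, topic 11 and its label come from the table on this page, while the other rows are hypothetical.

```python
from collections import defaultdict


def repeated_labels(rows):
    """Group topic IDs by label and keep only labels used by more than one topic.

    rows: iterable of (topic_id, label) pairs.
    Labels are compared case-insensitively after trimming whitespace.
    """
    groups = defaultdict(list)
    for topic_id, label in rows:
        groups[label.strip().lower()].append(topic_id)
    return {label: ids for label, ids in groups.items() if len(ids) > 1}


if __name__ == "__main__":
    rows = [
        (11, "Whipping and Plantation Punishment"),  # from the table above
        (40, "Whipping and Plantation Punishment"),  # hypothetical topic ID
        (3, "Escape by Sea"),
    ]
    for label, topic_ids in repeated_labels(rows).items():
        print(f"{label}: topics {topic_ids}")
```

The topic IDs it reports are the ones whose top words and passages are worth reading side by side.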
Download the Results
- Download topic review table
- Download topic labels
- Download topic info
- Download topic assignments