Researchers Create AI Language Model Trained Only on Pre-1931 Text
Researchers have released Talkie, an artificial-intelligence language model trained exclusively on English text published before 1931. The model contains 13 billion parameters and was trained on 260 billion tokens of historical English text, with the goal of producing an AI that writes like an author of that era.
The project is part of a growing trend of "vintage" AI models that try to capture how language was used in specific historical periods. Other similar projects include Ranke-4B, Mr. Chatterbox, and Machina Mirabilis.
But the Talkie project ran into a common problem with historically restricted training. Despite efforts to limit the corpus to pre-1931 text, the researchers acknowledge that the model suffered from "data leakage": some modern text accidentally made it into the training data, giving the model knowledge it should not have for the period.
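One common defense against this kind of leakage is filtering the corpus on publication metadata and screening for obviously anachronistic vocabulary. Below is a minimal sketch of that idea; the document field names (`year`, `text`) and the tiny anachronism list are illustrative assumptions, not details from the Talkie project.

```python
# Hypothetical date-based corpus filter. Assumes each document is a dict
# with a publication "year" and its "text"; these fields are illustrative.

CUTOFF_YEAR = 1931

# Terms unlikely to appear in genuine pre-1931 English. A real pipeline
# would use a much larger, curated list (these examples are hypothetical).
ANACHRONISMS = {"internet", "software", "smartphone", "website"}

def keep_document(doc: dict) -> bool:
    """Keep a document only if it predates the cutoff year and contains
    no obviously anachronistic vocabulary."""
    year = doc.get("year")
    if year is None or year >= CUTOFF_YEAR:
        return False
    text = doc.get("text", "").lower()
    return not any(term in text for term in ANACHRONISMS)

corpus = [
    {"year": 1910, "text": "The motor-car sped along the boulevard."},
    {"year": 1925, "text": "She tuned the wireless set after supper."},
    {"year": 1945, "text": "A radar station stood on the cliff."},      # too late
    {"year": 1920, "text": "He wrote software for the exchange."},      # leaked term
]

filtered = [d for d in corpus if keep_document(d)]
print(len(filtered))  # → 2
```

Metadata filtering alone is imperfect, of course: reprints, OCR errors, and mislabeled dates can all let modern text slip through, which is exactly the contamination the Talkie researchers describe.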
The model is available for download and testing through GitHub and Hugging Face, allowing other researchers to experiment with vintage AI responses.
The project demonstrates that AI models can be designed to reflect specific time periods, which could help historians study language change or generate period-accurate text. It also highlights how difficult it is to keep modern information from contaminating historical training data.
Future work will likely focus on stricter corpus curation to address the contamination issues and on better methods for building historically accurate AI models.