Researchers Create AI Language Model Trained Only on Pre-1931 Text
Researchers have released Talkie, an artificial-intelligence language model trained exclusively on English text published before 1931. The model contains 13 billion parameters and was trained on 260 billion tokens of historical English text, with the goal of producing an AI that writes like an author of that era.
The project is part of a growing trend of "vintage" AI models that try to capture how language was used in specific historical periods. Other similar projects include Ranke-4B, Mr. Chatterbox, and Machina Mirabilis.
But the Talkie project ran into a common problem with historically restricted training. Despite efforts to limit the corpus to pre-1931 text, the researchers acknowledge that the model suffered from "data leakage": some modern text accidentally made it into the training data, giving the model knowledge it should not have for the period.
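One common defense against this kind of leakage is filtering the corpus on publication metadata and screening for obviously anachronistic vocabulary. Below is a minimal sketch of that idea; the document field names (`year`, `text`) and the tiny anachronism list are illustrative assumptions, not details from the Talkie project.

```python
# Hypothetical date-based corpus filter. Assumes each document is a dict
# with a publication "year" and its "text"; these fields are illustrative.

CUTOFF_YEAR = 1931

# Terms unlikely to appear in genuine pre-1931 English. A real pipeline
# would use a much larger, curated list (these examples are hypothetical).
ANACHRONISMS = {"internet", "software", "smartphone", "website"}

def keep_document(doc: dict) -> bool:
    """Keep a document only if it predates the cutoff year and contains
    no obviously anachronistic vocabulary."""
    year = doc.get("year")
    if year is None or year >= CUTOFF_YEAR:
        return False
    text = doc.get("text", "").lower()
    return not any(term in text for term in ANACHRONISMS)

corpus = [
    {"year": 1910, "text": "The motor-car sped along the boulevard."},
    {"year": 1925, "text": "She tuned the wireless set after supper."},
    {"year": 1945, "text": "A radar station stood on the cliff."},      # too late
    {"year": 1920, "text": "He wrote software for the exchange."},      # leaked term
]

filtered = [d for d in corpus if keep_document(d)]
print(len(filtered))  # → 2
```

Metadata filtering alone is imperfect, of course: reprints, OCR errors, and mislabeled dates can all let modern text slip through, which is exactly the contamination the Talkie researchers describe.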
The model is available for download and testing through GitHub and Hugging Face, allowing other researchers to experiment with vintage AI responses.
The project demonstrates that AI models can be designed to reflect specific time periods, which could help historians study language change or generate period-accurate text. It also highlights how difficult it is to keep modern information from contaminating historical training data.
Future work will likely focus on stricter corpus curation to address the contamination issues and on better methods for building historically accurate AI models.