AI Breakthrough: Machine Learning Algorithm Translates Undeciphered Ancient Languages


Recent linguistic studies reveal that the majority of languages throughout human history have vanished from use. Among these extinct tongues, dozens remain completely undeciphered — meaning researchers lack sufficient understanding of their grammatical structures, vocabulary, or syntax to comprehend their written texts.

The significance of lost languages extends far beyond academic interest; they represent entire cultural knowledge systems and historical records of civilizations. Unfortunately, most of these languages are so sparsely documented that conventional machine-translation systems like Google Translate cannot process them. Many lack a well-documented linguistic relative for comparison, and their texts frequently omit traditional separators such as spaces and punctuation. (To understand the challenge, imaginetryingtoreadanunfamiliarlanguagewrittenlikethis.)

However, scientists at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) have developed a system that can automatically decode a lost language without prior knowledge of how it relates to other languages. Their research shows that the system can itself identify relationships between languages, and they applied it to corroborate recent scholarship suggesting that Iberian is not actually related to Basque.

The research team's ultimate objective is to develop a system capable of deciphering languages that have baffled linguists for decades, using only a few thousand words of text.

The system, developed by a team led by MIT Professor Regina Barzilay, incorporates several principles derived from historical linguistics, chief among them that languages evolve according to predictable patterns. For instance, while a language rarely adds or removes an entire sound, certain sound substitutions commonly occur. A word containing a "p" sound in a parent language might change into a "b" in a descendant language, whereas a change to "k" would be less probable because the two sounds are pronounced so differently.
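To make the "p" to "b" versus "p" to "k" intuition concrete, here is a minimal Python sketch. It is not the researchers' model: the feature table, the numeric scale, and the function name `substitution_cost` are all assumptions made for illustration. Each sound gets two hand-picked articulatory features, and a sound change is treated as more plausible when fewer features have to change.

```python
# Toy articulatory features, hand-picked for illustration only (an assumption,
# not data from the MIT system).
# place:  0 = lips (bilabial), 1 = tongue tip (alveolar), 2 = back of tongue (velar)
# voiced: 0 = voiceless, 1 = voiced
FEATURES = {
    "p": (0, 0),
    "b": (0, 1),
    "t": (1, 0),
    "d": (1, 1),
    "k": (2, 0),
    "g": (2, 1),
}


def substitution_cost(a: str, b: str) -> int:
    """Sum of feature differences; a lower cost stands for a more plausible change."""
    return sum(abs(x - y) for x, y in zip(FEATURES[a], FEATURES[b]))


if __name__ == "__main__":
    # "p" and "b" differ only in voicing; "p" and "k" sit at opposite ends
    # of the mouth, so that change is scored as less likely.
    print("p -> b:", substitution_cost("p", "b"))
    print("p -> k:", substitution_cost("p", "k"))
```

In this toy scoring, "p" to "b" costs less than "p" to "k", which mirrors the preference described above.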

By integrating these linguistic constraints, Barzilay and MIT PhD student Jiaming Luo created a groundbreaking decipherment algorithm capable of navigating the vast landscape of possible transformations despite the scarcity of reference data. The algorithm learns to map linguistic sounds into a multidimensional space where pronunciation differences are reflected in the distance between corresponding vectors. This approach enables the system to identify relevant patterns of language evolution and express them as computational constraints. The resulting model can segment words in ancient languages and match them to counterparts in related languages.
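The idea of placing sounds in a space where distance reflects pronunciation can be sketched roughly as follows. This is an illustrative approximation under stated assumptions, not the published algorithm: the two-dimensional vectors in `EMBED` are fixed by hand (the real system learns its representation), `INDEL_COST` is an arbitrary penalty chosen to make adding or dropping a sound expensive, and the words are invented strings. The sketch uses the vector distance as the substitution cost in a standard weighted edit distance, then picks the closest candidate word from a known vocabulary. It does not attempt word segmentation.

```python
import math

# Hypothetical 2-D sound vectors (place of articulation, voicing) for consonants,
# with vowels placed far away. Hand-made for illustration; not learned.
EMBED = {
    "p": (0.0, 0.0), "b": (0.0, 1.0),
    "t": (1.0, 0.0), "d": (1.0, 1.0),
    "k": (2.0, 0.0), "g": (2.0, 1.0),
    "a": (5.0, 5.0), "i": (5.0, 7.0), "u": (7.0, 5.0),
}
INDEL_COST = 3.0  # assumed penalty: inserting or deleting a whole sound is rare


def sound_distance(a: str, b: str) -> float:
    """Euclidean distance between two sound vectors."""
    return math.dist(EMBED[a], EMBED[b])


def weighted_edit_distance(s: str, t: str) -> float:
    """Edit distance whose substitution cost is the distance between sound vectors."""
    rows, cols = len(s) + 1, len(t) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = i * INDEL_COST
    for j in range(1, cols):
        d[0][j] = j * INDEL_COST
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = min(
                d[i - 1][j] + INDEL_COST,                               # delete a sound
                d[i][j - 1] + INDEL_COST,                               # insert a sound
                d[i - 1][j - 1] + sound_distance(s[i - 1], t[j - 1]),   # substitute
            )
    return d[-1][-1]


def closest_match(word: str, lexicon: list[str]) -> str:
    """Return the lexicon entry with the smallest weighted edit distance to word."""
    return min(lexicon, key=lambda cand: weighted_edit_distance(word, cand))


if __name__ == "__main__":
    # Invented strings, not real words from any language.
    print(closest_match("pata", ["bada", "kiku", "tupi"]))
```

With these toy vectors, "pata" is matched to "bada" because only two voicing changes separate them, whereas the other candidates require larger jumps in the sound space.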

This project builds on previous research by Barzilay and Luo that deciphered the extinct languages Ugaritic and Linear B, the latter of which had originally taken humans decades to decode. A crucial difference is that in that earlier work, the researchers already knew the two languages were related to early forms of Hebrew and Greek, respectively.

With the new system, the relationships between languages are inferred by the algorithm rather than supplied as prior knowledge. Establishing these relationships is one of the most significant challenges in linguistic decipherment. In the case of Linear B, it took researchers several decades to identify its correct modern descendant. For Iberian, scholars still debate its linguistic connections: some argue for a relationship with Basque, while others contend that Iberian has no connection to any known language.

The system can evaluate the proximity between languages; when tested on known languages, it accurately identifies language families. The team applied the algorithm to Iberian, considering Basque as well as less probable candidates from the Romance, Germanic, Turkic, and Uralic language families. While Basque and Latin were closer to Iberian than the other languages were, they remained too dissimilar to be considered linguistically related.
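One simple way to picture "proximity between languages" is sketched below. The article does not say how the MIT system computes its score, so everything here is an assumption: the measure (mean normalized edit distance from each lost-language word to its nearest word in a candidate vocabulary), the function names, and the word lists, which are invented placeholder strings rather than real Iberian, Basque, or Latin data.

```python
def levenshtein(s: str, t: str) -> int:
    """Plain edit distance with unit costs for insert, delete, and substitute."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = curr
    return prev[-1]


def proximity(lost_words: list[str], vocab: list[str]) -> float:
    """Mean normalized distance to the nearest vocab word; lower means closer."""
    total = 0.0
    for w in lost_words:
        total += min(levenshtein(w, v) / max(len(w), len(v)) for v in vocab)
    return total / len(lost_words)


def rank_candidates(lost_words: list[str], candidates: dict[str, list[str]]) -> list[str]:
    """Order candidate languages from closest to farthest."""
    return sorted(candidates, key=lambda name: proximity(lost_words, candidates[name]))


if __name__ == "__main__":
    # Invented placeholder vocabularies for two hypothetical candidate languages.
    lost = ["pata", "kido"]
    candidates = {
        "A": ["bata", "kito"],
        "B": ["mulen", "sorra"],
    }
    for name in rank_candidates(lost, candidates):
        print(name, round(proximity(lost, candidates[name]), 3))
```

A ranking like this only says which candidate is nearest. As the Iberian result shows, the nearest candidate can still be too far away to count as a relative, so any practical use would also need a threshold calibrated on language pairs whose relationship is already known.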

In future research, the team aims to extend their work beyond simply connecting texts to related words in a known language — an approach called "cognate-based decipherment." This method assumes the existence of a known related language, but the Iberian example demonstrates this isn't always possible. Their new approach would involve identifying the semantic meaning of words even without knowing how to pronounce or read them.

"For example, we might identify all references to individuals or places within a document, which could then be examined in the context of existing historical evidence," explains Barzilay. "These 'entity recognition' techniques are widely used in modern text processing applications and achieve high accuracy, but the critical research question is whether this task is feasible without any training data in the ancient language."

The project received partial funding from the Intelligence Advanced Research Projects Activity (IARPA).
