Thursday, January 26, 2017

ML Tries to Crack Indus Valley Script

To decipher the Indus script, it’s important to ascertain what we’re looking at: whether the symbols stand for a language or, like totem poles or coats of arms, are just representations of things like family names or gods. “Given the amount of data we have, we cannot make any firm statement regarding the content of the script,” says Yadav. “I think what we’ve done is try to piece together whatever evidence we have to see if it leads us one way or the other,” says Rao. “And I think, at least from the work we’ve done, it seems like it’s more tilted towards the language hypothesis than not.” Most scholars tend to agree.

In 2009, Rao published a study that examined the sequential structure of the Indus script, or how likely it is that particular symbols follow or precede other symbols. In most linguistic systems, words or symbols follow each other in a semi-predictable manner. Certain rules dictate sentence structure, but there is also a fair amount of flexibility. Researchers measure this semi-predictability with a quantity called “conditional entropy.” Rao and his colleagues calculated how likely it was that one symbol followed another in an intentional order. “What we were interested in was if we could deduce some statistical regularities or structure,” says Rao, “basically ruling out that these symbols were just juxtapositions of symbols and that there were actually some rules or patterns.”

They compared the conditional entropy of the Indus script to known linguistic systems, like Vedic Sanskrit, and known nonlinguistic systems, like human DNA sequences, and found that the Indus script was much more similar to the linguistic systems. “So, it’s not proof that the symbols are encoding a language but it’s additional evidence hinting that these symbols are not just random juxtapositions of arbitrary symbols,” says Rao, “and they follow patterns that are consistent with those you would expect to find if the symbols are encoding language.”
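To make the idea of conditional entropy concrete, here is a minimal Python sketch. It is not the code from Rao's study: it assumes a simple estimate built from adjacent symbol pairs, and the function name `conditional_entropy` and the toy letter sequences are invented for illustration. A low value means the next symbol is highly predictable from the current one; a high value means it is close to random.

```python
from collections import Counter
from math import log2


def conditional_entropy(sequences):
    """Estimate H(next symbol | current symbol) in bits from adjacent pairs.

    `sequences` is an iterable of symbol sequences (strings or lists).
    This is a plain plug-in estimate with no smoothing, which is an
    assumption of this sketch rather than the method used in the study.
    """
    pair_counts = Counter()   # how often symbol b directly follows symbol a
    first_counts = Counter()  # how often symbol a is followed by anything
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1

    total = sum(pair_counts.values())
    if total == 0:
        return 0.0

    entropy = 0.0
    for (a, b), n in pair_counts.items():
        p_pair = n / total                 # P(a, b)
        p_b_given_a = n / first_counts[a]  # P(b | a)
        entropy -= p_pair * log2(p_b_given_a)
    return entropy


# Toy data (hypothetical, not real inscriptions).
rigid = ["ABAB", "ABABAB"]  # each symbol fully determines the next one
loose = ["AB", "AC"]        # after A, two continuations are equally likely

print(conditional_entropy(rigid))
print(conditional_entropy(loose))
```

In this toy case the rigid sequences score 0 bits and the loose ones score 1 bit. Systems of the kind the article calls semi-predictable would be expected to fall between fully rigid and fully random orderings.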

In a subsequent paper, Rao and his colleagues took all of the known Indus symbols and looked at where they fell within the inscriptions they were found in. The statistical technique they used, known as a Markov model, was able to pinpoint specifics like which symbols were most likely to begin a text, which were most likely to end it, which symbols were likely to repeat, which symbols often pair together, and which symbols tend to precede or follow a particular symbol. The Markov model is also useful when it comes to incomplete inscriptions. Many artifacts are found damaged, with parts of the inscription missing or unreadable, and a Markov model can help fill in those gaps. “You can try to complete missing symbols based on the statistics of other sequences that are complete,” explains Rao.


Providing anthropological and archaeological context to the artifacts we do have would also help further our understanding of the script. Gabriel Recchia, a research associate at the Cambridge Centre for Digital Knowledge at the University of Cambridge, published a method that aimed to do just that. In previous cognitive science studies, he and his colleagues showed that you can estimate the distances between cities by how often they’re mentioned together in writing. This was true for US cities based on their co-occurrences in national newspapers, Middle Eastern and Chinese cities based on Arabic and Chinese texts, and even cities in The Lord of the Rings. Recchia applied that idea to the Indus script, taking symbols from artifacts whose origins were known and using them to predict where artifacts of unknown origin with similar symbols came from. Recchia explains that a version of this method that takes into account much more detailed information could be very useful. “There are significant differences between artifacts that appear in different sublocations within a site and this is what is much more frequently unknown and in many cases, could provide more useful information,” says Recchia. “Was this found in a garbage heap along with a number of other seals or was this something that was imported from elsewhere?”
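One simple way to picture the provenance idea is to compare the symbols on an artifact of unknown origin with the symbols seen at each known site. The Python sketch below does this with cosine similarity over symbol counts. This is a stand-in technique chosen for brevity, not Recchia's published method, and the site names, symbol names, and function names are all hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two symbol-count profiles."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def site_profiles(artifacts):
    """Pool the symbols of all artifacts with a known find site."""
    profiles = {}
    for site, symbols in artifacts:
        profiles.setdefault(site, Counter()).update(symbols)
    return profiles


def guess_site(profiles, symbols):
    """Return the known site whose symbol profile is most similar."""
    query = Counter(symbols)
    return max(profiles, key=lambda site: cosine(profiles[site], query))


# Toy data: (site, symbols on one artifact). Entirely made up.
known = [
    ("site_a", ["fish", "jar"]),
    ("site_a", ["fish", "arrow"]),
    ("site_b", ["man", "comb"]),
]
profiles = site_profiles(known)
print(guess_site(profiles, ["fish", "jar"]))
```

The same approach could in principle use finer labels than a whole site, such as a sublocation within a site, which is the kind of more detailed information Recchia describes as more useful.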

