Staff Spotlight: Noah Santacruz

Software Engineer Noah Santacruz recently completed his master's degree in Natural Language Processing at Cooper Union. His thesis? “Part of Speech Handling for Aramaic in Talmud.” Or, of course, “PSHAT.” Generally, when someone reads a sentence, they work out the part of speech of each word not from the lexicon alone, but from contextual cues. For example, when reading the sentence, “I read a book yesterday,” an English speaker knows that the word “read” is in the past tense, because the action happened yesterday, even though it would be spelled the same in the present tense. The reader also understands that the word “book” is a noun, because the sentence would make no sense if it were a verb, as in, “to book a flight.”

Computers use context cues to parse texts, too. However, when it comes to Talmud, computers (and often people as well) run into trouble parsing sentences. For one thing, the Talmud is written in a mix of Hebrew and Aramaic. For another, it doesn’t use a standard sentence structure words flow together in the two languages see what we did there?

Noah’s algorithm uses neural networks to divide the Talmud into Hebrew and Aramaic sections. It then analyzes the Aramaic sections and determines the part of speech of each word, using the isolated passages and an available lexicon (his research focused on the often less-understood Aramaic). His research gives scholars a better understanding of just how the Talmud was written and edited, because it separates out the layers of Aramaic and Hebrew, recognizes different writing and speaking styles, and identifies certain repetitions of keywords. It also makes it easier for educators and students to break up passages of the Talmud by questions asked, answers offered, and rejections of those answers, all automatically.
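For readers who like to see the idea in code, here is a minimal Python sketch of that two-stage flow: first split the text into Hebrew and Aramaic sections, then tag only the Aramaic words. It is not Noah's implementation. The neural networks are replaced by a simple word-list lookup, and every name in it (`TOY_ARAMAIC_WORDS`, `TOY_LEXICON`, `classify_language`, `parse`) and the transliterated sample words are assumptions made up for illustration.

```python
from itertools import groupby

# Toy stand-in for the neural language classifier (hypothetical word list).
TOY_ARAMAIC_WORDS = {"mai", "taama"}

# Toy stand-in for a lexicon: each word maps to its candidate parts of speech.
TOY_LEXICON = {
    "mai": ["pronoun"],
    "taama": ["noun"],
}


def classify_language(word):
    """Label one word as 'aramaic' or 'hebrew' (toy rule, not a neural network)."""
    return "aramaic" if word in TOY_ARAMAIC_WORDS else "hebrew"


def split_into_sections(words):
    """Group consecutive words that share a language label into sections."""
    return [
        (language, list(group))
        for language, group in groupby(words, key=classify_language)
    ]


def tag_aramaic(words):
    """Tag each Aramaic word with its first lexicon candidate, or 'unknown'."""
    return [(word, TOY_LEXICON.get(word, ["unknown"])[0]) for word in words]


def parse(words):
    """Run both stages; only Aramaic sections receive part-of-speech tags."""
    result = []
    for language, section in split_into_sections(words):
        tags = tag_aramaic(section) if language == "aramaic" else None
        result.append((language, section, tags))
    return result
```

Calling `parse(["shene'emar", "mai", "taama"])` yields one Hebrew section with no tags, followed by one Aramaic section whose words are tagged from the toy lexicon. A real system would also have to use surrounding context to choose among several candidate tags, which this sketch skips by taking the first candidate.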

Noah thanks the Comprehensive Aramaic Lexicon (CAL) for providing the dataset and Dicta for advising him throughout.