Status report: Talmud
After Tanach, the most common question we get about content is about the Talmud. Our Talmud text comes from Wikisource, and we’ve been correcting it to ensure it matches the text of the Vilna shas. We also realized that given the number of mefarshim we plan on having, an amud was simply too large a unit to work with. A single amud might contain 100 comments from Rashi, Tosafot, the Rosh, and the other major commentators. Without breaking up the amud into smaller units, there’s no way to know which subset of those 100 comments to display. When you click on a pasuk in the Torah, you see all the commentaries on that pasuk. We wanted something similar here - when you click on a sentence in the Talmud, you should see the commentaries relevant to that sentence. The conclusion was clear - we needed a way to break up the dapim. Thankfully, Koren Publishers graciously allowed us to use their punctuation to break up the amud. Each line of the amud now corresponds to a grammatical phrase (not a line of the Vilna printing). We undertook a massive project to segment all of shas in this manner, and finished in the fall of 2014. In the process we also double-checked the text we had from Wikisource to make sure it matched the Vilna shas.
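To give a rough sense of what punctuation-based segmentation means, here is a minimal sketch in Python. It is not our actual tooling: the function name segment_amud is made up for illustration, and it assumes the punctuated text arrives as a plain string in which a phrase ends at a period, colon, question mark, or exclamation mark.

```python
import re
from typing import List


def segment_amud(punctuated_text: str) -> List[str]:
    """Split the text of an amud into phrases at sentence-level punctuation.

    Assumption: a phrase ends at '.', ':', '?' or '!' followed by whitespace.
    The real segmentation may follow different rules.
    """
    # Split on whitespace that follows a punctuation mark, keeping the mark
    # attached to the phrase it closes.
    parts = re.split(r"(?<=[.:?!])\s+", punctuated_text.strip())
    return [part for part in parts if part]


# Transliterated placeholder text, just to show the shape of the result.
phrases = segment_amud("tanu rabanan. meeimatai korin? amar rav: mishaah")
```

Each item in the resulting list is one clickable unit to which commentaries can be attached.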
Next up, of course, are the commentaries of Rashi and Tosafot. Once the Talmud was segmented, we needed to associate each comment with the appropriate line. One of our wonderful volunteer developers, Noah Santacruz, made a commentary poster - a program that looks at the dibur hamatchil and tries to place the comment in the right place. Unfortunately, it cannot place every comment based solely on that information: sometimes the text of the dibur hamatchil appears multiple times in a daf, and sometimes it doesn’t match at all because the commentator uses roshei teivot or abbreviates the text in some other fashion. To fix those, we’ve had people going through the relevant masechtot manually, learning them and placing the commentaries where they belong. At the same time they’ve also been checking the contents of the comments against the Vilna shas to make sure our text is accurate. So far we’ve finished Brachot and Taanit, and Megillah and Kiddushin are in progress. This is still an ongoing process, and while we’re looking for ways to improve our automation, we’re also looking for volunteers. If you, your chevruta, or your school group are learning Talmud and want to help the cause of Talmud learning on the internet, you can help by placing the missing commentaries in the right place as you learn. If you’re interested, please let us know and we’ll help get you started.
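The matching idea, and the two ways it fails, can be sketched in a few lines of Python. This is an illustration only, not the actual commentary poster: the function name place_comment is hypothetical, and it assumes an exact substring match against transliterated placeholder text.

```python
from typing import List, Optional


def place_comment(segments: List[str], dibur_hamatchil: str) -> Optional[int]:
    """Return the index of the one segment containing the dibur hamatchil.

    Returns None when no segment matches (e.g. the opening words are
    abbreviated) or when several segments match, so that a person can
    place the comment by hand.
    """
    matches = [i for i, seg in enumerate(segments) if dibur_hamatchil in seg]
    return matches[0] if len(matches) == 1 else None


# Transliterated placeholder segments of a daf.
daf = ["amar rav yehuda", "tanu rabanan", "amar rav huna"]

unique = place_comment(daf, "tanu rabanan")   # exactly one match: placed
repeated = place_comment(daf, "amar rav")     # matches twice: left for a person
abbreviated = place_comment(daf, 't"r')       # roshei teivot: no match at all
```

Anything that comes back as None is exactly the kind of comment our volunteers place manually.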
Throughout this process we’ve been checking the text of the Talmud and the commentaries. We’ve found a significant number of mistakes and typos, most of which have been copied over and over again by countless websites. One of the advantages of our system is that we’re able to spot and correct these errors quickly and easily. Sefaria currently has the most accurate Talmud text freely available on the internet (using the Vilna shas as the standard), and when we’re done we will have the most accurate copies of Rashi and Tosafot too.
After Rashi and Tosafot, of course, come the other major commentaries. We’ve recently finished digitizing the Rosh and its nosei keilim. We’ve also acquired digitized versions of the Pnei Yehoshua, Yad Ramah, Ramban, Shita Mekubetzet, Rashba, and Tosafot Rid. So far we’ve done Shita Mekubetzet on Brachot. While getting these into our system is difficult for the same reasons as Rashi and Tosafot, you can expect to see these commentaries start appearing on Sefaria in a few months.
Having finished digitizing the Rosh, we are now working on digitizing the Rif and the major commentaries on the Rif. It will take us some time to digitize the whole thing, but the main text of the Rif should start to appear in March, with the commentaries coming later.
Lastly, we’re also working on a few other features that should be helpful to people, including an integrated dictionary with data from Jastrow and the Comprehensive Aramaic Lexicon, as well as a way of integrating the Mesorat HaShas and Ein Mishpat Ner Mitzvah.