Overview

I created a program that calculates edit distances between witnesses in TEI/XML files that contain <app> elements.

You can use it from the following Google Colab notebook:

https://colab.research.google.com/github/nakamura196/ndl_ocr/blob/main/編集距離を算出するプログラム.ipynb

Upload an XML file and the program will calculate the similarity between witnesses.
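Before any distance can be computed, each witness's full text has to be rebuilt from the apparatus. Below is a minimal sketch of that step, not necessarily how the notebook does it. It assumes TEI parallel segmentation in the TEI namespace, with every <lem> and <rdg> carrying a @wit attribute. Nested <app> elements inside a reading are simply flattened. The function name `witness_texts` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
READINGS = {TEI + "lem", TEI + "rdg"}


def witness_texts(xml_string: str) -> dict[str, str]:
    """Hypothetical helper: rebuild each witness's text from <app>/<lem>/<rdg>.

    Assumes every <lem>/<rdg> has a @wit attribute; nested apps are flattened.
    """
    root = ET.fromstring(xml_string)
    body = root.find(f".//{TEI}body")
    if body is None:  # fall back to the whole document
        body = root

    # Collect every witness ID named in a @wit attribute, in document order.
    wits = []
    for el in body.iter():
        if el.tag in READINGS:
            for w in el.get("wit", "").split():
                if w not in wits:
                    wits.append(w)
    parts = {w: [] for w in wits}

    def add_all(text):
        # Text outside any <app> is shared by all witnesses.
        if text:
            for w in wits:
                parts[w].append(text)

    def walk(elem):
        add_all(elem.text)
        for child in elem:
            if child.tag == TEI + "app":
                # Each reading goes only to the witnesses listed in its @wit.
                for reading in child:
                    if reading.tag in READINGS:
                        content = "".join(reading.itertext())
                        for w in reading.get("wit", "").split():
                            parts[w].append(content)
            else:
                walk(child)
            add_all(child.tail)

    walk(body)
    # Collapse whitespace left over from XML indentation
    # (for Japanese text you may prefer to drop whitespace entirely).
    return {w: re.sub(r"\s+", " ", "".join(p)).strip() for w, p in parts.items()}


sample = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>'
    '<p>いろ<app><rdg wit="#A">は</rdg><rdg wit="#B">わ</rdg></app>にほへと</p>'
    "</body></text></TEI>"
)
print(witness_texts(sample))
```

For the sample, witness #A yields いろはにほへと and witness #B yields いろわにほへと.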

Example

Let's upload the following XML file:

The result is an Excel file like the following, which provides an overview of the similarity between witnesses.

| index | name1 | name2 | distance | ratio |
|---|---|---|---|---|
| 0 | 中村式五十音 | 中村式五十音又様 | 10 | 0.85 |
| 1 | 中村式五十音 | 中村式五十音欠損本 | 7 | 0.8947368421052632 |
| 2 | 中村式五十音又様 | 中村式五十音欠損本 | 8 | 0.868421052631579 |

The following library is used for calculating similarity:
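As a stand-in for that library, here is a minimal pure-Python sketch of the classic Levenshtein edit distance (insertions, deletions and substitutions each cost 1). It also includes a helper that produces one row per pair of witnesses, mirroring the index/name1/name2/distance/ratio columns of the table above. The `similarity_ratio` normalization (1 minus distance over the longer length) is a hypothetical choice. The library used by the notebook may normalize differently, so these values will not necessarily reproduce the table. The names `levenshtein`, `similarity_ratio` and `pairwise_rows` are likewise hypothetical.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance, keeping only one previous row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def similarity_ratio(a: str, b: str) -> float:
    """Hypothetical normalization: 1 - distance / length of the longer text."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def pairwise_rows(texts: dict[str, str]) -> list[dict]:
    """One row per unordered pair of witnesses, in the table's column layout."""
    rows = []
    for index, (n1, n2) in enumerate(combinations(texts, 2)):
        rows.append({
            "index": index,
            "name1": n1,
            "name2": n2,
            "distance": levenshtein(texts[n1], texts[n2]),
            "ratio": similarity_ratio(texts[n1], texts[n2]),
        })
    return rows


for row in pairwise_rows({"A": "kitten", "B": "sitting", "C": "sitten"}):
    print(row)
```

To get an Excel file like the one above, the rows can be passed to pandas, e.g. `pandas.DataFrame(rows).to_excel("result.xlsx", index=False)`, which needs pandas and openpyxl installed.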

Summary

Methods for comparing texts leave room for further study, but I hope this serves as a useful example of comparing witnesses quantitatively.

Reference

This functionality has also been added to the "program for extracting differences between two texts," introduced below: