Dotplot Applications
Alignment
Figure: A manual chapter in Dutch, French, German, Italian, Spanish, and Swedish (one million 4-grams)
A combination of squares and diagonals identifies translations. The dark squares on the main diagonal are formed by tokens matching within the same language. The diagonal texture is formed by names and numbers that are the same in each language, and these diagonals identify alignments between the translations. An alignment function may be created by fitting the points along the diagonals; it maps a position in one document to the corresponding position in the translation. Alignments are used to construct multilingual concordances for terminology research.
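The fitting step can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: it uses whitespace-separated words rather than 4-grams, treats tokens that occur exactly once in both documents (typically names and numbers) as the points on the diagonal, and fits a single least-squares line. The names `anchor_points` and `fit_alignment` are hypothetical, and a real alignment over a whole chapter would more likely fit many local segments than one global line.

```python
from collections import Counter

def anchor_points(src_tokens, tgt_tokens):
    """Return (i, j) positions of tokens that occur exactly once in each text.

    Hypothetical heuristic: tokens unique in both documents (often names and
    numbers) are treated as reliable dots on the alignment diagonal.
    """
    src_counts = Counter(src_tokens)
    tgt_counts = Counter(tgt_tokens)
    tgt_pos = {tok: j for j, tok in enumerate(tgt_tokens) if tgt_counts[tok] == 1}
    return [(i, tgt_pos[tok]) for i, tok in enumerate(src_tokens)
            if src_counts[tok] == 1 and tok in tgt_pos]

def fit_alignment(points):
    """Fit j = slope * i + intercept by least squares; return it as a function."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two anchor points")
    mean_i = sum(i for i, _ in points) / n
    mean_j = sum(j for _, j in points) / n
    var_i = sum((i - mean_i) ** 2 for i, _ in points)
    if var_i == 0:
        raise ValueError("anchor points must cover more than one source position")
    cov = sum((i - mean_i) * (j - mean_j) for i, j in points)
    slope = cov / var_i
    intercept = mean_j - slope * mean_i
    return lambda pos: slope * pos + intercept

en = "press key 3 to open menu 12 then close the Emacs window".split()
fr = "appuyez sur la touche 3 pour ouvrir le menu 12 puis fermez la fenêtre Emacs".split()
points = anchor_points(en, fr)  # [(2, 4), (5, 8), (6, 9), (10, 14)]
align = fit_alignment(points)
print(round(align(6)))  # prints 9: the French position matching English position 6
```

Note that "la" occurs twice in the French sentence, so it is excluded as an anchor even if it had a counterpart; requiring uniqueness on both sides is what keeps the repetitive within-language matches (the dark squares) out of the fit.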
Figure: A multi-language text editor with an interactive alignment plot
Alignments are also useful for multi-language text editors. We have extended the Emacs text editor to maintain correspondences and identify anomalies between translations. Scrolling or highlighting in one document scrolls or highlights the corresponding region of the other document. Selecting regions of the plot where the alignment has a large slope often identifies discrepancies between the translations.
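Both editor behaviors can be illustrated with a piecewise-linear alignment. As a hedged sketch (not the Emacs extension itself), the code assumes the alignment is available as a sorted list of anchor pairs read off the plot. The hypothetical `make_scroll_map` maps a scroll position in one document to the other, and `steep_segments` flags intervals whose slope is much larger than the overall slope, since a steep segment means a short stretch of one document corresponds to a long stretch of the other. The function names and the factor-of-two threshold are assumptions.

```python
from bisect import bisect_right

def make_scroll_map(anchors):
    """Return a piecewise-linear map from positions in document A to document B.

    anchors: (pos_a, pos_b) pairs with strictly increasing pos_a, e.g. points
    read off the alignment diagonal (hypothetical input format).
    """
    xs = [a for a, _ in anchors]
    ys = [b for _, b in anchors]

    def to_b(pos_a):
        # Outside the covered range, clamp to the first or last anchor.
        if pos_a <= xs[0]:
            return ys[0]
        if pos_a >= xs[-1]:
            return ys[-1]
        # Interpolate linearly between the two anchors surrounding pos_a.
        k = bisect_right(xs, pos_a) - 1
        x0, x1, y0, y1 = xs[k], xs[k + 1], ys[k], ys[k + 1]
        return y0 + (y1 - y0) * (pos_a - x0) / (x1 - x0)

    return to_b

def steep_segments(anchors, factor=2.0):
    """Flag anchor intervals whose slope exceeds `factor` times the overall slope."""
    (xa, ya), (xb, yb) = anchors[0], anchors[-1]
    overall = (yb - ya) / (xb - xa)
    flagged = []
    for (x0, y0), (x1, y1) in zip(anchors, anchors[1:]):
        slope = (y1 - y0) / (x1 - x0)
        if slope > factor * overall:
            flagged.append(((x0, x1), (y0, y1), slope))
    return flagged

anchors = [(0, 0), (100, 110), (200, 215), (210, 300), (300, 390)]
to_b = make_scroll_map(anchors)
print(to_b(150))                # 162.5
print(steep_segments(anchors))  # [((200, 210), (215, 300), 8.5)]
```

In this example, ten positions of document A (200 to 210) correspond to 85 positions of document B, so that interval is the one a reviewer would inspect for added or mistranslated text.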
Version Identification
Figure: Two versions of dix (8000 lines of C)
Figure: Two versions of xmh (20,000 lines of C)
Large systems or documents may span several files. Two versions of a multi-file system form an unbroken main diagonal only if the files of each version are in the same relative order; otherwise the diagonal segments appear out of place. Here the two versions of xmh are in the same relative order, but the two versions of dix are not. When reordered diagonals appear in dotplots of software versions, they usually indicate that file names have changed between versions.
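One way to see why renames produce reordered diagonals is to pair files directly. The sketch below is hypothetical: it assumes each version's files are concatenated in alphabetical order (as a directory listing would give), pairs each old file with the new file sharing the most distinct lines, and then checks whether those pairs keep the same relative order. It counts shared lines rather than detecting diagonals, so it is only a rough stand-in for reading the plot, and the function names are illustrative.

```python
def best_file_pairs(old_files, new_files):
    """Pair each old file with the new file that shares the most distinct lines.

    old_files, new_files: lists of (name, lines) in the order the files are
    concatenated for the dotplot (hypothetical representation).
    Returns (old_name, new_name, new_index) triples.
    """
    new_sets = [(name, set(lines)) for name, lines in new_files]
    pairs = []
    for old_name, old_lines in old_files:
        old_set = set(old_lines)
        scores = [len(old_set & lines) for _, lines in new_sets]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] > 0:
            pairs.append((old_name, new_sets[best][0], best))
    return pairs

def same_relative_order(pairs):
    """True if matched new-file indices increase with old-file order, so the
    diagonals line up along the main diagonal of the plot."""
    idxs = [idx for _, _, idx in pairs]
    return all(a < b for a, b in zip(idxs, idxs[1:]))

def renamed_files(pairs):
    """Pairs whose names differ: candidate renames between versions."""
    return [(old, new) for old, new, _ in pairs if old != new]

# Files listed alphabetically; io.c was renamed zio.c, so it moves to the end.
old = [("io.c", ["open_file();", "read_block();"]),
       ("main.c", ["int main(void)", "parse_args();"])]
new = [("main.c", ["int main(void)", "parse_args();", "run();"]),
       ("zio.c", ["open_file();", "read_block();", "close_file();"])]
pairs = best_file_pairs(old, new)
print(same_relative_order(pairs))  # False
print(renamed_files(pairs))        # [('io.c', 'zio.c')]
```

The single rename is enough to swap the two files' positions in the concatenation, which is exactly the kind of reordering that shows up as off-diagonal segments in the plot.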
Figure: Determining file pairs for re-translation
In some cases it is useful to reorder sequences automatically. Re-translation is a service offered by AT&T Business Translation to simplify translating a new version of a previously translated document: only the differences between the versions need to be translated. The file names and boundaries typically differ in the new version, however, so the file pairs must be determined before the differences can be identified. Diagonals can be reconstructed by an image processing algorithm that identifies the clearest diagonal in a strip of grid boxes. Automatic diagonal reconstruction identifies file pairs in reordered versions, which is a crucial first step in comparing versions of large systems or documents that have diverged over time.
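The strip-by-strip search can be approximated without image processing. This sketch swaps in a simpler technique and is not the algorithm described above: it assumes the dots are given as (x, y) line positions, with x indexing the concatenated old version and y the concatenated new version, and scores each grid box in a strip by the largest number of dots sharing one diagonal offset y - x. The function `clearest_diagonal` and its parameters are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

def clearest_diagonal(dots, strip, box_starts):
    """Return (box_index, dot_count) for the strongest diagonal in one strip.

    dots: (x, y) positions of matching lines; x indexes the concatenated old
    version, y the concatenated new version.
    strip: half-open (y_start, y_end) range covering one new file.
    box_starts: sorted start offsets of the old files along x, beginning at 0.
    Returns None if no dots fall in the strip.
    """
    y_start, y_end = strip
    per_box = {}
    for x, y in dots:
        if y_start <= y < y_end:
            box = bisect_right(box_starts, x) - 1
            # Dots on the same diagonal share the offset y - x.
            per_box.setdefault(box, Counter())[y - x] += 1
    if not per_box:
        return None

    def strength(box):
        # Number of dots on the most populated diagonal in this box.
        return per_box[box].most_common(1)[0][1]

    best = max(per_box, key=strength)
    return best, strength(best)

# Two old files (lines 0-4 and 5-9) whose order is swapped in the new version.
dots = [(5, 0), (6, 1), (7, 2), (8, 3), (0, 0), (2, 3),
        (0, 5), (1, 6), (3, 8), (9, 7)]
box_starts = [0, 5]
for strip in [(0, 5), (5, 10)]:
    print(strip, clearest_diagonal(dots, strip, box_starts))
# (0, 5) (1, 4)
# (5, 10) (0, 3)
```

Each strip picks the grid box with the longest run of collinear dots and ignores scattered matches, so the new files are paired with old files 1 and 0 respectively. Requiring an exact offset is a simplification: insertions or deletions shift a diagonal, and a tolerance window around each offset would make the scoring more robust.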
Copyright © 2000-2004 Jonathan Helfman