| Dotplot Visualization | Technique |
![]() Six Words of Shakespeare |
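In a dotplot, a dot is drawn at position (i, j) whenever token i matches token j. Every token matches itself, so the main diagonal is always filled; dots off the main diagonal indicate repeated tokens. To identify matches in millions of tokens, the technique is extended with weighting, reconstruction, and approximation methods. |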
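The basic construction can be sketched in a few lines. This is a minimal illustration, not the original implementation. It assumes the text has already been split into word tokens. `dotplot_points` and `render` are hypothetical names. The phrase is chosen only for illustration and is not necessarily the input of the plot above. Grouping positions by token generates only the matching cells instead of testing all n² pairs.

```python
from collections import defaultdict

def dotplot_points(tokens):
    """Return the set of (i, j) cells where token i matches token j (hypothetical helper)."""
    positions = defaultdict(list)  # token -> indices where it occurs
    for i, tok in enumerate(tokens):
        positions[tok].append(i)
    points = set()
    for idxs in positions.values():
        # every pair of occurrences of the same token is a match
        for i in idxs:
            for j in idxs:
                points.add((i, j))
    return points

def render(tokens):
    """Draw the dotplot as text: '#' for a match, '.' otherwise."""
    pts = dotplot_points(tokens)
    n = len(tokens)
    return "\n".join(
        "".join("#" if (i, j) in pts else "." for j in range(n))
        for i in range(n)
    )

words = "to be or not to be".split()
print(render(words))
# #...#.
# .#...#
# ..#...
# ...#..
# #...#.
# .#...#
```

The two short off-diagonal runs, at (0, 4) to (1, 5) and at (4, 0) to (5, 1), are the repeated phrase "to be". Because the plot compares the text with itself, it is symmetric about the main diagonal.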
![]() A Million Words of Shakespeare |
Weighting prevents matches between frequent tokens from saturating the plot; a typical weighting function uses the inverse of a token's frequency. Reconstruction methods facilitate scaling by accumulating matches from multiple tokens in a single pixel. One approximation, which allows plots to be created at nearly interactive rates, is to skip tokens with small weights. Optional grid lines show the boundaries between input files. |
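The three extensions can be combined in one loop. This is a minimal sketch under stated assumptions. It is not the method behind the plots on this page. The details are assumptions: the weight of a token is 1/frequency, a match (i, j) is mapped to the pixel (i·size/n, j·size/n), and tokens whose weight falls below a cutoff are skipped. `weighted_dotplot`, `size`, and `min_weight` are hypothetical names.

```python
from collections import Counter, defaultdict

def weighted_dotplot(tokens, size, min_weight=0.0):
    """Accumulate weighted matches of an n x n self-dotplot into a size x size grid.

    Assumptions (illustrative only):
    - weighting: a token that occurs f times has weight 1 / f
    - reconstruction: match (i, j) adds its weight to pixel (i*size//n, j*size//n)
    - approximation: tokens with weight < min_weight are never plotted
    """
    n = len(tokens)
    freq = Counter(tokens)
    positions = defaultdict(list)  # token -> indices where it occurs
    for i, tok in enumerate(tokens):
        positions[tok].append(i)
    grid = [[0.0] * size for _ in range(size)]
    for tok, idxs in positions.items():
        w = 1.0 / freq[tok]
        if w < min_weight:
            continue  # skipping a frequent token saves f*f match updates
        for i in idxs:
            row = i * size // n
            for j in idxs:
                grid[row][j * size // n] += w
    return grid

tokens = "a b a c a b".split()
full = weighted_dotplot(tokens, size=3)
sparse = weighted_dotplot(tokens, size=3, min_weight=0.5)  # drops "a" (weight 1/3)
print(round(sum(map(sum, full)), 6))  # 6.0
```

With inverse-frequency weights, a token that occurs f times produces f² matches of weight 1/f each, so it contributes f in total. The whole grid therefore sums to n, and no single common token can dominate it. The frequent tokens are also the ones with small weights, so the cutoff removes the most expensive work first.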
![]() Three Years of Canadian Parliamentary Debates in English and French (37 Million Words) |
Grand Scale |
![]() Two DNA Sequences (7,000 Nucleotides) |
Biology |
![]() Texture of Repeated Data Initializations in 300 Lines of C Code |
Similarity Structures |
There are algorithmic approaches to identifying string matches in textual sequences, such as longest common substrings, suffix trees, or the dynamic-programming techniques used in the UNIX diff utility. Dotplots let people use their visual pattern-recognition skills to identify similarities, an approach that is typically less sensitive to noise than traditional algorithmic approaches. The texture of shrinking diagonals in the plot above is an example of a structure that would be difficult to appreciate with a text editor or with existing algorithmic approaches to detecting similarity. The texture is caused by a repeated set of 16 data structure initializations. Each time the initializations are repeated, one of the 16 assigned values is different. Dotplots reveal otherwise hidden structures in text, code, and data. These similarity structures are used to identify copies, versions, translations, documents about similar subjects, and software modules with similar comments and symbols. |
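The link between the algorithmic approaches and the picture can be made concrete. When a dotplot compares sequences a and b, a diagonal run of k consecutive dots starting at (i, j) means that a[i:i+k] equals b[j:j+k]. The longest diagonal run is therefore the longest common substring at the token level. The sketch below is a minimal illustration that assumes whitespace-tokenized input. It uses the standard longest-common-substring dynamic-programming recurrence and lists every maximal run. `diagonal_runs` and `min_len` are hypothetical names. The toy input loosely mimics the repeated initializations described above by changing one assigned value. It is not the C file from the plot.

```python
def diagonal_runs(a, b, min_len=2):
    """Return maximal diagonal runs of matches between token lists a and b.

    run[i][j] is the length of the run of matches ending at a[i-1], b[j-1]
    (the longest-common-substring recurrence). Each result is
    (start_in_a, start_in_b, length) for runs of at least min_len tokens.
    """
    run = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                run[i][j] = run[i - 1][j - 1] + 1
    runs = []
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            k = run[i][j]
            # a run is maximal where it cannot be extended down and to the right
            ends = i == len(a) or j == len(b) or a[i] != b[j]
            if k >= min_len and ends:
                runs.append((i - k, j - k, k))
    return runs

a = "x = 1 ; y = 2 ; z = 3 ;".split()
b = "x = 1 ; y = 5 ; z = 3 ;".split()
print(diagonal_runs(a, b))  # [(0, 0, 6), (7, 7, 5)]
```

The single changed value breaks the main diagonal into two shorter segments. The longest of them, 6 tokens, is the longest common substring. An algorithm can report that number, but a dotplot shows every segment at once, along with the pattern the segments form.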
Copyright © 2000-2004 Jonathan Helfman