The NCD source repo
All work (c) Andrew R. Cohen and Paul M. B. Vitányi
Contact: andrew.r.cohen@drexel.edu
Unless otherwise specified, the source license is GPL.
Refer to:
A.R. Cohen and P.M.B. Vitányi, The Cluster Structure Function. In review; also on arXiv ().
A.R. Cohen and P.M.B. Vitányi, Web Similarity in Sets of Search Terms Using Database Queries. SN Computer Science, 1, 161 (2020). https://doi.org/10.1007/s42979-020-00148-5. Also arXiv:1502.05957.
A.R. Cohen and P.M.B. Vitányi, Normalized Compression Distance of Multisets with Applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(8), 1602-1614 (2015).
A.R. Cohen, F. Gomes, B. Roysam, and M. Cayouette, Computational prediction of neural progenitor cell fates. Nature Methods, 7(3), 213-218 (2010).
A.R. Cohen, C. Bjornsson, S. Temple, G. Banker, and B. Roysam, Automatic Summarization of Changes in Biological Image Sequences using Algorithmic Information Theory. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(8), 1386-1403 (2009).
Usage tips
The NCD code includes both bzip and FLIF in-memory compression and runs on Windows and Linux. macOS support is pending.
Key functions include:
- src/MATLAB/ncdSample.m: demonstrates NCD on numeric data, with and without symbolization
- src/MATLAB/NCD/NCD.m: NCD using bzip compression
- src/MATLAB/NCD/imNCDM: multiset NCD for image data using FLIF compression (tested on the NIST digits dataset)
- src/MATLAB/Cluster.SpectralCluster: spectral clustering based on an NCD distance matrix
- src/MATLAB/NWD/NWD.m: computes the normalized web distance (NWD) from PubMed, Reddit, or Wikipedia queries (PubMed and Reddit require free API keys)
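ncdSample.m contrasts NCD on raw numeric data with NCD on a symbolized version. The symbolization it actually uses is not described in this README, so the sketch below assumes a simple hypothetical scheme: equal-width binning of each value into one of k letters. One common motivation is that the low-order bytes of floating-point values look like noise to a compressor, while coarse symbols expose the shared shape of two series. Python's `bz2` stands in for the MATLAB bzip code, and all function names are illustrative.

```python
# Sketch only: the symbolization scheme and helper names are hypothetical.
import bz2
import math
import struct


def ncd(x: bytes, y: bytes) -> float:
    """Pairwise NCD with bzip2 as the compressor."""
    cx, cy = len(bz2.compress(x)), len(bz2.compress(y))
    cxy = len(bz2.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)


def raw_bytes(values):
    """No symbolization: pack values as little-endian IEEE-754 doubles."""
    return struct.pack(f"<{len(values)}d", *values)


def symbolize(values, lo, hi, k=8):
    """Hypothetical symbolization (not necessarily what ncdSample.m does):
    map each value to one of k letters by equal-width binning over [lo, hi]."""
    width = (hi - lo) / k
    out = bytearray()
    for v in values:
        idx = int((v - lo) / width)
        idx = min(max(idx, 0), k - 1)  # clamp values on or outside [lo, hi]
        out.append(ord("a") + idx)
    return bytes(out)


if __name__ == "__main__":
    s1 = [math.sin(0.05 * i) for i in range(2000)]
    # Same waveform with a tiny deterministic perturbation in the low digits.
    s2 = [v + 1e-6 * math.cos(7.3 * i) for i, v in enumerate(s1)]
    print("raw NCD:       ", round(ncd(raw_bytes(s1), raw_bytes(s2)), 3))
    print("symbolized NCD:", round(ncd(symbolize(s1, -1, 1), symbolize(s2, -1, 1)), 3))
```

With this setup the symbolized distance should come out well below the raw one, since the perturbation only touches digits that the binning discards.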
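imNCDM computes NCD over a multiset of images with FLIF. FLIF has no Python standard-library binding, so this sketch substitutes bzip2 and plain byte strings. It shows only the one-step quantity NCD1(X) = (C(X) - min_x C(x)) / max_x C(X \ {x}), approximating C(X) by compressing the concatenated members. As we read the multisets paper (Cohen and Vitányi, 2015), the full multiset NCD additionally maximizes over proper sub-multisets; consult the paper for the exact definition the MATLAB code implements. Function names are hypothetical.

```python
# Sketch only: bzip2 stands in for FLIF; helper names are hypothetical.
import bz2
import random
from typing import List, Sequence


def c(data: bytes) -> int:
    """Compressed size in bytes; bzip2 stands in for FLIF in this sketch."""
    return len(bz2.compress(data))


def c_multiset(items: Sequence[bytes]) -> int:
    """Approximate C(X) by compressing the concatenation of the members."""
    return c(b"".join(items))


def ncd1(items: Sequence[bytes]) -> float:
    """One-step multiset NCD:
        (C(X) - min_x C(x)) / max_x C(X with one copy of x removed)
    For two items this equals the ordinary pairwise NCD."""
    xs: List[bytes] = list(items)
    if len(xs) < 2:
        raise ValueError("a multiset NCD needs at least two items")
    min_single = min(c(x) for x in xs)
    max_leave_one_out = max(c_multiset(xs[:i] + xs[i + 1:]) for i in range(len(xs)))
    return (c_multiset(xs) - min_single) / max_leave_one_out


if __name__ == "__main__":
    base = b"0101101001" * 50
    similar = [base, base[::-1], base[3:] + base[:3]]
    outlier = bytes(random.Random(0).getrandbits(8) for _ in range(500))
    mixed = similar[:2] + [outlier]
    print(f"NCD1(similar) = {ncd1(similar):.3f}")  # members share structure
    print(f"NCD1(mixed)   = {ncd1(mixed):.3f}")    # one unrelated member
```

Dropping in a single unrelated member should push the multiset distance toward 1, which is what makes the measure useful for spotting sets that do not belong together.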
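Cluster.SpectralCluster works from an NCD distance matrix. The sketch below builds such a matrix in Python with bzip2 as the compressor; the spectral clustering step itself (affinity construction, eigendecomposition, and the final partition) is not shown. Because real compressors usually give C(xy) != C(yx), it averages the two concatenation orders and zeroes the diagonal. That symmetrization is an assumed convention, not necessarily what the MATLAB code does, and the function names are hypothetical.

```python
# Sketch only: symmetrization convention and helper names are assumptions.
import bz2
import random
from typing import List, Sequence


def _csize(data: bytes) -> int:
    """bzip2-compressed size in bytes."""
    return len(bz2.compress(data))


def ncd_matrix(items: Sequence[bytes]) -> List[List[float]]:
    """Symmetric pairwise NCD matrix with a zero diagonal.

    Real compressors give C(xy) != C(yx) in general, so each off-diagonal
    entry averages the two concatenation orders (an assumed convention)."""
    sizes = [_csize(x) for x in items]
    n = len(items)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            joint = (_csize(items[i] + items[j]) + _csize(items[j] + items[i])) / 2
            lo, hi = min(sizes[i], sizes[j]), max(sizes[i], sizes[j])
            d[i][j] = d[j][i] = (joint - lo) / hi
    return d


if __name__ == "__main__":
    docs = [
        b"abcabcabd" * 100,
        b"abcabcabc" * 100,
        bytes(random.Random(1).getrandbits(8) for _ in range(900)),
    ]
    for row in ncd_matrix(docs):
        print(" ".join(f"{v:.3f}" for v in row))
```

Each single-item size is computed once and reused, so an n-item matrix costs n + n(n - 1) compressor calls.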
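NWD.m turns database hit counts into a distance. As we read the set formulation in the SN Computer Science paper, NWD(X) = (max_{x in X} log f(x) - log f(X)) / (log N - min_{x in X} log f(x)), where f(x) counts items matching term x, f(X) counts items matching all terms in X, and N is a normalizing total such as the number of indexed items. For two terms this reduces to the pairwise normalized web distance of Cilibrasi and Vitányi. The sketch below covers only the arithmetic: the hit counts are made-up placeholders, querying PubMed, Reddit, or Wikipedia is out of scope, and the function name is hypothetical.

```python
# Sketch only: hit counts below are hypothetical, not real query results.
import math
from typing import Mapping


def nwd(single_counts: Mapping[str, int], joint_count: int, total: int) -> float:
    """Normalized web distance for a set of search terms X.

    single_counts: term -> f(x), number of items matching that term alone
    joint_count:   f(X), number of items matching all terms together
    total:         N, the normalizing total (e.g. number of indexed items)
    """
    if joint_count <= 0:
        return math.inf  # the terms never co-occur: log f(X) is undefined
    logs = [math.log(f) for f in single_counts.values()]
    denominator = math.log(total) - min(logs)
    if denominator == 0:
        return 0.0  # every item matches every term (degenerate case)
    return (max(logs) - math.log(joint_count)) / denominator


if __name__ == "__main__":
    n_total = 10_000_000
    close = nwd({"neuron": 200_000, "axon": 50_000}, joint_count=30_000, total=n_total)
    far = nwd({"neuron": 200_000, "volcano": 60_000}, joint_count=100, total=n_total)
    print(f"NWD(neuron, axon)    = {close:.3f}")
    print(f"NWD(neuron, volcano) = {far:.3f}")
```

Unlike NCD, this distance is not capped near 1: terms that rarely co-occur can score well above it.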