This is the repository for the source code accompanying the manuscript:
A.R. Cohen and P.M.B. Vitányi, "Web Similarity in Sets of Search Terms Using Database Queries," SN Computer Science, 1, 161 (2020).
The source code is available under the MIT license; see LICENSE.txt in the repository.
Questions: see https://bioimage.coe.drexel.edu or email andrew.r.cohen 'at' drexel.edu
Prerequisites
The code here uses web search result counts from Wikipedia, PubMed, and Reddit searches as the basis for a normalized (non-metric) distance measure among multisets of objects. Support for Google is included, but is not considered reliable because (1) Google result counts are approximate and (2) API access to search result counts generally requires payment. The function Count/GetGoogleCount parses search result HTML directly to extract counts.
For Wikipedia, search result counts are extracted from downloaded HTML. For PubMed and Reddit, results are obtained via RESTful APIs that require user credential information as follows:
For PubMed, the API asks that you provide an email address and an application ID. Create variables called 'email' and 'appID' (any meaningful string) and save them in +Count/pubmed.mat.
For Reddit, you must register to obtain a (free) client ID and secret. Create variables called 'client_id' and 'secret' and save them in +Count/reddit.mat.
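
The two credential files described above could be created from the MATLAB command window roughly as follows. This is a minimal sketch, assuming MATLAB's current folder is the repository root (so that the +Count package folder is directly below it); all values shown are placeholders, not real credentials, and should be replaced with your own.

```matlab
% Assumption: run from the repository root, where the +Count folder lives.

% PubMed: contact email and an application ID (any meaningful string).
% Both values below are placeholders.
email = 'you@example.org';
appID = 'my-web-similarity-app';
save(fullfile('+Count', 'pubmed.mat'), 'email', 'appID');

% Reddit: client ID and secret obtained when you register.
% Both values below are placeholders.
client_id = 'YOUR_REDDIT_CLIENT_ID';
secret    = 'YOUR_REDDIT_SECRET';
save(fullfile('+Count', 'reddit.mat'), 'client_id', 'secret');

% Optional check: list the variables stored in each file.
whos('-file', fullfile('+Count', 'pubmed.mat'))
whos('-file', fullfile('+Count', 'reddit.mat'))
```

Because these .mat files hold personal credentials, it is probably wise to keep them out of version control.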