5 Matching Annotations
  1. Nov 2015
  2. Sep 2015
  3. arxiv.org
    1. We used the following publicly available real datasets in the experiment

      The datasets used are DBLP, ENRON, and UNIREF-4GRAM. All are small (<1M records) in web terms and, I would guess, all have small document sizes.

      A lengthy paper could potentially be divided into smaller documents (1 doc = 1 page), with the signature calculated on a per-page basis. This could bound the search time, since only as many pages as are needed to find a match would have to be rendered to text before the lookup can start.
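
      A rough sketch of the per-page idea, to make the "bounded search time" point concrete. Everything here is an assumption for illustration: the signature is a plain MinHash over word 4-grams (not necessarily the scheme the paper uses), and `shingles`, `minhash_signature`, `estimate_similarity`, and `lookup` are hypothetical helper names. The key detail is that `pages` is consumed lazily, so a page is only rendered to text when the lookup actually reaches it.

```python
import hashlib


def shingles(text, k=4):
    """Return the set of word k-grams in a page (assumed tokenization: whitespace)."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash_signature(items, num_hashes=32):
    """MinHash signature: for each salted hash function, keep the minimum hash value.

    Returns None for an empty set (e.g. a blank page).
    """
    if not items:
        return None
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "big")  # a different salt acts as a different hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for item in items
        ))
    return tuple(signature)


def estimate_similarity(sig_a, sig_b):
    """Fraction of matching positions, an estimate of the Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def lookup(pages, known_signatures, threshold=0.8):
    """Scan pages one at a time and stop at the first page that matches a known signature.

    `pages` is any iterable of page texts; if it is a generator that renders
    pages on demand, pages after the first match are never rendered.
    Returns (doc_id, page_number) or None.
    """
    for page_number, text in enumerate(pages, start=1):
        signature = minhash_signature(shingles(text))
        if signature is None:
            continue  # nothing to compare on a blank page
        for doc_id, known in known_signatures.items():
            if estimate_similarity(signature, known) >= threshold:
                return doc_id, page_number
    return None
```

      With a generator as `pages`, a hit on page 1 means pages 2 onward are never rendered, which is where the bound on search time would come from. The trade-off is one signature per page rather than one per paper, so the index grows with page count.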