CiteReader

What is CiteReader?

CiteReader is a systematic process that scans documents in Portable Document Format (PDF), captures their references, and links them together in a database. This citation data can be used for a variety of purposes, including:

  • Identifying the most important works in a given collection.
  • Analyzing patterns in citations.
  • Providing hyperlinks to help researchers quickly move through a chain of research.
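As a rough illustration of the first use, the sketch below ranks documents by how often they are cited. It assumes citation links are available as (citing, cited) pairs; the document identifiers and the function name rank_by_citations are hypothetical and not part of CiteReader.

```python
from collections import Counter

# Hypothetical (citing_id, cited_id) links, as a citation database might store them.
links = [
    ("doc-A", "doc-C"),
    ("doc-B", "doc-C"),
    ("doc-B", "doc-D"),
    ("doc-E", "doc-C"),
    ("doc-E", "doc-D"),
]

def rank_by_citations(links):
    """Return (document, citation count) pairs, most cited first."""
    counts = Counter(cited for _, cited in links)
    # Sort by count descending, then by id so ties have a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

ranking = rank_by_citations(links)
```

Raw citation counts are only one possible measure of importance; they are used here because they need nothing beyond the links themselves.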

Systematic Process:
The CiteReader process consists of the following four steps:

  • Capture: Scans PDF documents for the reference section.
  • Verify: Routes captured references through a workflow application in which human reviewers verify them efficiently.
  • Parse: Divides the verified references into smaller fields (author, title, etc.).
  • Match: Establishes a connection between the referring and cited documents either within the database or externally.
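The Parse and Match steps can be pictured with a minimal sketch. It does not reproduce CiteReader's actual method: it assumes every reference follows the layout "Authors (Year). Title. Rest" and that the collection is indexed by normalized title and year. All names, the sample reference, and the index are hypothetical.

```python
import re

# Assumed reference layout for this sketch: "Authors (Year). Title. Rest"
REFERENCE_PATTERN = re.compile(
    r"^(?P<authors>.+?)\s*\((?P<year>\d{4})\)\.\s*(?P<title>[^.]+)\."
)

def parse_reference(text):
    """Split one verified reference string into fields, or return None."""
    match = REFERENCE_PATTERN.match(text.strip())
    if match is None:
        return None  # the reference does not fit the assumed layout
    return {key: value.strip() for key, value in match.groupdict().items()}

def normalize_title(title):
    """Lowercase and drop punctuation so small formatting differences are ignored."""
    return re.sub(r"[^a-z0-9 ]", "", title.lower()).strip()

def match_reference(fields, collection):
    """Look up the cited document id by normalized title and year."""
    key = (normalize_title(fields["title"]), fields["year"])
    return collection.get(key)

# Hypothetical collection index: (normalized title, year) -> document id.
collection = {("a study of citation linking", "2001"): "doc-17"}

reference = (
    "Smith, J. and Lee, K. (2001). A Study of Citation Linking. "
    "Journal of Examples, 4(2), 10-20."
)
fields = parse_reference(reference)
cited_id = match_reference(fields, collection) if fields else None
```

A single regular expression only handles one citation style. References that fail to parse return None here, which is consistent with the point that not every verified reference can be parsed.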

Accuracy and Quality:

  • Under most conditions, CiteReader can capture and verify references stored in a "Reference" section with nearly 100% accuracy.
  • About 90% of verified references can be successfully parsed.
  • Matching rates depend on the quality and size of the document collection, but hit rates above 50% are possible through the use of external databases.
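To see how these rates combine, the sketch below estimates how many references end up linked. It rests on two assumptions that the figures above do not state: that the rates are independent, and that the matching rate applies to successfully parsed references. The function name and the sample size of 1,000 references are illustrative.

```python
def expected_links(references, capture_rate, parse_rate, match_rate):
    """Estimate linked references by multiplying the stage rates together."""
    # Assumes each stage's rate applies to the output of the previous stage.
    return references * capture_rate * parse_rate * match_rate

# 1,000 references, ~100% captured and verified, ~90% parsed, ~50% matched.
estimate = expected_links(1000, 1.0, 0.9, 0.5)
```

Under these assumptions, roughly 450 of every 1,000 references would be linked to a cited document.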

CiteReader has made a significant contribution to the way scholarly research is performed and is a valuable tool for researchers worldwide.