CiteReader is a systematic process that scans documents in Portable Document Format (PDF), captures their references, and links them together in a database. This citation data can be used for a variety of purposes, including:
- Identifying the most important works in a given collection.
- Analyzing patterns in citations.
- Providing hyperlinks to help researchers quickly move through a chain of research.
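As a rough illustration of the first and last uses, the sketch below assumes the citation data is available as simple (citing, cited) pairs. The pair layout, the document identifiers, and the function names are hypothetical and are not taken from CiteReader itself.

```python
from collections import Counter, deque

# Hypothetical link table: (citing document, cited document) pairs.
links = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "C"), ("D", "B"), ("C", "E")]


def most_cited(links, n=3):
    """Rank documents by how many times they are cited in the collection."""
    return Counter(cited for _, cited in links).most_common(n)


def citation_chain(links, start):
    """Follow citations outward from `start`, breadth-first, listing each document once."""
    outgoing = {}
    for citing, cited in links:
        outgoing.setdefault(citing, []).append(cited)
    seen, order, queue = {start}, [], deque([start])
    while queue:
        current = queue.popleft()
        for cited in outgoing.get(current, []):
            if cited not in seen:
                seen.add(cited)
                order.append(cited)
                queue.append(cited)
    return order


top_two = most_cited(links, 2)
chain_from_a = citation_chain(links, "A")
```

Counting inbound citations is only one simple proxy for importance; the breadth-first walk mirrors what a reader does when following hyperlinks from one paper to the works it cites.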
Systematic Process:
The CiteReader process consists of the following four steps:
- Capture: Scans PDF documents for their reference sections.
- Verify: Routes captured references through a workflow application in which human reviewers efficiently verify them.
- Parse: Divides the captured references into smaller fields (author, title, etc.).
- Match: Establishes a connection between the referring and cited documents either within the database or externally.
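The four steps above can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not CiteReader's actual implementation: it works on text lines already extracted from a PDF, it assumes references are laid out as "Author (Year). Title.", the human reviewer is represented by a callback, and matching is a plain title lookup. All names here are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class ParsedReference:
    author: str
    year: str
    title: str


def capture(lines: List[str]) -> List[str]:
    """Capture: return the non-empty lines that follow a 'References' heading."""
    for i, line in enumerate(lines):
        if line.strip().lower() in ("reference", "references"):
            return [entry.strip() for entry in lines[i + 1:] if entry.strip()]
    return []


def verify(refs: List[str], reviewer: Callable[[str], bool]) -> List[str]:
    """Verify: keep only the references that the reviewer callback approves."""
    return [ref for ref in refs if reviewer(ref)]


# Assumed reference layout: "Author (Year). Title."
_REF_PATTERN = re.compile(
    r"^(?P<author>.+?)\s*\((?P<year>\d{4})\)\.\s*(?P<title>.+?)\.?$"
)


def parse(ref: str) -> Optional[ParsedReference]:
    """Parse: split a reference into fields, or return None if it does not fit."""
    m = _REF_PATTERN.match(ref)
    if m is None:
        return None
    return ParsedReference(m["author"], m["year"], m["title"])


def normalize(title: str) -> str:
    """Lower-case a title and collapse punctuation so lookups tolerate small differences."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def match(ref: ParsedReference, index: Dict[str, str]) -> Optional[str]:
    """Match: look the cited title up in an index of normalized title -> document id."""
    return index.get(normalize(ref.title))


def run_pipeline(lines, reviewer, index) -> List[Tuple[str, str]]:
    """Run all four steps and return (raw reference, matched document id) pairs."""
    found = []
    for raw in verify(capture(lines), reviewer):
        parsed = parse(raw)
        if parsed is None:
            continue  # verified but not parseable
        doc_id = match(parsed, index)
        if doc_id is not None:
            found.append((raw, doc_id))
    return found


# Hypothetical input: text lines of one document and a one-entry title index.
page_lines = [
    "Some body text.",
    "References",
    "Smith, J. (1999). Citation analysis in practice.",
    "not a real reference",
    "Lee, K. (2004). Linking scholarly documents.",
]
title_index = {"citation analysis in practice": "doc-17"}
result = run_pipeline(page_lines, lambda ref: "(" in ref, title_index)
```

In this example the reviewer callback rejects the malformed line, both remaining references parse, and only the first finds a match in the index. A real system would likely use fuzzier matching and could also consult external databases, as described in the Match step.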
Accuracy and Quality:
Under most conditions, CiteReader can capture and verify references stored in a "Reference" section with nearly 100% accuracy.
About 90% of verified references can be successfully parsed. Matching rates depend on the quality and size of the document collection, but hit rates above 50% are possible through the use of external databases.
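To see how these rates combine, here is a small worked calculation. It assumes, for illustration only, that the match rate applies to the parsed references and that it is exactly 50%; the batch size of 1,000 is likewise hypothetical.

```python
def expected_links(verified: int, parse_rate: float, match_rate: float) -> int:
    """Estimate how many verified references end up linked to a cited document."""
    parsed = verified * parse_rate   # references that parse into fields
    matched = parsed * match_rate    # parsed references that find a match
    return round(matched)


# Illustrative figures: 1,000 verified references, 90% parsed, 50% matched.
linked = expected_links(1000, 0.90, 0.50)
```

Under those assumptions, roughly 450 of every 1,000 verified references would end up linked; a larger or higher-quality collection would raise that figure.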
CiteReader has made a significant contribution to the way scholarly research is performed and is a valuable tool for researchers worldwide.