Document analysis of PDF files: methods, results and implications

Lovegrove, William S.; Brailsford, David F.

Authors

William S. Lovegrove

David F. Brailsford



Contributors

David F. Brailsford
Editor

Richard K. Furuta
Editor

Abstract

A strategy for document analysis is presented which uses Portable Document Format (PDF, the underlying file structure for Adobe Acrobat software) as its starting point. This strategy examines the appearance and geometric position of text and image blocks distributed over an entire document. A blackboard system is used to tag the blocks as a first stage in deducing the fundamental relationships existing between them. PDF is shown to be a useful intermediate stage in the bottom-up analysis of document structure. Its information on line spacing and font usage gives important clues in bridging the semantic gap between the scanned bitmap page and its fully analysed, block-structured form. Analysis of PDF can yield not only accurate page decomposition but also sufficient document information for the later stages of structural analysis and document understanding.
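
The blackboard idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' system: the `Block` fields, the tag names (`body`, `heading`, `title`) and the three rules are assumptions chosen for illustration, and the blocks are assumed to have already been extracted from the PDF with their position and font size. Each rule is an independent knowledge source that reads the shared list of blocks and adds tags when its preconditions are met; a control loop keeps applying the rules until none of them contributes anything new.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Block:
    """A text block with position and font size (hypothetical model)."""
    text: str
    x: float
    y: float          # distance from the top of the page
    font_size: float
    tags: set = field(default_factory=set)


def tag_body(blocks):
    """Tag blocks set in the most common font size as 'body'."""
    if not blocks:
        return False
    common_size, _ = Counter(b.font_size for b in blocks).most_common(1)[0]
    changed = False
    for b in blocks:
        if b.font_size == common_size and "body" not in b.tags:
            b.tags.add("body")
            changed = True
    return changed


def tag_heading(blocks):
    """Tag blocks larger than body text as 'heading'; needs 'body' tags."""
    body_sizes = [b.font_size for b in blocks if "body" in b.tags]
    if not body_sizes:
        return False
    changed = False
    for b in blocks:
        if b.font_size > max(body_sizes) and "heading" not in b.tags:
            b.tags.add("heading")
            changed = True
    return changed


def tag_title(blocks):
    """Tag the topmost heading as 'title'; needs 'heading' tags."""
    headings = [b for b in blocks if "heading" in b.tags]
    if not headings:
        return False
    top = min(headings, key=lambda b: b.y)
    if "title" in top.tags:
        return False
    top.tags.add("title")
    return True


def run_blackboard(blocks, sources):
    """Apply every knowledge source repeatedly until no new tag is added."""
    progress = True
    while progress:
        progress = False
        for source in sources:
            if source(blocks):
                progress = True
    return blocks
```

Because each rule only fires once the tags it depends on are present, the order in which the rules are listed does not matter: `run_blackboard(blocks, [tag_title, tag_heading, tag_body])` converges to the same tags as the reverse order, which is the property that distinguishes a blackboard arrangement from a fixed pipeline.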

Journal Article Type: Article
Publication Date: Jan 1, 1995
Journal: Electronic Publishing -- Origination, Dissemination and Design
Peer Reviewed: Yes
Volume: 8
Issue: 3
APA6 Citation: Lovegrove, W. S., & Brailsford, D. F. (1995). Document analysis of PDF files: methods, results and implications. Electronic Publishing -- Origination, Dissemination and Design, 8(3)
Keywords: Document analysis, Document understanding, Blackboard methods, Geometric structure, Logical structure, PDF, PostScript
Copyright Statement: Copyright information regarding this work can be found at the following address: http://eprints.nottingh.../end_user_agreement.pdf
Additional Information: Copyright transferred from John Wiley to Univ. of Nottingham in 1998.

Files

stasis.pdf (112 Kb)
PDF

Copyright Statement
Copyright information regarding this work can be found at the following address: http://eprints.nottingham.ac.uk/end_user_agreement.pdf




