Mapping and Displaying Structural Transformations between XML and PDF
Authors: Matthew R. B. Hardy; David F. Brailsford
Contributors (Editors): Richard Furuta; Jonathan Maletic; Ethan Munson
Abstract
Documents are often marked up in XML-based tagsets to delineate major structural components such as headings, paragraphs, figure captions and so on, without much regard to their eventual displayed appearance. And yet these same abstract documents, after many transformations and 'typesetting' processes, often emerge in the popular format of Adobe PDF, either for dissemination or archiving.
Until recently PDF was a purely display-based document representation, relying on the PostScript imaging semantics that underlie it. Early versions of PDF had no mechanism for retaining any form of abstract document structure, but recent releases have introduced an internal structure tree to create so-called 'Tagged PDF'.
This paper describes the development of a plugin for Adobe Acrobat that creates a two-window display. One window shows an original XML document and the other shows its Tagged PDF counterpart, whose internal structure tree, in some sense, matches the one seen in the XML. If a component is highlighted in either window, the corresponding structured item, with any attendant text, is also highlighted in the other.
Important applications of correctly Tagged PDF include making PDF documents reflow intelligently on small-screen devices and enabling them to be read aloud in the correct reading order, via speech synthesiser software, for the visually impaired. By tracing structural transformations from source document to destination, one can also repair damaged PDF structure or adapt an existing structure tree to an incrementally updated document.
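The abstract does not say how the plugin locates corresponding items in the two windows. As a minimal sketch only, assuming that both structure trees have parallel shapes and that XML element names map one-to-one onto standard Tagged PDF structure types (Document, Sect, H1, P), a two-way correspondence table can be built by walking both trees together. `ROLE_MAP`, `StructElem`, `StructureMap` and `build_map` are hypothetical names, and the PDF side is a toy in-memory tree rather than one read from a real PDF file; this is not the plugin's actual method.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical mapping from XML element names to standard Tagged PDF structure types.
ROLE_MAP = {"article": "Document", "section": "Sect", "title": "H1", "para": "P"}


@dataclass
class StructElem:
    """Toy stand-in for a Tagged PDF structure element (not a real PDF API)."""
    role: str
    elem_id: str
    kids: list = field(default_factory=list)


class StructureMap:
    """Two-way lookup so a selection in either window can be highlighted in the other."""

    def __init__(self):
        self.xml_to_pdf = {}  # XML Element -> structure element id
        self.pdf_to_xml = {}  # structure element id -> XML Element

    def add(self, xml_elem, pdf_elem):
        self.xml_to_pdf[xml_elem] = pdf_elem.elem_id
        self.pdf_to_xml[pdf_elem.elem_id] = xml_elem


def build_map(xml_node, pdf_node, smap):
    """Walk both trees in parallel, pairing nodes whose roles agree under ROLE_MAP."""
    if ROLE_MAP.get(xml_node.tag) != pdf_node.role:
        return smap  # structures diverge here; do not pair or descend further
    smap.add(xml_node, pdf_node)
    # Pair children in document order; zip stops at the shorter child list.
    for xml_child, pdf_child in zip(xml_node, pdf_node.kids):
        build_map(xml_child, pdf_child, smap)
    return smap


xml_src = ("<article><title>Intro</title>"
           "<section><para>One</para><para>Two</para></section></article>")
root = ET.fromstring(xml_src)
pdf_root = StructElem("Document", "s0", [
    StructElem("H1", "s1"),
    StructElem("Sect", "s2", [StructElem("P", "s3"), StructElem("P", "s4")]),
])

smap = build_map(root, pdf_root, StructureMap())
second_para = root.find("section/para[2]")
print(smap.xml_to_pdf[second_para])  # id of the matching PDF structure element
print(smap.pdf_to_xml["s1"].text)    # text of the matching XML element
```

Pairing purely by position and role is the simplest possible policy; real documents whose typesetting reorders or merges components would need a more tolerant matching strategy.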
Citation
Hardy, M. R. B., & Brailsford, D. F. Mapping and Displaying Structural Transformations between XML and PDF. Presented at ACM Symposium on Document Engineering (DocEng '02)
| Field | Value |
|---|---|
| Conference Name | ACM Symposium on Document Engineering (DocEng '02) |
| End Date | Nov 9, 2002 |
| Publication Date | Jan 1, 2002 |
| Deposit Date | Oct 10, 2005 |
| Publicly Available Date | Oct 9, 2007 |
| Peer Reviewed | Yes |
| DOI | https://doi.org/10.1145/585058.585077 |
| Keywords | XML, PDF, document structure transformation |
| Public URL | https://nottingham-repository.worktribe.com/output/1022844 |
| Additional Information | Final draft of the paper accepted for the ACM DocEng 2002 conference |
Files
structure02.pdf (PDF, 434 KB)