Research Repository

Reconstituting typeset Marriage Registers using simple software tools

Brailsford, David F.

Authors

David F. Brailsford

Abstract

In a world of fully integrated software applications, which can seem daunting to develop and to maintain, it is sometimes useful to recall that a system of loosely-linked software components can provide surprisingly powerful and flexible methods for software development.

This paper describes a project which aims to retypeset a series of volumes from the Phillimore Marriage Registers, first published in England around the turn of the last century. The source material is plain text derived from running Optical Character Recognition (OCR) on a set of page scans taken from the original printed volumes. The regular, tabular structure of the Register pages allows us to automate the re-typesetting process.

The UNIX troff software and its tbl preprocessor are used for the typesetting itself, but a series of simple awk-based software tools, all of them parsers and code generators of one sort or another, is used to bring about the OCR-to-troff transformation.
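As a rough illustration of the kind of parser-plus-code-generator stage described above, the sketch below turns OCR'd entry lines into rows of a tbl table. It is written in Python rather than awk, and the entry layout (groom, an ampersand, bride, dot leaders, then a date), the function names and the three-column table format are all assumptions made for illustration, not the formats used in the project.

```python
import re

# Hypothetical layout of one OCR'd register entry (an assumption):
#   "John Smith & Mary Jones .. .. 12 May 1701"
ENTRY = re.compile(
    r"^(?P<groom>.+?)\s*&\s*"                        # first party, then "&"
    r"(?P<bride>.+?)\s*(?:\.\s*)*"                   # second party, then dot leaders
    r"(?P<date>\d{1,2}\s+[A-Za-z]+\.?\s+\d{4})\s*$"  # e.g. "12 May 1701"
)


def entry_to_tbl_row(line):
    """Return a tab-separated tbl data row, or None if the line does not parse."""
    m = ENTRY.match(line.strip())
    if m is None:
        return None
    return "\t".join(m.group("groom", "bride", "date"))


def make_table(lines):
    """Wrap all parseable entries in a tbl table (tab is tbl's default separator)."""
    rows = [entry_to_tbl_row(line) for line in lines]
    body = [row for row in rows if row is not None]
    return "\n".join([".TS", "l l r.", *body, ".TE"])


if __name__ == "__main__":
    print(make_table(["John Smith & Mary Jones .. .. 12 May 1701"]))
```

Lines that fail to match are silently dropped here; a real tool would more likely report them so that OCR errors can be corrected by hand.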

By re-parsing the generated troff codes it is possible to produce a surname index as a supplement to the retypeset volume. Moreover, this second-stage parsing has been invaluable in discovering subtle ‘typos’ in the automatically generated material. With small adjustments to this parser it would be possible to output the complete marriage entries in standard XML or GEDCOM notations.
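The second-stage parse can be pictured along the following lines. This is a minimal Python sketch, not the project's awk code: it assumes each tbl data row holds three tab-separated fields (groom, bride, date), takes the last word of each name as the surname, and indexes by entry number because the abstract does not say what the real index points to. The function name `surname_index` is hypothetical.

```python
from collections import defaultdict


def surname_index(troff_text):
    """Re-parse tbl tables and return (index, suspects).

    index maps surname -> sorted entry numbers; suspects lists rows that
    do not have the assumed three non-empty fields.
    """
    index = defaultdict(set)
    suspects = []
    in_table = in_format = False
    entry_no = 0
    for line in troff_text.splitlines():
        if line.startswith(".TS"):
            in_table = in_format = True
        elif line.startswith(".TE"):
            in_table = False
        elif in_table and in_format:
            # The format section ends with the first line that ends in "."
            in_format = not line.rstrip().endswith(".")
        elif in_table:
            entry_no += 1
            fields = line.split("\t")
            if len(fields) != 3 or not all(f.strip() for f in fields):
                suspects.append((entry_no, line))
                continue
            for name in fields[:2]:
                index[name.split()[-1]].add(entry_no)
    return {name: sorted(index[name]) for name in sorted(index)}, suspects


if __name__ == "__main__":
    sample = ".TS\nl l r.\nJohn Smith\tMary Jones\t12 May 1701\n.TE\n"
    print(surname_index(sample))
```

The list of suspect rows reflects the error-finding role mentioned above: a row that no longer fits the expected shape is a likely sign of a mistake earlier in the pipeline. Swapping the final dictionary for a different output routine is where XML or GEDCOM generation would plug in.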

Journal Article Type: Article
Publication Date: May 1, 2012
Journal: Computer Science - Research and Development
Print ISSN: 1865-2034
Electronic ISSN: 1865-2042
Publisher: Humana Press
Peer Reviewed: Yes
Volume: 27
Issue: 2
Institution Citation: Brailsford, D. F. (2012). Reconstituting typeset Marriage Registers using simple software tools. Computer Science - Research and Development, 27(2), doi:10.1007/s00450-010-0145-x
DOI: https://doi.org/10.1007/s00450-010-0145-x
Keywords: Re-Typesetting, OCR, Troff, Parsing, Genealogy, Hyperlinking, Indexing
Publisher URL: http://link.springer.com/article/10.1007/s00450-010-0145-x
Copyright Statement: Copyright information regarding this work can be found at the following address: http://eprints.nottingh.../end_user_agreement.pdf
Additional Information: The final publication is available at Springer via http://dx.doi.org/10.1007/s00450-010-0145-x

Files

eprint-springer2010.pdf (477 Kb)
PDF

Copyright Statement
Copyright information regarding this work can be found at the following address: http://eprints.nottingham.ac.uk/end_user_agreement.pdf



