
Research Repository


Spelling errors and keywords in born-digital data: a case study using the Teenage Health Freak Corpus

Smith, Catherine; Adolphs, Svenja; Harvey, Kevin; Mullany, Louise

Authors

Catherine Smith

Svenja Adolphs (svenja.adolphs@nottingham.ac.uk)
Professor of English Language and Linguistics



Abstract

The abundance of language data now available in digital form, together with the rise of distinct language varieties used for digital communication, means that non-standard spellings and spelling errors are likely to become an increasingly prominent issue for corpus compilers. This paper examines the effect of spelling variation on keywords in a born-digital corpus in order to explore the extent and impact of this variation for future corpus studies. The corpus used in this study consists of e-mails about health concerns sent to a health website by adolescents. Keywords are generated from two versions of the corpus, the original and a version with spelling errors corrected, with the British National Corpus (BNC) as the reference corpus. The keyword rankings from the two versions are shown to be very similar, which suggests that, depending on the research goals, keywords could be generated reliably without any need for spelling correction.
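The keyword comparison described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions. The abstract does not name the keyness statistic, so the sketch uses log-likelihood (G2), a measure commonly used by keyword tools. Spearman's rank correlation is included as one possible way to quantify how similar two keyword rankings are. The functions `log_likelihood`, `keywords` and `spearman_shared` are hypothetical helpers, and all frequency counts, including the misspelling "periud", are invented toy data, not figures from the Teenage Health Freak Corpus or the BNC.

```python
import math


def log_likelihood(a, b, c1, c2):
    """Log-likelihood (G2) keyness score (assumed statistic, see lead-in).

    a: frequency of the word in the study corpus (size c1 tokens)
    b: frequency of the word in the reference corpus (size c2 tokens)
    """
    total = c1 + c2
    e1 = c1 * (a + b) / total  # expected frequency in the study corpus
    e2 = c2 * (a + b) / total  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:  # 0 * log(0) is treated as 0
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(study, reference, top_n=None):
    """Rank words overused in `study` relative to `reference` by G2 (descending)."""
    c1 = sum(study.values())
    c2 = sum(reference.values())
    scored = []
    for word, a in study.items():
        b = reference.get(word, 0)
        if a / c1 > b / c2:  # keep positive keywords only
            scored.append((log_likelihood(a, b, c1, c2), word))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken alphabetically
    ranked = [word for _, word in scored]
    return ranked if top_n is None else ranked[:top_n]


def spearman_shared(ranked_a, ranked_b):
    """Spearman's rho computed over the words that appear in both ranked lists."""
    in_b = set(ranked_b)
    shared = [w for w in ranked_a if w in in_b]
    n = len(shared)
    if n < 2:
        return float("nan")
    rank_a = {w: i for i, w in enumerate(shared)}
    rank_b = {w: i for i, w in enumerate(w for w in ranked_b if w in rank_a)}
    d2 = sum((rank_a[w] - rank_b[w]) ** 2 for w in shared)
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical toy frequency counts (NOT real corpus data).
reference = {"the": 1000, "i": 500, "my": 300, "about": 200, "am": 150,
             "worried": 5, "period": 3, "spots": 2}
original = {"the": 10, "i": 20, "my": 15, "am": 4, "worried": 6,
            "period": 3, "periud": 1, "spots": 5}
corrected = {"the": 10, "i": 20, "my": 15, "am": 4, "worried": 6,
             "period": 4, "spots": 5}

kw_original = keywords(original, reference)
kw_corrected = keywords(corrected, reference)
print(kw_original)
print(kw_corrected)
print(spearman_shared(kw_original, kw_corrected))
```

In this toy setup the uncorrected spelling "periud" surfaces as a keyword of its own, while its corrected form adds to the count for "period". A rank correlation over the shared keywords is only one way to express "very similar" rankings; the paper's own comparison method may differ.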

Journal Article Type: Article
Publication Date: Nov 1, 2014
Journal: Corpora
Print ISSN: 1749-5032
Electronic ISSN: 1755-1676
Publisher: Edinburgh University Press
Peer Reviewed: Yes
Volume: 9
Issue: 2
APA6 Citation: Smith, C., Adolphs, S., Harvey, K., & Mullany, L. (2014). Spelling errors and keywords in born-digital data: a case study using the Teenage Health Freak Corpus. Corpora, 9(2), https://doi.org/10.3366/cor.2014.0055
DOI: https://doi.org/10.3366/cor.2014.0055
Keywords: Computer mediated communication, Keyword analysis, Spelling variation
Publisher URL: http://dx.doi.org/10.3366/cor.2014.0055
Related Public URLs: http://www.euppublishing.com/doi/abs/10.3366/cor.2014.0055
Additional Information: This article has been accepted for publication by Edinburgh University Press in Corpora.

Files

Smith, Adolphs, Harvey, and Mullany_Spelling Errors and Keywords in Born-Digital Data.pdf (328 Kb)
PDF

Copyright Statement
Copyright information regarding this work can be found at the following address: http://eprints.nottingham.ac.uk/end_user_agreement.pdf