Catherine Smith
Spelling errors and keywords in born-digital data: a case study using the Teenage Health Freak Corpus
Smith, Catherine; Adolphs, Svenja; Harvey, Kevin; Mullany, Louise
Authors
Professor Svenja Adolphs (svenja.adolphs@nottingham.ac.uk)
Professor of English Language and Linguistics
Dr Kevin Harvey (kevin.harvey@nottingham.ac.uk)
Associate Professor
Professor Louise Mullany (louise.mullany@nottingham.ac.uk)
Professor of Sociolinguistics
Abstract
The abundance of language data that is now available in digital form, and the rise of distinct language varieties that are used for digital communication, means that issues of non-standard spellings and spelling errors are, in future, likely to become more prominent for compilers of corpora. This paper examines the effect of spelling variation on keywords in a born-digital corpus in order to explore the extent and impact of this variation for future corpus studies. The corpus used in this study consists of e-mails about health concerns that were sent to a health website by adolescents. Keywords are generated using the original version of the corpus and a version with spelling errors corrected, and the British National Corpus (BNC) acts as the reference corpus. The ranks of the keywords are shown to be very similar and, therefore, suggest that, depending on the research goals, keywords could be generated reliably without any need for spelling correction.
Citation
Smith, C., Adolphs, S., Harvey, K., & Mullany, L. (2014). Spelling errors and keywords in born-digital data: a case study using the Teenage Health Freak Corpus. Corpora, 9(2). https://doi.org/10.3366/cor.2014.0055
Field | Value |
---|---|
Journal Article Type | Article |
Acceptance Date | Jan 1, 2013 |
Publication Date | Nov 1, 2014 |
Deposit Date | Aug 24, 2016 |
Publicly Available Date | Aug 24, 2016 |
Journal | Corpora |
Print ISSN | 1749-5032 |
Electronic ISSN | 1755-1676 |
Publisher | Edinburgh University Press |
Peer Reviewed | Peer Reviewed |
Volume | 9 |
Issue | 2 |
DOI | https://doi.org/10.3366/cor.2014.0055 |
Keywords | Computer mediated communication, Keyword analysis, Spelling variation |
Public URL | https://nottingham-repository.worktribe.com/output/737404 |
Publisher URL | http://dx.doi.org/10.3366/cor.2014.0055 |
Related Public URLs | http://www.euppublishing.com/doi/abs/10.3366/cor.2014.0055 |
Additional Information | This article has been accepted for publication by Edinburgh University Press in Corpora. |
Contract Date | Aug 24, 2016 |
Files
Smith, Adolphs, Harvey, and Mullany_Spelling Errors and Keywords in Born-Digital Data.pdf
(328 Kb)
PDF