Dawn Knight
CANELC: constructing an e-language corpus
Knight, Dawn; Adolphs, Svenja; Carter, Ronald
Authors
Svenja Adolphs (svenja.adolphs@nottingham.ac.uk)
Professor of English Language and Linguistics
Ronald Carter
Abstract
This paper reports on the construction of CANELC: the Cambridge and Nottingham e-language Corpus (see note 3). CANELC is a one-million-word corpus of digital communication in English, taken from online discussion boards, blogs, tweets, emails and SMS messages. The paper outlines the approaches used when planning the corpus: obtaining consent, collecting the data and compiling the corpus database.
This is followed by a detailed analysis of some of the patterns of language used in the corpus. The analysis includes a discussion of the key words and phrases used, as well as the common themes and semantic associations connected with the data. These discussions form the basis of an investigation of how e-language operates in ways both similar to and different from spoken and written records of communication (as evidenced by the British National Corpus, BNC).
Note 3: CANELC stands for Cambridge and Nottingham e-language Corpus. This corpus has been built as part of a collaborative project between The University of Nottingham and Cambridge University Press, with whom sole copyright of the annotated corpus resides. CANELC comprises one million words of digital English taken from SMS messages, blogs, tweets, discussion board content and private/business emails. Plans to extend the corpus are under discussion. The legal dimension of corpus ‘ownership’ of some forms of unannotated data is complex and is under constant review. At present, the annotated corpus is available only to authors and researchers working for CUP and is not more generally available.
Citation
Knight, D., Adolphs, S., & Carter, R. (2014). CANELC: constructing an e-language corpus. Corpora, 9(1). https://doi.org/10.3366/cor.2014.0050
Journal Article Type | Article |
---|---|
Acceptance Date | May 1, 2013 |
Publication Date | May 1, 2014 |
Deposit Date | Sep 1, 2016 |
Publicly Available Date | Sep 1, 2016 |
Journal | Corpora |
Print ISSN | 1749-5032 |
Electronic ISSN | 1755-1676 |
Publisher | Edinburgh University Press |
Peer Reviewed | Peer Reviewed |
Volume | 9 |
Issue | 1 |
DOI | https://doi.org/10.3366/cor.2014.0050 |
Keywords | Blogs, Tweets, SMS, Discussion boards, e-language, Corpus linguistics |
Public URL | https://nottingham-repository.worktribe.com/output/995856 |
Publisher URL | http://dx.doi.org/10.3366/cor.2014.0050 |
Related Public URLs | http://www.euppublishing.com/doi/abs/10.3366/cor.2014.0050 |
Additional Information | This article has been accepted for publication by Edinburgh University Press in Corpora. Copyright © 2016. Edinburgh University Press. |
Contract Date | Sep 1, 2016 |
Files
Knight, Adolphs, and Carter_CANELC constructing an e-language corpus.pdf
(517 Kb)
PDF