CANELC: constructing an e-language corpus

Knight, Dawn; Adolphs, Svenja; Carter, Ronald

Authors

Dawn Knight

Svenja Adolphs (svenja.adolphs@nottingham.ac.uk)
Professor of English Language and Linguistics

Ronald Carter

Abstract

This paper reports on the construction of the Cambridge and Nottingham e-language Corpus (CANELC).

The corpus was built as part of a collaborative project between the University of Nottingham and Cambridge University Press (CUP), with whom sole copyright of the annotated corpus resides. CANELC comprises one million words of digital English taken from online discussion boards, blogs, tweets, private and business e-mails, and Short Message Service (SMS) messages. Plans to extend the corpus are under discussion. The legal dimension of corpus ‘ownership’ for some forms of unannotated data is complex and under constant review. At present, the annotated corpus is available only to authors and researchers working for CUP and is not more generally available.

The paper outlines the approaches used when planning the corpus: obtaining consent, collecting the data and compiling the corpus database.

This is followed by a detailed analysis of some of the patterns of language used in the corpus. The analysis includes a discussion of the key words and phrases used, as well as the common themes and semantic associations connected with the data. These discussions form the basis of an investigation into how e-language operates in ways that are both similar to and different from spoken and written records of communication (as evidenced by the British National Corpus, BNC).

Journal Article Type: Article
Acceptance Date: Feb 1, 2013
Online Publication Date: May 1, 2014
Publication Date: May 1, 2014
Deposit Date: Dec 14, 2017
Journal: Corpora
Print ISSN: 1749-5032
Electronic ISSN: 1755-1676
Publisher: Edinburgh University Press
Peer Reviewed: Peer Reviewed
Volume: 9
Issue: 1
Pages: 29-56
DOI: https://doi.org/10.3366/cor.2014.0050
Keywords: Blogs, Tweets, SMS, Discussion boards, E-language, Corpus linguistics
Public URL: https://nottingham-repository.worktribe.com/output/1094835
Publisher URL: https://www.euppublishing.com/doi/full/10.3366/cor.2014.0050