Ethics considerations for Corpus Linguistic studies using internet resources

Koene, Ansgar; Adolphs, Svenja; Perez, Elvira; Carter, Chris James; Statache, Ramona; O'Malley, Claire; Rodden, Tom; McAuley, Derek


Ansgar Koene

Professor of English Language and Linguistics

Assistant Professor in Entrepreneurship and Innovation

Ramona Statache

Claire O'Malley

Professor of Computer Science

Professor of Digital Economy


With the rising popularity of public and semi-public communication channels such as blogs (late 1990s), Wikipedia (launched in 2001), Facebook (launched in 2004), Reddit (from 2005) and Twitter (from 2006), the Internet has become an increasingly fertile medium through which to collect substantial data sets of written language. Online communication platforms are also attractive because data collection requires comparatively little effort and cost and because the collection process is unobtrusive: it can often be performed ‘behind the scenes’ using application programming interfaces (APIs) or web scraping techniques, depending upon the affordances of the specific type of social media studied (e.g. Twitter, blogs). While the unobtrusive nature of the methods offers the advantage of ensuring that observed conversations are not unduly influenced by the researcher, it raises ethical concerns around issues of privacy violation, informed consent and the right to withdraw.
In this paper we will discuss some of the ethical concerns around the use of online communications data. We will start by looking at the current guidelines by the British Association for Applied Linguistics (BAAL). Next we will discuss some of the core difficulties related to identifying ‘publicness’ of Internet-based information. This will lead to a discussion about ethical responsibilities when dealing with ‘public’ online communications, and how this issue is being addressed in current corpus linguistics research.


Koene, A., Adolphs, S., Perez, E., Carter, C. J., Statache, R., O'Malley, C., …McAuley, D. (2015). Ethics considerations for Corpus Linguistic studies using internet resources. In Corpus Linguistics 2015: abstract book (204-206)

Conference Name: Corpus Linguistics 2015
Conference Location: Lancaster, UK
Start Date: Jul 21, 2015
End Date: Jul 24, 2015
Acceptance Date: Feb 28, 2015
Online Publication Date: Jul 21, 2015
Publication Date: Jul 21, 2015
Deposit Date: Mar 29, 2021
Publicly Available Date: Apr 9, 2021
Pages: 204-206
Book Title: Corpus Linguistics 2015: abstract book