Web Scraping Using R

Bradley, Alex; James, Richard J. E.

Authors

Alex Bradley

Richard J. E. James

Abstract

The ubiquitous use of the Internet in daily life means that there are now large reservoirs of data that can provide fresh insights into human behavior. One of the key barriers preventing more researchers from utilizing online data is that they do not have the skills to access the data. This Tutorial addresses this gap by providing a practical guide to scraping online data using the popular statistical language R. Web scraping is the process of automatically collecting information from websites. Such information can take the form of numbers, text, images, or videos. This Tutorial shows readers how to download web pages, extract information from those pages, store the extracted information, and do so across multiple pages of a website. A website has been created to assist readers in learning how to web-scrape. This website contains a series of examples that illustrate how to scrape a single web page and how to scrape multiple web pages. The examples are accompanied by videos describing the processes involved and by exercises to help readers increase their knowledge and practice their skills. Example R scripts have been made available at the Open Science Framework.
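
The four steps named in the abstract (download a page, extract information, store it, repeat across pages) can be sketched in R roughly as follows. This is a minimal illustration, not the authors' published scripts: it assumes the rvest package (version 1.0 or later) is installed, and the page contents, the CSS selectors `.title` and `.body`, and the helper `scrape_page()` are all hypothetical. Inline HTML strings stand in for real pages so the sketch runs without network access; in practice each element of `pages` would be a URL.

```r
library(rvest)

# Hypothetical stand-ins for web pages. With real sites, these would be
# URLs, which read_html() downloads before parsing.
pages <- c(
  "<html><body><h2 class='title'>First post</h2><p class='body'>Hello</p></body></html>",
  "<html><body><h2 class='title'>Second post</h2><p class='body'>World</p></body></html>"
)

# Parse one page and extract two fields into a one-row data frame.
# The selectors are assumptions about the (made-up) page layout.
scrape_page <- function(src) {
  page <- read_html(src)
  data.frame(
    title = html_text(html_element(page, ".title"), trim = TRUE),
    body  = html_text(html_element(page, ".body"), trim = TRUE),
    stringsAsFactors = FALSE
  )
}

# Repeat the extraction across all pages and stack the rows.
results <- do.call(rbind, lapply(pages, scrape_page))

# Store the extracted information as a CSV file (a temporary file here).
out_file <- tempfile(fileext = ".csv")
write.csv(results, out_file, row.names = FALSE)
```

When scraping a live site, it would be sensible to pause between requests (for example with `Sys.sleep()`) and to check the site's terms of use before collecting data.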

Citation

Bradley, A., & James, R. J. E. (2019). Web Scraping Using R. Advances in Methods and Practices in Psychological Science, 2(3), 264-270. https://doi.org/10.1177/2515245919859535

Journal Article Type: Article
Acceptance Date: Jun 3, 2019
Online Publication Date: Jul 30, 2019
Publication Date: 2019-09
Deposit Date: Sep 24, 2019
Publicly Available Date: Mar 29, 2024
Journal: Advances in Methods and Practices in Psychological Science
Print ISSN: 2515-2459
Electronic ISSN: 2515-2467
Publisher: SAGE Publications
Peer Reviewed: Yes
Volume: 2
Issue: 3
Pages: 264-270
DOI: https://doi.org/10.1177/2515245919859535
Public URL: https://nottingham-repository.worktribe.com/output/2471042
Publisher URL: https://journals.sagepub.com/doi/10.1177/2515245919859535
