Alex Bradley
Web Scraping Using R
Bradley, Alex; James, Richard J. E.
Abstract
The ubiquitous use of the Internet in daily life means that there are now large reservoirs of data that can provide fresh insights into human behavior. One of the key barriers preventing more researchers from utilizing online data is that they do not have the skills to access the data. This Tutorial addresses this gap by providing a practical guide to scraping online data using the popular statistical language R. Web scraping is the process of automatically collecting information from websites. Such information can take the form of numbers, text, images, or videos. This Tutorial shows readers how to download web pages, extract information from those pages, store the extracted information, and do so across multiple pages of a website. A website has been created to assist readers in learning how to web-scrape. This website contains a series of examples that illustrate how to scrape a single web page and how to scrape multiple web pages. The examples are accompanied by videos describing the processes involved and by exercises to help readers increase their knowledge and practice their skills. Example R scripts have been made available at the Open Science Framework.
Citation
Bradley, A., & James, R. J. E. (2019). Web Scraping Using R. Advances in Methods and Practices in Psychological Science, 2(3), 264-270. https://doi.org/10.1177/2515245919859535
Field | Value |
---|---|
Journal Article Type | Article |
Acceptance Date | Jun 3, 2019 |
Online Publication Date | Jul 30, 2019 |
Publication Date | 2019-09 |
Deposit Date | Sep 24, 2019 |
Publicly Available Date | Sep 24, 2019 |
Journal | Advances in Methods and Practices in Psychological Science |
Print ISSN | 2515-2459 |
Electronic ISSN | 2515-2467 |
Publisher | SAGE Publications |
Peer Reviewed | Peer Reviewed |
Volume | 2 |
Issue | 3 |
Pages | 264-270 |
DOI | https://doi.org/10.1177/2515245919859535 |
Public URL | https://nottingham-repository.worktribe.com/output/2471042 |
Publisher URL | https://journals.sagepub.com/doi/10.1177/2515245919859535 |
Contract Date | Sep 24, 2019 |
Files
Webscraping Using R AAM (PDF, 841 Kb)