Web scraping: R vs python

For my last post, I used a Python script to scrape the data from a website. I used Python simply because I am used to doing web scraping in Python. But I heard that R has also gotten better at scraping, so I rewrote my script in R.

The package rvest is the equivalent of BeautifulSoup in Python. It was created by Hadley Wickham and has been available since 2014. Under the hood it uses the packages ‘httr’ and ‘xml2’ to easily download and manipulate HTML content.

You can use rvest in the following way:

[code language="r" wraplines="true" collapse="false"]
# install and load the package
# install.packages("rvest")
library(rvest)

url <- "http://live.ultimate.dk/desktop/front/?eventid=2021049&language=nl"
data <- read_html(url)
resultsTable <- data %>% html_nodes("table.leaderboard_table_results")
rows <- resultsTable %>% html_nodes("tr")
for(i in 1:length(rows)){
  tds <- rows[i] %>% html_nodes("td")
  print(tds[4] %>% html_text())
  print(tds[10] %>% html_text())
}
[/code]

A couple of things are good to know:

  • get the website content with read_html(<URL>): this will return an xml document
  • select content from certain nodes with html_nodes(<element>.<classname>)
  • get attribute content from a node with html_attr(<name>)
  • the pipe operator “%>%” can be used to chain operations. Use it, it’s very convenient.
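For comparison, here is what the same node, text, and attribute selection looks like on the Python side with BeautifulSoup. This is a sketch over an inline HTML fragment; the link and its text are made up for illustration:

```python
from bs4 import BeautifulSoup

# A small inline fragment standing in for a downloaded page.
html = '<table class="leaderboard_table_results"><tr><td><a href="/results/1">Team A</a></td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

# html_nodes("table.leaderboard_table_results") ~ soup.select_one(...)
table = soup.select_one("table.leaderboard_table_results")

# html_text ~ .get_text(); html_attr("href") ~ ["href"]
link = table.find("a")
print(link.get_text())  # Team A
print(link["href"])     # /results/1
```

The mapping is close to one-to-one, which is why switching between the two libraries takes little effort.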

When you are used to BeautifulSoup, it is easy to learn rvest, because it has a similar syntax.

You can find the R and Python scripts that I wrote for web scraping below. Which language do you prefer for web scraping? Please let me know in a comment below.

python script vs R script
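Since the scripts themselves are linked rather than inlined, here is a minimal sketch of what the Python side can look like with BeautifulSoup. The function name `parse_results` is mine; the table class and the column indices match the R script above (note that R indexes from 1, Python from 0):

```python
from bs4 import BeautifulSoup

def parse_results(html):
    """Pull the 4th and 10th <td> from every row of the results table."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for row in soup.select("table.leaderboard_table_results tr"):
        tds = row.find_all("td")
        if len(tds) >= 10:
            # Same columns as the R script (1-based there, 0-based here).
            results.append((tds[3].get_text(strip=True), tds[9].get_text(strip=True)))
    return results

# Usage against the live page (requires network), e.g. with requests:
# import requests
# url = "http://live.ultimate.dk/desktop/front/?eventid=2021049&language=nl"
# for name, result in parse_results(requests.get(url).text):
#     print(name, result)
```

Keeping the parsing in a function that takes an HTML string makes it easy to test without hitting the live site.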

3 thoughts on “Web scraping: R vs python”

  1. Nice. Let me just add a ruby example (my favourite 😉 ), using the nokogiri library:

    require 'nokogiri'
    require 'open-uri'

    url = "http://live.ultimate.dk/desktop/front/?eventid=2021049&language=nl"
    data = Nokogiri::HTML(URI.open(url))

    data.css("table.leaderboard_table_results tr").each do |row|
      tds = row.css("td")
      p tds[4].text
      p tds[10].text
    end
