data science

R and SQL

Lately I have been working with R in a data quality project. The project involves running business rules that are written in SQL and executed against a SQL server. I have done a little research about accessing a SQL or MySQL server from R and using SQL syntax against dataframes. Below a short summary of my findings and some code to gets you up and running fast. I the example code I am assuming that you have a MySQL server running at localhost.

RODBC

RODBC is a database package for general database access, so not limited to an SQL server. Ideal, if you want to be independent of the underlying database type or want to develop cross platform. Be aware however that adding an odbc connection on unix is not so easy as on windows.

library(RODBC)
db <- odbcConnect("data_source_name", uid="user", pwd="***")
# print names of the available tables
sqlTables(db)
# load a_table into a data fram
df <- sqlQuery(db, "select * from airlines")
odbcClose(db)

RMySQL

Connecting to MySQL is made very easy with the RMySQL package.

library(RMySQL)
conn = dbConnect(MySQL(), user='user', password='***', dbname='flights', host='localhost')
dbListTables(conn)
dbListFields(conn, 'airlines')
rs = dbSendQuery(conn, "select * from airlines")
data = fetch(rs, n=-1)
print(data)
dbDisconnect(conn)

sqldf

Like the package name indicates, sqldf can be used to execute SQL query’s directly against dataframes.

library(sqldf)
sqldf("select * from iris limit 5")

dplyr

dplyr is a relatively new package and the next iteration of plyr. It is faster and it has a very handy chain operator (%.%, execution from left to right). Dplyr considers database tables as data frames. The advantage of using dplyr is that no SQL knowledge is needed to query!

library(dplyr)
conDplyr = src_mysql(dbname = "flights", user = "user", password = "***", host = "localhost")
myData <- conDplyr %>%
  tbl("airlines") %>%
  select(year, airline, total_delay) %>%
  filter(year==2010) %>%
  collect()

SQL Server 2016 R Services

Since SQL Server 2016 it is possible to execute R code in SQL server. Unfortunately, I haven’t used it yet, but here is a link to a short demo. http://www.r-bloggers.com/demo-r-in-sql-server-2016/

Leave a Reply

Your email address will not be published. Required fields are marked *