EZ Study

Actuarial Biology Chemistry Economics Calculators Confucius Engineer
Physics
C.S.

Use SQL statement in R
How to Install SQLDF packages in R

Most SAS professionals use data step in SAS at the beginning of their career. Once they switch to SQL statement, LDD (Little Dumb Doctor) is so sure that they will love it! Since R is getting more popular, we also want to use SQl statement in R.

To use SQL statement in R, first you need to install SQLDF packages: R Gui running window: --> click "Packages" --> Install Packages --> select minor
--> scroll down to select "SQLDF" packages to install
.

Some highly-recommended R Tutorial books

Example of SQL Statement in R
Here are examples to use SQL statement in R. We are using the same data as in the previous tutorial: "R_Tutorial_test.csv", or "R_Tutorial_test.txt",

>print(mydata)
make model mpg weight price sales
1 Honda Accord 28 2930 4099 100
2 Honda Civic 30 3350 4749 200
3 Toyota Camry 29 2640 3799 150
4 buick century 20 3250 4816 300
5 buick electra 15 4080 7827 200

>library(sqldf)
> qldf("select * from mydata where sales > 150 limit 10")

make model mpg weight price sales
1 Toyota Camry 29 2640 3799 150
2 buick century 20 3250 4816 300
3 buick electra 15 4080 7827 200

>minprice=sqldf("select make,min(price) as price from mydata group by make")
>print(minprice)
make price
1 Honda 4099
2 Toyota 3799
3 buick 4816

Use left/inner join in SQL statement within R, Right/Full Join not supported yet in R.

>out1=sqldf("select make,model,mpg, weight,price,sales
from mydata inner join minprice using (make,price)")
>print(out1)
make model mpg weight price sales
1 Honda Accord 28 2930 4099 100
2 Toyota Camry 29 2640 3799 150
3 buick century 20 3250 4816 300

Insert Variables; Use Macro variables in SQL Statement in R

> limit <- 2
> minprice <-1000
> modelyear <- '2012'
> out2=fn\$sqldf("select make, model, mpg, weight, price, sales, \$modelyear as modelyear from out1 where price > \$minprice limit \$limit") >print(out2)
make model mpg weight price sales modelyear
1 Honda Accord 28 2930 4099 100 2012
2 Toyota Camry 29 2640 3799 150 2012

>sales=sqldf("select make,model,avg(price) as avg_price,(100*sales/sum(sales))+'%' as sales_share_within_make from mydata group by make")
>print(sales)
make model avg_price sales_share_within_make
1 Honda Civic 4424 66
2 Toyota Camry 3799 100
3 buick electra 6322 40

Some highly-recommended data visulization books