Loading various data(csv, txt, excel) into R
How to Install R packages, Perl Installation
To write it, it took three months;
to conceive it, it took three minutes;
to collect the data in it, it took all my life.
-- F. Scott Fitzgerald.
R gets more and more popular in the analytics world mainly because it's free of charge, flexibility, and strong power of analytics.
To download the R software, you might need to double-check that the version you have downloaded exactly matches the package distributed by R; for windows users,
here is the link to download: http://cran.r-project.org/bin/windows/base/
Some highly-recommended R Tutorial books
Example of Loading data in R
Here are examples to import "R_Tutorial_test.csv
", and two
excel file named "R_Tutorial_test.xls
" into R.
Notice the forward slash should be used as the path separator, which is different from windows platform, in windows, we usually back slash for the directly e.g. "c:\programs\...".
>test.csv1 <- read.table("C:/R_study/R_Tutorial_test.csv", header=T, sep=",")
>test.csv2 <- read.table("C:/R_study/R_Tutorial_test.txt", header=T, sep=",")
make model mpg weight price sales
1 Honda Accord 28 2930 4099 100
2 Honda Civic 30 3350 4749 200
3 Toyota Camry 29 2640 3799 150
4 buick century 20 3250 4816 300
5 buick electra 15 4080 7827 200
In various European locales, as the comma character serves as decimal point, the read.csv2 function should be used instead. For further detail of the read.csv and read.csv2 functions, please consult the R documentation.
> help(read.csv) # documentation
> ?read.xls     # also documentation
Then there will a help page to load csv files
Loading Excel File
The following shows how to load an Excel spreadsheet named "R_Tutorial_test.xls". As the package is not in the core R library, it has to be installed and loaded into the R workspace.
There are two steps to prepare before you using the read.xls to load the excel file.
1. Load "gdata" package: the easiest way to load the gtata package is from your R Gui running window: --> click "Packages" --> Install Packages --> select minor
--> scroll down to select "gdata" packages to install
You can also go to the gdata package download page:
, the bothering part is to install that manually by youself, which is not recommended.
2. Install Perl: If you didn't install perl, or if you don't specify the correct perl location in the read.xls function, then you may get some error message like this:
Error in findPerl(verbose = verbose) :
perl executable not found. Use perl= argument to specify the correct path.
Error in file.exists(tfn) : invalid 'file' argument
To avoid this error, first you need to download the perl file to install(choose the correct version, 32-bit for windows x84)
ActiveState perl (ActivePerl)
, it may take a few minutes to install.
If you choose the default installation, the path for perl will be "C:/Perl/bin/perl.exe". Now you can use following code to read xls file.
> library(gdata) # load the gdata package
> test <- read.xls("C:/R_study/R_Tutorial_test.xls",sheet=1,perl="C:/Perl/bin/perl.exe")
> test <- read.xls("C:/R_study/R_Tutorial_test.xlsx",sheet=1,perl="C:/Perl/bin/perl.exe")
You can find the complete specification to load xls file is the following page
read.xls(xls, sheet=1, verbose=FALSE, pattern, na.strings=c("NA","#DIV/0!"),
..., method=c("csv","tsv","tab"), perl="perl")
By default, it's getting your first tab in the excel file, even if you tab name is not "sheet1".
Change Working Directory
To get your working directly, use:
> getwd() # get the current working directory
You can select a different working directory with the setwd() function, and avoid entering the full path of the data files.
> setwd("<new path>") # set the working directory to "<new path>"
Thanks to the helpful input from www.ats.ucla.edu
In case if there are some trouble rows in the csv file, you can read in by using readLines() function, then remove those rows for read.table() function.
name <- paste('http://www.google.com/finance/getprices?i=60&p=16d&f=d,o,h,l,c,v&df=cpct&q=RAD')
name <- gsub(" ","",name)
line <- readLines(name)
if(i<- grep("M",line,ignore.case = FALSE))
if(i<- grep("D",line,ignore.case = FALSE))
if(i<- grep("E",line,ignore.case = FALSE))
a1 <- read.table(text = line, fill = TRUE, stringsAsFactors=FALSE,sep=",")
Continue to Use SQl statement in R
Back to Statistics tutorial home
Excel Analytics Tutorial Home