Tips on Handling Big Data in R
Clearing Memory in R

Some useful functions in R:
str() gives a brief description of the structure of the data.
names() gives the name of each variable.
dim() gives the number of rows (records) and columns (variables) in the data.
summary() gives basic summary statistics for each variable.
rm() removes objects you no longer need.
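A minimal sketch of these functions on a small made-up data frame. The object name `sales` and its columns are hypothetical and used only for illustration:

```r
# Hypothetical example data; any data frame would do.
sales <- data.frame(
  region = c("north", "south", "east", "west"),
  units  = c(12L, 7L, 30L, 18L),
  price  = c(9.5, 11.0, 8.25, 10.0)
)

str(sales)                   # compact structure: type and first values of each column
var_names <- names(sales)    # column (variable) names
shape     <- dim(sales)      # integer vector: number of rows, number of columns
print(summary(sales))        # basic per-column summary statistics

rm(sales)                    # drop the object once it is no longer needed
sales_exists <- exists("sales", inherits = FALSE)  # FALSE after rm()
```

Note that rm() only removes the name from the workspace; R reclaims the underlying memory later through its garbage collector.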

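```
# improved list of objects
# Assumption: the arguments after order.by (decreasing, head, n) and the
# body of lsos() are inferred from how they are used in the code below.
.ls.objects <- function (pos = 1, pattern, order.by,
                         decreasing = FALSE, head = FALSE, n = 5) {
  # apply fn to every object named in `names` at search position `pos`
  napply <- function(names, fn) sapply(names, function(x)
    fn(get(x, pos = pos)))
  names <- ls(pos = pos, pattern = pattern)
  obj.class <- napply(names, function(x) as.character(class(x))[1])
  obj.mode <- napply(names, mode)
  obj.type <- ifelse(is.na(obj.class), obj.mode, obj.class)
  # human-readable size, e.g. "264.7 Kb"
  obj.prettysize <- napply(names, function(x) {
    capture.output(print(object.size(x), units = "auto")) })
  # size in bytes, used for sorting
  obj.size <- napply(names, object.size)
  obj.dim <- t(napply(names, function(x)
    as.numeric(dim(x))[1:2]))
  # objects without dimensions (e.g. vectors) report their length as Rows
  vec <- is.na(obj.dim)[, 1] & (obj.type != "function")
  obj.dim[vec, 1] <- napply(names, length)[vec]
  out <- data.frame(obj.type, obj.size, obj.prettysize, obj.dim)
  names(out) <- c("Type", "Size", "PrettySize", "Rows", "Columns")
  if (!missing(order.by))
    out <- out[order(out[[order.by]], decreasing = decreasing), ]
  if (head)
    out <- head(out, n)
  out
}

# shorthand: the n largest objects, biggest first
lsos <- function(..., n = 10) {
  .ls.objects(..., order.by = "Size", decreasing = TRUE, head = TRUE, n = n)
}

lsos()
```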
This prints something like the following, listing up to the 10 largest objects in the workspace:
```
                 Type   Size PrettySize Rows Columns
abc.def           PCA 790128   771.6 Kb    7      NA
gh         data.frame 271040   264.7 Kb  669      50
factor.ijk   factanal  12888    12.6 Kb   12      NA
dates      data.frame   9016     8.8 Kb  669       2
sd.           numeric   3808     3.7 Kb   51      NA
napply       function   2256     2.2 Kb   NA      NA
lsos         function   1944     1.9 Kb   NA      NA
ind.sup       integer    448  448 bytes  102      NA
x           character     96   96 bytes    1      NA
```
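If only the sizes matter, a shorter sketch in base R may be enough. This is a simplified alternative, not a replacement for a full object listing; the object names `big_vec` and `small_vec` are made up for the demonstration:

```r
# Hypothetical objects of clearly different sizes.
big_vec   <- numeric(1e5)
small_vec <- numeric(10)

# Size in bytes of the selected objects, largest first.
obj_names <- c("small_vec", "big_vec")
sizes <- sapply(obj_names, function(nm) as.numeric(object.size(get(nm))))
sizes <- sort(sizes, decreasing = TRUE)
print(sizes)

# Human-readable size of a single object.
pretty <- format(object.size(big_vec), units = "auto")
print(pretty)
```

To cover the whole workspace, `obj_names` could be replaced by `ls()`.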
To remove objects that you no longer need from the R workspace, use the rm() function. Either of the following removes all objects whose names begin with the string "tmp":

rm(list = ls()[grep("^tmp", ls())])
rm(list = ls(pattern = "^tmp"))
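A small sketch of the pattern-based removal, using made-up object names (`tmp_scores`, `tmp_labels`, `final_model`) so the effect is easy to see:

```r
# Hypothetical objects, created only to demonstrate the pattern match.
tmp_scores  <- 1:10
tmp_labels  <- letters
final_model <- "keep me"

# Collect the names first so they can be inspected before deleting.
to_drop <- ls(pattern = "^tmp")   # names starting with "tmp"
print(to_drop)

rm(list = to_drop)                # remove exactly those objects
remaining <- ls()                 # final_model is still there

# Optional: gc() triggers garbage collection and reports memory use.
# R also collects garbage automatically, so this call is not required.
invisible(gc())
```

Listing the matches before calling rm() is a cheap safeguard, since removed objects cannot be recovered.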


Thanks to the helpful input from statmethods.net and www.cyclismo.org.
