EZ Study

Tips on Handling Big Data in R
Clear/Clean Memory in R

Some useful functions in R:
str() gives a very brief description of the data.
names() gives the name of each variable.
dim() gives the number of records (rows) and columns/variables in the data.
summary() gives some very basic summary statistics for each variable.
rm() removes data you don't need anymore.
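
As a minimal sketch of these functions in use, the following builds a small data frame and inspects it before removing it. The object name exam.data and its columns are hypothetical, chosen only for illustration:

```r
# Hypothetical example data; the name and columns are illustrative only
exam.data <- data.frame(id = 1:5,
                        score = c(2.5, 3.0, 4.5, 1.0, 5.0),
                        group = c("a", "b", "a", "b", "a"))

str(exam.data)                 # compact description of each column
var.names <- names(exam.data)  # the variable names
data.dim  <- dim(exam.data)    # number of rows, then number of columns
print(summary(exam.data))      # basic summary statistics per variable

rm(exam.data)                  # drop the object once it is no longer needed
still.there <- exists("exam.data", inherits = FALSE)
```

After rm(), the name exam.data no longer exists in the workspace, so any later reference to it fails until the data is created again.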

# improved list of objects
.ls.objects <- function (pos = 1, pattern, order.by,
                        decreasing=FALSE, head=FALSE, n=5) {
    napply <- function(names, fn) sapply(names, function(x)
                                         fn(get(x, pos = pos)))
    names <- ls(pos = pos, pattern = pattern)
    obj.class <- napply(names, function(x) as.character(class(x))[1])
    obj.mode <- napply(names, mode)
    obj.type <- ifelse(is.na(obj.class), obj.mode, obj.class)
    obj.prettysize <- napply(names, function(x) {
                           capture.output(print(object.size(x), units = "auto")) })
    obj.size <- napply(names, object.size)
    # rows and columns of each object (NA where dim() does not apply)
    obj.dim <- t(napply(names, function(x)
                        as.numeric(dim(x))[1:2]))
    # for non-function objects without dimensions, report length as "Rows"
    vec <- is.na(obj.dim)[, 1] & (obj.type != "function")
    obj.dim[vec, 1] <- napply(names, length)[vec]
    out <- data.frame(obj.type, obj.size, obj.prettysize, obj.dim)
    names(out) <- c("Type", "Size", "PrettySize", "Rows", "Columns")
    if (!missing(order.by))
        out <- out[order(out[[order.by]], decreasing=decreasing), ]
    if (head)
        out <- head(out, n)
    out
}

# shorthand
lsos <- function(..., n=10) {
    .ls.objects(..., order.by="Size", decreasing=TRUE, head=TRUE, n=n)
}

Calling lsos() results in something like the following (showing the 10 largest objects):
                       Type   Size PrettySize Rows Columns
abc.def                 PCA 790128   771.6 Kb    7      NA
gh               data.frame 271040   264.7 Kb  669      50
factor.ijk         factanal  12888    12.6 Kb   12      NA
dates            data.frame   9016     8.8 Kb  669       2
sd.                 numeric   3808     3.7 Kb   51      NA
napply             function   2256     2.2 Kb   NA      NA
lsos               function   1944     1.9 Kb   NA      NA
load               loadings   1768     1.7 Kb   12       2
ind.sup             integer    448  448 bytes  102      NA
x                 character     96   96 bytes    1      NA
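
For a quick check without the helper functions, a rougher version of the same idea can be sketched with ls(), get() and object.size() alone. The environment demo.env and the objects placed in it are hypothetical, created here only so that the example does not touch your own workspace:

```r
# Hypothetical objects in a scratch environment (names are illustrative)
demo.env <- new.env()
assign("big.vec",  numeric(1e5),        envir = demo.env)
assign("small.df", data.frame(a = 1:3), envir = demo.env)
assign("one.chr",  "x",                 envir = demo.env)

# Size in bytes of every object in the environment, largest first
obj.names <- ls(envir = demo.env)
obj.bytes <- sapply(obj.names, function(nm)
                    as.numeric(object.size(get(nm, envir = demo.env))))
obj.bytes <- sort(obj.bytes, decreasing = TRUE)
print(obj.bytes)
```

This gives only the raw byte counts; the fuller function above adds the object type, a human-readable size and the dimensions.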

To remove datasets that you no longer need from the R workspace, use the rm() function:

rm(list = ls()[grep("^tmp", ls())])
or, more simply,
rm(list = ls(pattern = "^tmp"))
Both forms remove all objects whose names begin with the string "tmp".
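
A small sketch of the pattern-based removal, run inside a scratch environment so that nothing in your own workspace is deleted. The environment name work.env and the objects tmp1, tmp2 and keep.me are assumptions for illustration. Calling gc() afterwards is optional; it asks R to run garbage collection:

```r
# Hypothetical scratch environment with two temporary objects and one to keep
work.env <- new.env()
assign("tmp1",    rnorm(1000), envir = work.env)
assign("tmp2",    rnorm(1000), envir = work.env)
assign("keep.me", 1:10,        envir = work.env)

# Remove every object in work.env whose name starts with "tmp"
rm(list = ls(envir = work.env, pattern = "^tmp"), envir = work.env)

remaining <- ls(envir = work.env)   # names of the objects that are left
print(remaining)

# Optional: trigger garbage collection so freed memory can be reclaimed
invisible(gc())
```

In the global workspace the envir arguments can simply be dropped, which gives the one-line form shown above.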

Thanks to the helpful input from statmethods.net and www.cyclismo.org.
