EZ Study

Actuarial Biology Chemistry Economics Calculators Confucius Engineer
Physics
C.S.

Data Manipulation in R
Data Transpose & Sorting Data in R

Some highly-recommended data visulization books
       

Analytics professionals usually use PROC transpose for data manipulation in SAS, so how can we do this in R?

We are using the following R self-included datasets mtcars and Indometh to
demontrate. The data mtcars was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).

1) To understand what's inside the dataset mtcars:
>? mtcars
Sample output:

            mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
...


2) sort the data by mpg(ascending) and cyl(descending):

> car1 <- mtcars[order(mpg,-cyl),]
> car1
            mpg cyl disp hp drat wt qsec vs am gear carb
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 10.4 6 460.0 215 3.00 5.424 17.82 0 0 3 4
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
...


3) Transpose the data into horizontal layout, one subject with one record. In R terminology, that is in "wide" format. (Mutiple records for one subject is in "long" format).
> patient1 <- Indometh[order(-Subject),]
> patient2 <- reshape(patient1,direction="wide",timevar="time",idvar="Subject")
> patient2


Another way to do the tranpose is to use function: xtabs:
> patient1 <- Indometh[order(-Subject),]
> patient2 <-xtabs(conc ~ Subject + time, data = patient1)
> patient2


         Subject 0.25 0.5 0.75   1   1.25   2   3   4   5   6   8
1      1 1.50 0.94 0.78 0.48 0.37 0.19 0.12 0.11 0.08 0.07 0.05
12     2 2.03 1.63 0.71 0.70 0.64 0.36 0.32 0.20 0.25 0.12 0.08
23     3 2.72 1.49 1.16 0.80 0.80 0.39 0.22 0.12 0.11 0.08 0.08
34     4 1.85 1.39 1.02 0.89 0.59 0.40 0.16 0.11 0.10 0.07 0.07
45     5 2.05 1.04 0.81 0.39 0.30 0.23 0.13 0.11 0.08 0.10 0.06
56     6 2.31 1.44 1.03 0.84 0.64 0.42 0.24 0.17 0.13 0.10 0.09

Thanks to the helpful input from statmethods.net and www.cyclismo.org.

Some highly-recommended R Tutorial books
         

Related links:
Continue to Formatting SQL query, Tips to use Group Statement   SAS Interview

Back to Use SQL statement in R via package sqldf library   Analytics Home