EZ Study



Top SAS Tuning Techniques for Large Datasets
Efficient Techniques and Tips for Handling Large Datasets

When we work with millions of records and hundreds of variables, how we process the data becomes crucial. In this tutorial, we provide a few efficient techniques and tips for handling large datasets. Some of these techniques come from the 2012 SAS Global Forum paper by Dr. Lafler and the 2011 WUSS Coders' Corner paper; the copyright strictly belongs to their authors.

WORK ON ONLY WHAT YOU NEED
Example: we want to sort a large dataset with 10 million records and 20 variables, of which we actually need only 5. Keeping only the needed variables before sorting reduces the amount of data the sort has to read and write.
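One way to do this is the KEEP= dataset option on the input table, sketched below. The names work.big, id, and var1-var19 are hypothetical stand-ins, and the first step only builds a small test table so the example runs on its own.

```sas
/* Build a small stand-in table (1 id + 19 other variables = 20 variables). */
data work.big;
    array v{19} var1-var19;
    call streaminit(2024);
    do id = 1000 to 1 by -1;
        do j = 1 to 19;
            v{j} = rand('uniform');
        end;
        output;
    end;
    drop j;
run;

/* KEEP= on the input dataset: only the 5 needed variables are read,
   so the sort never touches the other 15. */
proc sort data=work.big (keep=id var1 var2 var3 var4)
          out=work.big_sorted;
    by id;
run;
```

Placing KEEP= on the input rather than on the OUT= dataset matters here: on the output side, all 20 variables would still be read and sorted before the extra ones are dropped.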



INDEX & WHERE > WHERE > IF
Example: we are working with a similar large dataset with 50 million records and 10 variables, and we want to find the subset satisfying a certain condition (var1='key').
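To see why the WHERE statement is faster than the IF statement: with a WHERE statement in a DATA step, a record that does not satisfy the condition (var1='key') is never read into the Program Data Vector (PDV), which saves a lot of unnecessary reading time. A subsetting IF, by contrast, is evaluated only after each record has been read into the PDV. If var1 is also indexed, SAS can use the index to locate the matching records directly instead of scanning the whole table.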
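The three variants can be sketched side by side as follows. The table work.big and its contents are hypothetical stand-ins built by the first step; whether SAS actually uses the index is decided by its optimizer, and MSGLEVEL=I asks it to report that decision in the log.

```sas
/* Small stand-in table: about 1% of the records have var1 = 'key'. */
data work.big;
    length var1 $8;
    do id = 1 to 10000;
        if mod(id, 100) = 0 then var1 = 'key';
        else var1 = 'other';
        output;
    end;
run;

/* Slowest: subsetting IF. Every record is read into the PDV, then tested. */
data work.subset_if;
    set work.big;
    if var1 = 'key';
run;

/* Faster: WHERE. Non-matching records are filtered out before the PDV. */
data work.subset_where;
    set work.big;
    where var1 = 'key';
run;

/* Fastest for a selective condition: index var1, then use WHERE. */
proc datasets library=work nolist;
    modify big;
    index create var1;
quit;

options msglevel=i;   /* log a note when an index is used */

data work.subset_index;
    set work.big;
    where var1 = 'key';
run;
```

All three output tables contain the same records; only the amount of data read along the way differs. Building the index has its own cost, so it tends to pay off when the same table is subset repeatedly on the same variable.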

IF-THEN-ELSE > IF-THEN
By using IF-THEN/ELSE statements as opposed to IF-THEN statements without the ELSE, the SAS System stops evaluating the remaining conditions for an observation as soon as one condition holds true for it.
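A minimal sketch of such a chain is shown below; the table work.scores and the variables score and grade are hypothetical names used only for illustration.

```sas
/* Small stand-in table with scores 5, 15, ..., 95. */
data work.scores;
    do score = 5 to 95 by 10;
        output;
    end;
run;

data work.graded;
    set work.scores;
    length grade $1;
    /* Once one condition is true, the remaining ELSE branches are skipped
       for that observation. */
    if score >= 90 then grade = 'A';
    else if score >= 80 then grade = 'B';
    else if score >= 70 then grade = 'C';
    else grade = 'F';
run;
```

Written as four separate IF-THEN statements, every condition would be evaluated for every observation. Ordering the branches so that the most frequent condition comes first generally lets more observations exit the chain early.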

WORK ON SMALL SAMPLE FIRST TO TEST THE WHOLE PROCESS
A whole process usually consists of several steps (DATA steps, procedures, or SQL queries). After we have tested each step individually, we can test the whole process end to end on a smaller subset of the data. The SAS procedure SURVEYSELECT can help us draw a well-distributed random sample.
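For instance, a 1% simple random sample could be drawn as sketched below. The table names, the sampling rate, and the seed are illustrative assumptions, and the first step only creates a stand-in table.

```sas
/* Small stand-in table. */
data work.big;
    call streaminit(2024);
    do id = 1 to 10000;
        x = rand('normal');
        output;
    end;
run;

/* Draw a 1% simple random sample without replacement.
   A fixed SEED= makes the sample reproducible between test runs. */
proc surveyselect data=work.big
                  out=work.big_sample
                  method=srs
                  samprate=0.01
                  seed=12345
                  noprint;
run;
```

The whole process can then be pointed at work.big_sample for testing and switched back to the full table once every step runs cleanly.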

