Top SAS Tuning Techniques for Large Datasets
Efficient Techniques and Tips in Handling Large Datasets
When we work on millions of records with hundreds of variables, how we process our data becomes crucial.
In this tutorial, we provide a few efficient techniques and tips for handling large datasets. Some of these techniques come from the 2012 SAS Global Forum paper by Dr. Lafler and the 2011 WUSS Coders' Corner paper; the copyright strictly belongs to those authors.
• WORK ON ONLY WHAT YOU NEED
Example: we want to sort a large dataset of 10 million records with 20 variables altogether, when in fact we only need 5 of those variables.
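A minimal sketch of this idea: use the KEEP= data set option so PROC SORT reads and sorts only the variables we need. The dataset name BIG and the variable names var1-var5 are assumptions for illustration.

```
/* Sort only the 5 variables we need, not all 20.         */
/* BIG and var1-var5 are hypothetical names.              */
proc sort data=big(keep=var1 var2 var3 var4 var5)
          out=big_sorted;
    by var1;
run;
```

Dropping the 15 unneeded variables before the sort shrinks the amount of data SAS must move through the sort utility, which is where most of the time and disk I/O goes.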
• INDEX & WHERE > WHERE > IF
See this tutorial for more details.
Example: we are still using the same dataset of 10 million records with 20 variables, and we want to find the subset satisfying a certain condition (var1='key').
To see why the WHERE statement is faster than the IF statement: with a WHERE statement in a DATA step, a record that does not satisfy the condition (var1='key') is never read into the Program Data Vector (PDV), which saves a lot of unnecessary reading time.
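The contrast can be sketched as below. The dataset name BIG is an assumption; the WHERE= data set option filters records before they enter the PDV, while the subsetting IF discards them only after they have been read.

```
/* WHERE filters during the read: non-matching records    */
/* never enter the PDV.                                   */
data subset_where;
    set big(where=(var1='key'));
run;

/* IF filters after the read: every record enters the PDV */
/* and most are then discarded.                           */
data subset_if;
    set big;
    if var1='key';
run;

/* With an index on var1, WHERE can locate matching       */
/* records directly instead of scanning the whole table.  */
proc datasets library=work nolist;
    modify big;
    index create var1;
quit;
```

This is why the ranking in the heading holds: an indexed WHERE beats a plain WHERE, which in turn beats a subsetting IF.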
• IF-Then-ELSE > IF-Then
By using IF-THEN/ELSE statements, as opposed to IF-THEN statements without the ELSE, the SAS System stops processing the conditional logic as soon as one condition holds true for an observation.
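A minimal sketch of the difference, using a hypothetical numeric variable score in a dataset BIG: with the ELSE chain, an observation that matches the first condition is never tested against the remaining ones.

```
/* BIG and score/grade are hypothetical names.            */
data graded;
    set big;
    if score >= 90 then grade = 'A';
    else if score >= 80 then grade = 'B';
    else grade = 'C';
    /* Without the ELSEs, every IF would be evaluated for */
    /* every observation, even after a match.             */
run;
```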
• WORK ON SMALL SAMPLE FIRST TO TEST THE WHOLE PROCESS
A complete program usually consists of several steps (DATA steps, procedures, or SQL queries). After testing each step individually, we can test the whole process end to end on a smaller subset of the data. The SAS procedure SURVEYSELECT can help us draw a well-distributed sample subset.
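For example, a 1% simple random sample for testing could be drawn as below; the dataset name BIG, the sampling rate, and the seed are assumptions for illustration.

```
/* Draw a reproducible 1% simple random sample of BIG.    */
proc surveyselect data=big out=big_sample
                  method=srs     /* simple random sampling */
                  samprate=0.01  /* 1% of the records      */
                  seed=12345;    /* reproducible draws     */
run;
```

Once the whole pipeline runs cleanly on big_sample, rerun it on the full dataset.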
Continue to:
Damerau Levenshtein distance in SQL Netezza
Why we got NULL value when we concatenate two columns? Rename Data?