**Top SAS Tuning Techniques for Large Datasets**

Efficient Techniques and Tips for Handling Large Datasets

When we work on millions of records with hundreds of variables, how we process the data is crucial. In this tutorial, we present a few efficient techniques and tips for handling large datasets. Some of the techniques come from the 2012 SAS Global Forum paper by Dr. Lafler and the 2011 WUSS Coders' Corner paper; the copyright belongs to those authors.

**•** **WORK ON ONLY WHAT YOU NEED**
Example: suppose we want to sort a large dataset with 10 million records and 20 variables, but we actually need only 5 of them. Dropping the other 15 variables before sorting saves both I/O and memory.
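A minimal sketch of this idea; the dataset name `big` and the variable names `id` and `var1-var4` are assumptions for illustration:

```sas
/* KEEP= on the input dataset drops the unneeded variables before sorting, */
/* so PROC SORT reads and moves only the 5 variables we actually need.    */
proc sort data=big(keep=id var1-var4) out=big_sorted;
    by id;
run;
```

Applying KEEP= (or DROP=) on the *input* dataset is what matters: it trims each record as it is read, rather than after the expensive sort is done.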

**•** **INDEX & WHERE > WHERE > IF**
Example: using the same dataset with 10 million records and 20 variables, we want to extract the subset of records satisfying a certain condition (var1='key').

To see why the WHERE statement is faster than the IF statement: with a WHERE statement in a DATA step, a record that fails the condition (var1='key') is never read into the Program Data Vector (PDV), which saves a lot of unnecessary reading time. An IF statement, by contrast, is applied only after the record has already been loaded into the PDV. Adding an index on var1 lets SAS locate the matching records directly instead of scanning the whole dataset, which is why INDEX & WHERE is fastest of all.
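The comparison can be sketched as follows; the dataset name `big` is an assumption, and var1 is taken to be a character variable:

```sas
/* Create a simple index on var1 (a one-time cost that pays off on repeated subsetting). */
proc datasets library=work nolist;
    modify big;
    index create var1;
quit;

/* Fastest: WHERE can use the index to read only the matching records. */
data subset_where;
    set big;
    where var1 = 'key';
run;

/* Slowest: IF reads every record into the PDV first, then discards non-matches. */
data subset_if;
    set big;
    if var1 = 'key';
run;
```

Comparing the real time reported in the log for the two DATA steps makes the difference visible on a large dataset.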

**•** **IF-THEN/ELSE > IF-THEN**
By using IF-THEN/ELSE statements as opposed to IF-THEN statements without the ELSE, SAS stops processing the conditional logic as soon as one condition holds true for an observation; a chain of bare IF-THEN statements evaluates every condition for every record.
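A small sketch of the pattern; the dataset `scores` and variable `score` are hypothetical names:

```sas
/* With ELSE, once a condition is true SAS skips the remaining tests for   */
/* that record; without the ELSEs, every IF would be evaluated every time. */
data graded;
    set scores;
    length grade $1;
    if score >= 90 then grade = 'A';
    else if score >= 80 then grade = 'B';
    else if score >= 70 then grade = 'C';
    else grade = 'F';
run;
```

For an extra gain, order the conditions from most to least frequent, so most records exit the chain at the first test.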

**•** **WORK ON A SMALL SAMPLE FIRST TO TEST THE WHOLE PROCESS**
A full pipeline usually consists of several steps (DATA steps, procedures, SQL queries). After testing each step individually, we can test the whole process end to end on a smaller subset. The SAS procedure SURVEYSELECT can help us draw a well-distributed sample.
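A sketch of drawing a 1% test sample with PROC SURVEYSELECT; the dataset name `big`, the sampling rate, and the seed value are assumptions chosen for illustration:

```sas
/* Draw a simple random sample of roughly 1% of 'big' for end-to-end testing. */
proc surveyselect data=big out=big_sample
    method=srs            /* simple random sampling without replacement */
    samprate=0.01         /* keep about 1% of the records               */
    seed=12345;           /* fix the seed so the sample is reproducible */
run;
```

Fixing SEED= matters here: it makes the test sample reproducible, so repeated runs of the pipeline during debugging see the same data.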
