EZ Study

Actuarial Biology Chemistry Economics Calculators Confucius Engineer
Physics
C.S.

Powerful Notsorted Option in By Statement

Question: Sometimes we want to do the summary based on "by variable", and "by varaible" is not sorted in the dataset. As you can see in the following graph, there are 3 states' level in the data, but we want to add the totals in 4 groups. (you can download the SAS paper for more details:BY'S NOTSORTED OPTION)

Answer:
/***************************************************/
proc sort data=DS1 out=DS1s;
by STATE; run;

/* Not the correct one to get the output */
data SUMDS1 (drop=AMT);
set DS1s;
by STATE;
if first.STATE then TOTAMT = 0;
TOTAMT+AMT;
if last.STATE then output; run;

/* Here is the correct one to get the output */
data SUMDS2 (drop=AMT);
set DS1;
by STATE NOTSORTED;
if first.STATE then TOTAMT = 0;
TOTAMT+AMT;
if last.STATE then output; run;

Also we can apply notsorted in many other procedures, for instance, we can apply notsorted option in proc means procedure.

proc means data=DS1 sum maxdec=0 noprint;
by STATE NOTSORTED;

var AMT; output out=sumds2(drop=_type_ _freq_ )
sum(amt)=TOTAMT ; run;

Proc means runs significantly faster for large numbers of groups if the data is first sorted with proc sort and then a BY statement with the classification variables instead of a CLASS statement. The CLASS statement informs the means procedure to build a table of all possible combinations of class variables. The BY statement allows proc means to process a single group, then write it to the output data set, and access the same storage to process the next group. The BY statement needs either a sorted data set or a grouped data set with BY's NOTSORTED option.

Only one BY statement can be used in each PROC step. A procedure creates output for each BY group. If rows with the same values for the BY variables are not contiguous, a procedure with BYNOTSORTED processing treats each contiguous set as a separate BY group. The statistics procedures perform separate analyses for each BY group and the reporting procedures produce a report for each BY group.

Here are a few representative procedures supporting the BY statement with the NOTSORTED option: btl, calendar, chart, compare, corr, forms, freq, glm, means, mi, nested, plot, print, rank, report, score, standard, summary, tabulate, timeplot, transpose, and univariate.

Proc transpose does not transpose BY groups. Instead, for each BY group, proc transpose creates a row for each variable that it transposes. If the BY statement's NOTSORTED option is not specified, the rows must be sorted by all the BY variables or all the BY variables must be indexed appropriately.

Related links:
Continue to: Very Tricky when comparing two numerical numbers in SAS   SAS tutorial home
Back to Difference between missing values and null values   Statistics Tutorial Data Mining