EZ Study


SAS Interview Questions and Answers --1

Foreword: What a nickname for people using SAS: "semicolon" professionals!

If you know how to respond to the various kinds of technical questions an interviewer might ask, you will be much more confident in the interview and have a better chance of getting the position.

Question: What are the differences between the WHERE statement and the IF statement?
Answer: The major differences can be summarized as:

The subsetting IF can only be used in a DATA step
Many IF statements can be used in one DATA step
With IF, the record must be read into the program data vector (PDV) before selection is performed
WHERE can be used in a DATA step as well as in a PROC step
A second WHERE will replace the first unless the ALSO keyword is used (WHERE ALSO)
With WHERE, the data is subset before the record is read into the PDV
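To make these points concrete, here is a minimal sketch. It assumes the SASHELP.CLASS sample dataset that ships with SAS is available; the output dataset name and the BMI cutoff are made up for illustration.

```sas
/* WHERE works in a PROC step; the subsetting IF does not */
proc print data=sashelp.class;
  where sex = 'F';
run;

data work.teen_girls;                /* hypothetical output dataset */
  set sashelp.class;
  where age >= 13;                   /* applied before the record reaches the PDV */
  where also sex = 'F';              /* ALSO adds to the first WHERE instead of replacing it */
  bmi = 703 * weight / height**2;    /* variable created inside the step */
  if bmi > 18;                       /* subsetting IF can test it; WHERE cannot */
run;
```

WHERE can only test variables that already exist in the input dataset, which is why the test on the newly computed bmi has to be a subsetting IF.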

Sometimes, it makes no difference whether you use a WHERE statement or a subsetting IF statement. Sometimes it makes a big difference. In explaining this distinction, you have the opportunity to discuss:

The distinction between data steps and proc steps
The difference between declaration (declarative) statements and executable (action) statements
The significance of the sequence of executable statements in a data step
Some of the finer points of merging SAS datasets
A few points of efficiency theory (although tests do not seem to bear the theory out in this case)
The origin of the WHERE clause in SQL (of course, bring this up only if you’re good at SQL)
WHERE operators that are not available in the IF statement or other data step statements
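Two of the points above, merging and the WHERE-only operators, can be sketched as follows. The datasets a and b and their values are invented for this example; SASHELP.CLASS is assumed to be available.

```sas
data a;
  input id x;
  datalines;
1 10
2 20
;
run;

data b;
  input id x;
  datalines;
1 99
2 5
;
run;

/* WHERE filters a and b separately, before the merge:        */
/* b loses id=2 (x=5), so id=2 keeps x=20 from a. Two rows.   */
data where_merge;
  merge a b;
  by id;
  where x > 8;
run;

/* IF tests the merged record, where x from b has overwritten */
/* x from a: id=2 has x=5 and is dropped. One row.            */
data if_merge;
  merge a b;
  by id;
  if x > 8;
run;

/* Operators accepted by WHERE but not by IF */
proc print data=sashelp.class;
  where name like 'J%' and age between 12 and 14;
run;
```

This is one case where the choice between WHERE and IF changes the result, not just the efficiency.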

Question: What are the differences between a function and a PROC?
Answer: We use the MEAN function and PROC MEANS to explain the difference.

Computing a statistic with a function, such as the MEAN function, is not exactly the same as computing the same statistic with a procedure, such as the UNIVARIATE procedure. As you explain this distinction, you show that you understand:

The difference between summarizing across variables and summarizing across observations: a function gives an average across one observation (a row), while a procedure gives an average across many observations (a column)
The statistical concept of degrees of freedom as it relates to the difference between sample statistics and population statistics, and the way this is implemented in some SAS procedures with the VARDEF= option
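A small sketch of the row-versus-column distinction and of VARDEF=. The dataset scores and its values are invented for illustration.

```sas
data scores;
  input id test1 test2 test3;
  /* MEAN function: averages across variables within one observation, */
  /* ignoring missing values (row 2 averages 70 and 90)               */
  row_mean = mean(test1, test2, test3);
  datalines;
1 80 90 100
2 70 . 90
;
run;

/* PROC MEANS: averages each variable across all observations. */
/* VARDEF=DF (the default) divides the variance by n-1,        */
/* giving the sample standard deviation.                       */
proc means data=scores mean std vardef=df;
  var test1 test2 test3;
run;

/* VARDEF=N divides by n, giving the population standard deviation */
proc means data=scores mean std vardef=n;
  var test1 test2 test3;
run;
```

The means are the same in both PROC MEANS steps; only the standard deviations differ.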

REPLACE= option
Many SAS programmers never have occasion to use the REPLACE= dataset option or the REPLACE system option, but if you are familiar with them, then you should also be aware of:
The distinction between the input dataset and the output dataset in a step that makes changes in a set of data
The general concept of name conflicts in programming theory
Issues of programming style related to name conflicts
How the system option compares to the corresponding dataset option

A question on this topic may also give you the opportunity to mention syntax check mode and issues of debugging SAS programs.
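As a rough sketch of the name conflict in question, the second step below reads and writes a dataset with the same name. The dataset work.scores is invented for this example, and the comments describe the behavior as I understand it from the SAS documentation, so check it on your own installation.

```sas
data work.scores;
  input id score;
  datalines;
1 80
2 90
;
run;

/* Input and output datasets share a name. With REPLACE=NO,    */
/* SAS is expected to refuse to overwrite the existing dataset, */
/* so work.scores should keep its original values.              */
data work.scores (replace=no);
  set work.scores;
  score = score + 5;
run;

/* The system option form. As I understand it, NOREPLACE        */
/* protects only permanent datasets and does not apply to WORK. */
options noreplace;
```

Writing the result to a differently named dataset avoids the conflict altogether, which is the programming-style point.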

RUN statement placement
The output of a program may be different based on whether a RUN statement comes before or after a global statement such as an OPTIONS or TITLE statement. If you are aware of this issue, it shows that you have written SAS programs that have more than the simplest of objectives. At the same time, your comments on this subject can also show that you know:

The distinction between data step statements, proc step statements, and global statements
How SAS finds step boundaries
The importance of programming style
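A short sketch of the issue, assuming the SASHELP.CLASS sample dataset is available. The title text is arbitrary.

```sas
title 'First report';

proc print data=sashelp.class;
/* No RUN yet, so the PROC PRINT step has not executed.         */
/* TITLE is a global statement and takes effect immediately, so */
/* it replaces the title before the step runs.                  */
title 'Second report';
run;   /* step boundary: the listing is printed under 'Second report' */

/* Putting RUN before the global statement gives the intended result */
title 'First report';
proc print data=sashelp.class;
run;
title 'Second report';   /* now applies only to later steps */
```

Ending every step with an explicit RUN statement, placed before any global statements meant for the next step, keeps the step boundaries visible to both SAS and the reader.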

SUM or +
Adding numbers with the SUM function provides the same result that you get with the + numeric operator, as long as none of the values is missing. For example, SUM(8, 4, 3) provides the same result as 8 + 4 + 3. Sometimes, though, you prefer to use the SUM function, and at other times, the + operator. As you explain this distinction, you can show that you understand:

Missing values
Propagation of missing values
Treatment of missing values in statistical calculations in SAS
Why it matters to handle missing values correctly in analytic processing
The use of 0 as an argument in the SUM function to ensure that the result is not a missing value
The performance differences between functions and operators
Essential ideas of data cleaning
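The missing-value behavior can be sketched in a few lines. The variable names are arbitrary.

```sas
data _null_;
  a = 8; b = .; c = 3;

  total_plus = a + b + c;       /* missing: + propagates the missing value */
  total_sum  = sum(a, b, c);    /* 11: SUM ignores missing arguments */
  all_miss   = sum(., .);       /* missing: every argument is missing */
  safe_sum   = sum(0, ., .);    /* 0: the 0 argument guarantees a nonmissing result */

  put total_plus= total_sum= all_miss= safe_sum=;
run;
```

Which behavior you want depends on the analysis: a propagated missing value flags incomplete data, while SUM quietly treats the missing piece as if it were not there.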