EZ Study

Actuarial Biology Chemistry Economics Calculators Confucius Engineer
Physics
C.S.


Difference between Missing value and " " value

Question: In SAS, what's the difference between missing value and " " value? In other words, If we need to grab all the missing records in SAS based on a variable say, keyvar, what's the safest way to query?

Answer: Note that a missing character value contains both a blank ('') or null value; a missing numeric value is represented as a single dot (.). The safest way to grab all the keyvar missing records is to use the following code, especially if the missing values are respresented by any characters/values other than " " or . :
        if missing(keyvar) then ... /* output both '' value and null value*/

        or if keyvar is missing then ...


The previous two ways seem safer than the simple code:

... keyvar=''; ; ... /*only output the value '', but not null value*/

Note: missing values are considered smaller than all numbers, so if you have some code e.g. "keyvar < 50", it will include the cases of missing values. Since missing values are less than any numbers.

Note that the result of any operation on missing values will return a missing value. e.g. the following returns 0:

total=q1+q2+q3+q4+q5+q6;

An alternative is to use:
total=sum(of q1-q6);
in which missing values are assumed to be zero.

Even if you think that a variable should not contain any missing values, you should always write your code under the assumption that there may be missing values. Acknowledgement to the source: pauldickman.


Special caring in Netezza: null value & empty value
In the following example from Netezza or other SQL database, you can see the same data output by using almost different conditions: value is null and value <>'1'. The key difference: value <>'1' will only output the record with '' value, but not null value.


        data example1; input variable_group $50.;
        datalines;
        Variable with non missing values
        Variable with null value
        Variable with empty value
        ; run;

        proc sql; create table Netezza_lib.test1(distribute_on = 'random')
        as select * from work.example1; quit;

        proc sql stimer; connect to netezza (server='server_name'
        database=database1 user=myuser pwd=mypasswordconnection=global);
        execute (create table test2 as select a.*, case
        when variable_group='Variable with non missing values' then '1'
        when variable_group='Variable with empty value' then ''
        end as value
        from test1 as a distribute on random; ) by netezza; quit;

        proc sql stimer; connect to netezza (server='server_name'
        database=database1 user=myuser pwd=mypasswordconnection=global);
        execute (create table test_null as
        select 'value is null' as filter_condition,a.*
        from test2 as a where value is null
        distribute on random; ) by netezza;

        execute (create table test_not_1 as
        select 'value not equal 1' as filter_condition,a.*
        from test2 as a where value<>'1'
        distribute on random; ) by netezza;

        execute (create table test_empty as
        select a.* from test2 as a where value=''
        distribute on random; ) by netezza; quit;


In netezza, say, there is one record with two character variables: var1 and var2, with var1='some value', var2 is null, then if you run a query based on the condition: where var1<>var2, you actually can't get that record.

Related links:

Continue to : Powerful Notsorted Option in By Statement   How to prepare for SAS interview?   SAS tutorial home
Back to: How can I easily create lag and lead variables?   Statistics tutorial home