EZ Study

Actuarial Biology Chemistry Economics Calculators Confucius Engineer
Physics
C.S.


How can I easily create lag and lead variables?

Question: How can create lag variable by lag function? Is there any lead function?

Answer: We introduce two methods to create lag and lead variables.
newvar=lag1(oldvar) moves the variable oldvar downward by one record;
newcar=lag2(oldvar) moves the variable oldvar downward by two records.

    /***************************************************/
        data example0;
        input x @@;   *@@ to read multiple records on the same line;
        datalines;
        1 2 3 4 5
        ;        /*not to put ";" in the previous line, not working*/
        run;
    /***************************************************/
        data example1; set example0;
        lag_x=lag(x); lag2_x=lag2(x); run;
    /***************************************************/

There is no lead function, no code like: lead_x=lead(x) to get the lead variables.

Instead, we can use the following two tricks to get the "lead".

sas_tutorial_lead_lag_variables

Method (I): merge two data
    /***************************************************/
        data example2;
        merge example1
                  example0(firstobs=2 keep=x rename=(x=lead_x));
        run; /*no need to use by statement here*/
    /***************************************************/

Method (II): Proc Expand with Convert statement
We can use proc expand with convert statement to get lead variables, noticeit only works with numerical variables.
    /***************************************************/
        proc expand data=example0 out=example3 method = none;
        convert x= lag1_x / transformout=(lag 1);
        convert x = lead1_x / transformout=(lead 1);
        by company; run;
    /***************************************************/

Method (III): In case of Netezza or other SQL Database, we can use the following SQl to get lead, lag values, and the rolling/moving average(backward # rows and fowward ## rows):
    /***************************************************/
        proc sql stimer; connect to netezza(server='server_name'
        database=database1 user=**** pwd=***** connection=global);
execute (create table example3 as select a.*,
lag(value,1) over (partition by company order by company,dept) as lag_val,
lag(value,2) over (partition by company order by company,dept) as lag_2_value,
lead(value,1) over (partition by company order by company,dept) as lead_val
from example0 as a distribute on random;

execute (create table example4 as select company, dept, value,
avg(value) over (partition by company order by company,dept,
rows between 2 preceding and 2 following )
as Rolling_Avg_within_company,
from example0 as a distribute on random;

) by netezza; quit;


Related links:

Continue to next: Difference between missing values and null values   SAS tutorial home
Back to: Mystery of Proc Sort by Nodup and Nodupkey   Statistics tutorial home