EZ Study

Actuarial Biology Chemistry Economics Calculators Confucius Engineer
Physics
C.S.

SAS Netezza Tutorial: Special Caring
Is sorting data available in Netezza SQL?
How do we use group/rank statement in Netezza SQL?

Question 1: Can we sort a dataset inside Netezza? How come the first 5 observations in Netezza table is always different?

Answer: Yes, but be careful! Because the output seems not sorted.
Here are a few examples to show more details.

Example 1: If you run the following code, we might expect the same data sets output, right?

        /******************************************************/
        proc sql; connect to netezza(server='server_name'
        database=database1 user=myuser pwd=mypassword connection=global);

        Data test1;   Set NZ.netezza_table;   Run;
        Data test2;   Set NZ.netezza_table;   Run;

The SAS tables test1 and test2 will have different data order.

Example 2: If you run the following code multiple times to get the first 5 obs.:

        /******************************************************/
        proc print data= NZ.netezza_table (obs=5);   run;
        /******************************************************/


Every time you will see different outputs.

Example 3: Sort the data inside netezza by using "order by" is a little bit tricky,
        /******************************************************/
        proc sql; connect to netezza(server='server_name'
        database=database1 user=myuser pwd=mypassword connection=global);
        execute (create table outputfile as
        select var1, var2, var3
        from another_netezza_data
        order by var1,var3
        distribute on random) by netezza; quit;
        /* notice there is no error in log */
        /******************************************************/


If you run the previous code, there will no errors in the log, which means it is sorted, but if you click the output data to view, you will see different records at different time you click the data. To see the sorted data, you need to output to sas, here is the trick (thanks to our colleague Shelly):

        /******************************************************/
        data dataout;
        set NZ.outputfile;
        by var1, var3; run;
        /******************************************************/


Question 2: Group Statement doesn't work well in Netezza, what's the alternative?
Answer: In SAS, we can use the following group statement, it's working well without any issues.
        /******************************************************/
        proc sql;
        create table table_new as
        select var1,var2,var1/sum(var1) as share from table_old
        group by var2; quit; (Notice no grouping by var1)

The above grouping is working well in SAS. However, for the following code:

        proc sql;   connect to netezza (server='server_name' database=database1
        user=myuser pwd=mypassword connection=global);
        execute (create table table_new
        as select var1, var2,var1/sum(var1) as share
        from table_old
        group by var2
        distribute on random) by netezza; quit;


If you run the above grouping in netezza, there will be some
ERROR: Attribute var1 must be GROUPed or used in an aggregate function.
To help resolve the problem, you can use partition statement in Netezza:


        proc sql;   connect to netezza (server='server_name' database=database1
        user=myuser pwd=mypassword connection=global);
        execute (create table table_new
        as select * from ( select var1, var2,
        var1/ ( sum(var1) over (partition by var2)) as share,
        var1/ ( sum(var1) over (partition by null)) as share, /*sum all */
        rank() over (partition by var1 order by var1 desc) as ranking
        Rank function might have output like: 1, 2,2,2, 5, ...
        dense_rank() over (partition by var1 order by var1 desc) as ranking
        Dense_Rank function have distinct output like: 1, 2,2,2, 3, ...
        from table_old ) subsel where ranking<11 )
        distribute on random) by netezza; quit; *grab top 10 obs in each var1 group;
        /******************************************************/


Related links:
Continue to next: How to distribute data in Netezza Server   SAS tutorial home
Back to: Tips for large datasets-IV: How to update temporary data in Netezza?   Statistics tutorial home