EZ Study

Actuarial Biology Chemistry Economics Calculators Confucius Engineer
Physics
C.S.

SAS Netezza Tutorial: Special Caring
Is sorting data available in Netezza SQL?
How do we use group/rank statement in Netezza SQL?

Question 1: Can we sort a dataset inside Netezza? How come the first 5 observations in Netezza table is always different?

Answer: Yes, but be careful! Because the output seems not sorted.

Example 1: If you run the following code, we might expect the same data sets output, right?

/******************************************************/
proc sql; connect to netezza(server='server_name'

Data test1;   Set NZ.netezza_table;   Run;
Data test2;   Set NZ.netezza_table;   Run;

The SAS tables test1 and test2 will have different data order.

Example 2: If you run the following code multiple times to get the first 5 obs.:

/******************************************************/
proc print data= NZ.netezza_table (obs=5);   run;
/******************************************************/

Every time you will see different outputs.

Example 3: Sort the data inside netezza by using "order by" is a little bit tricky,
/******************************************************/
proc sql; connect to netezza(server='server_name'
execute (create table outputfile as
select var1, var2, var3
from another_netezza_data
order by var1,var3
distribute on random) by netezza; quit;
/* notice there is no error in log */
/******************************************************/

If you run the previous code, there will no errors in the log, which means it is sorted, but if you click the output data to view, you will see different records at different time you click the data. To see the sorted data, you need to output to sas, here is the trick (thanks to our colleague Shelly):

/******************************************************/
data dataout;
set NZ.outputfile;
by var1, var3; run;
/******************************************************/

Question 2: Group Statement doesn't work well in Netezza, what's the alternative?
Answer: In SAS, we can use the following group statement, it's working well without any issues.
/******************************************************/
proc sql;
create table table_new as
select var1,var2,var1/sum(var1) as share from table_old
group by var2; quit; (Notice no grouping by var1)

The above grouping is working well in SAS. However, for the following code:

proc sql;   connect to netezza (server='server_name' database=database1
execute (create table table_new
as select var1, var2,var1/sum(var1) as share
from table_old
group by var2
distribute on random) by netezza; quit;

If you run the above grouping in netezza, there will be some
ERROR: Attribute var1 must be GROUPed or used in an aggregate function.
To help resolve the problem, you can use partition statement in Netezza:

proc sql;   connect to netezza (server='server_name' database=database1
execute (create table table_new
as select * from ( select var1, var2,
var1/ ( sum(var1) over (partition by var2)) as share,
var1/ ( sum(var1) over (partition by null)) as share, /*sum all */
rank() over (partition by var1 order by var1 desc) as ranking
Rank function might have output like: 1, 2,2,2, 5, ...
dense_rank() over (partition by var1 order by var1 desc) as ranking
Dense_Rank function have distinct output like: 1, 2,2,2, 3, ...
from table_old ) subsel where ranking<11 )
distribute on random) by netezza; quit; *grab top 10 obs in each var1 group;
/******************************************************/