EZ Study

Actuarial Biology Chemistry Economics Calculators Confucius Engineer
Physics
C.S.

SAS Netezza Tutorial: Special Caring
Why we got NULL value when we concatenate two columns?
How do we Rename a database/dataset?
How do we delete/update some records in netezza dataset?
Question 1: How come we got empty/null value in the SQL concatenation results?

Answer: In SQL (Netezza), the concatenation connector "||" sometimes can get empty(null) values.
        /******************************************************/
        libname NZ netezza
        server=nps1 database=test user=myuser pwd=mypwd connection=global;
        execute (create table outputfile as
        select visitor,var1, var2, var1||'_'||var2 as var1_var2
        from datain
        distribute on random) by netezza; quit;


As you can see from the following example, you might get empty values after concatenate two variables var1 and var2.
  Click to see larger picture
concatenation results empty/NULL in SQL
 
The reason is due to the empty(null) value in either var1 or var2. As long as one of the columns is NULL, the concatenated results will be NULL. We can use nvl() function to get rid of this issue.

        libname NZ netezza
        server=nps1 database=test user=myuser pwd=mypwd connection=global;
        execute (create table outputfile as
        select visitor,var1, var2, nvl(var1,'')|'_'||nvl(var2,'')| as var1_var2

        from datain
        distribute on random) by netezza; quit;


Question 2: How do we rename a netezza database/dataset? How do we create a netezza database? How do we delete/update some records in netezza dataset?

Answer: In SAS, we can proc datasets to rename a SAS dataset.
        proc datasets library=work;
        change old_dataset1_name=new_dataset1_name
        old_dataset2_name=new_dataset2_name; run;

In Netezza, we have some SQL like rename code:

        proc sql;   connect to netezza (server='server_name' database=database1
        user=myuser pwd=mypassword connection=global);

        execute (ALTER TABLE old_name RENAME TO new_name )
        execute (ALTER TABLE table_name drop (column1,column2) cascade
        execute (ALTER database old_db_name RENAME TO new_db_name )

        execute ( DELETE FROM tablename WHERE var1=...)
        execute (alter table table1 rename column y to customers; )
        execute (alter table table1 add column column1 varchar(40); )
        execute (groom table table1 versions; ) /*need for add/delete columns */
        /*Drop a few variables from netezza dataset*/
        execute (alter table tablename drop (var1,var2,var3 ) cascade )
        execute (create database database_name)
        execute (drop database database_name)
        execute (drop table outputfile1, outputfile2
        ) by netezza;quit; /*will get error message if no such outputfile1*/


When we are using "Alter Table ..." statement to update table, especially for adding/deleting columns, you might see some wired "123" symbol from the IBM netezza administrator tool; furthermore, this might make your server down if you are trying to insert records into that updated table. Here is why: User need to be cautious when using this functionality as it can cause performance issues. Alter table SQL basically create a new table called "Version tables" for the new column added.

Consider a table t1 (col1, col2). User decided to add another column (col3) to this table. After doing the alter table, system will have main table (with column col1 and col2) and a versioned table ( with column col3 ). This internal table versions of tables will be transparent to users. However when we query the table T1, system will do a UNION ALL for the main table and version table internally which will cause performance decrease. Higher the number of versioned tables, higher the performance will hit for query against that table.

GROOM operation is highly recommended as soon as "Alter table" is done for a table, it will materialize all the versioned tables in one main table and performance should get back to normal.

Or we can use common proc sql to delete tables in Netezza:
        proc sql; drop table netezza_lib.outputfile1;
        quit; /*will get warning message if no such outputfile1*/

Question 3: Sometimes you might see some strange results from the calculation on Netezza:
        proc sql; create table test as
        select type, portion,total,portion/total as pcnt format percent8.1
        from Netezza_lib.test ; quit;

You might always get pcnt=0, this is most likely due to the round issue. If both variables portion,total are integer values in Netezza, then the calculation output based on them will also be integer value, which is rounded to 0. A tricky tip to fix this is: round(portion,1)/total as pcnt format percent8.1.

Related links:
Continue to next: Top SAS Tuninig Techniques for Large Dataset   SAS tutorial home
Back to: How to distribute data in Netezza Server   Statistics tutorial home