EZ Study

Actuarial Biology Chemistry Economics Calculators Confucius Engineer
Physics
C.S.


How to create, drop, alter, insert, table in Hadoop
SAS error: Error in the LIBNAME statement. Invalid option name SUBPROTOCOL.


Sample code: very similar to netezza, you can create, drop, alter, insert, table in Hadoop.
        proc sql;
        connect to Hadoop (server='server_name'
        database=database1 user=myuser pwd=mypassword connection=global);
        execute (drop table if exists data1) by hadoop;

        execute (alter table data1 rename to data2) by hadoop;
        execute (create table data1 as select
        distinct substr(var1,1,8) as var2 from data1) by hadoop;
        disconnect from hadoop; quit;

        proc sql;
        connect to Hadoop (server='server_name'
        database=database1 user=myuser pwd=mypassword connection=global);
        execute (insert into data2
        select var1,var2,var3
        from data1) by hadoop;


A smart way to pass Netezza/Hadoop/Teradata output to SAS without creating tables in them:
        /******************************************************/
        proc sql; connect to Netezza/Hadoop/Teradata
        (server='...' database=... user=... pwd=... connection=global);
        create table sas_data as
        select * from connection to Netezza/Hadoop/Teradata
        (select var1,var2,sum(var3) as ...
        from netezza_data as a
        where ... group by .. having sum(var3) ...
        );quit;


If you use libname to define a connection to Hadoop, you might see the following error message in the log window:

    libname hadp_lib hadoop server='server' port=... schema=database1
    hive_principal='hive/...'   hdfs_principal='hdfs/...'   Subprotocol=HIVE2;
SAS error: Error in the LIBNAME statement.
Invalid option name SUBPROTOCOL.

This is usually happening in SAS 9.3. In SAS 9.4, there should be no such error. If you need to run this in SAS 9.3, then you need to specify some SAS_HADOOP_JAR_PATH environment variable variables.

When you are loading data from Hive to SAS in data step, you may have the trouble of defining character variables with length 32000 by default. In that case, you need to specify some options:

    data test_data;
    set hd_work.test_data(dbsastype=('var_name' = 'CHAR(32)'));
    run;

Related links:

Continue to : Tips to handle large datasets-I: Index and proc Datasets   SAS tutorial home
Back to: Running SAS procedures inside Netezza   Statistics tutorial home