EZ Study

Actuarial Biology Chemistry Economics Calculators Confucius Engineer
Physics
C.S.

Tips in handling Survey data
Remove carriage return and linefeed characters

Question: If you have analyzed the survey data, you may see there are paragraphs of comments/inputs as an input for just one variable. That is, in one record/row,

for one respondent, the "comment" contains several paragraphs as comments, each seperated by carriage returns.

If you read those csv files into SAS, you may see the output file is not the correct one, since carriage return will give you another record to start.


Tip to start: in csv file format, replace every comma ',' in the comment by the period '.' or something else; otherwise you will get stuck in reading the file.

If you have the carriage return and linefeed characters in the sas format data(.sas7bdat), then you can simply used the following code to remove them:
variable= Compress(variable,'0D0A'x);
keywords=compress(compbl(var1),'"');
The compbl function is to remove multiple blanks into a single one, and then use compress to remove the double quotes.

Example: We have created a "trouble" csv file containing those carriage return and linefeed characters, and show you how to remove them. You can actually use your own data, read your own data by proc import, then apply the code to see the effect.

        /****************************************************/
        /* Create an example .csv file containing CR/LF bytes.                 */
        /* '0D'x is the hexadecimal representation of CR(carriage return) */
        /* '0A'x is the hexadecimal representation of LF(linefeed).           */
        /****************************************************/
        data _null_;
        file "c:\sample.csv";
        put '"field1","field2","field3'
        '0D'x
        /* Put the hex values inside thedouble quotes */
        '",'
        '"field4","field5","field6'
        '0A'x
        '",'
        '"field7","field8","field9"'     ; run;

        /****************************************************/
        /* Read in the test file created above.                                        */
        /****************************************************/

        data out;
        infile "c:\sample.csv" dsd truncover;
        length var1 - var9 $15;
        input var1 - var9 $;   run;

        proc print;
        title 'File is read incorrectly due to embedded CR/LF';   run;

        /****************************************************/
        /* This program UPDATES IN PLACE, create a backup copy          */
        /* Replace carriage return and linefeed characters inside             */
        /* double quotes with a specified character. This sample             */
        /* uses '@' and '$', but any character can be used, including       */
        /* spaces. CR/LFs not in double quotes will not be replaced.        */
        /****************************************************/

        %let dsnnme="c:\sample.csv"; /* use full path of CSV file */
        /******************************************************/
        /* RECFM=N reads the file in binary format.                                  */
        /* The file consists of a stream of bytes with no record boundaries.  */
        /* SHAREBUFFERS specifies FILE and INFILE share same buffer.      */
        /******************************************************/

        %let repA='@'; /* replacement character LF */
        %let repD='$'; /* replacement character CR */

        data _null_;
        infile &dsnnme recfm=n sharebuffers;
        file &dsnnme recfm=n;

        /* OPEN is a flag variable used to determine if the CR/LF is within     */
        /* double quotes or not. Retain this value.                                       */

        retain open 0;

        input a $char1.;
        /* If the character is a double quote, set OPEN to its opposite value. */

        if a = '"' then open = ^(open);

        /* If the CR or LF is after an open double quote,                              */
        /* replace the byte with the appropriate value.                                */

        if open then do;
        if a = '0D'x then put &repD;
        else if a = '0A'x then put &repA;
        end;   run;

        /****************************************************/
        /* Read in new version of file to check for success.                      */
        /****************************************************/

        data outp;
        infile "c:\sample.csv" dsd dlm=',' truncover;
        length var1 - var9 $ 15;
        input var1 - var9 $ ;     run;

        Source: http://support.sas.com


Related links:

Continue to next: Tips to handle large datasets-I: by using Index   SAS tutorial home
Back to: How can I read external files from FTP into SAS? Symputx?   Statistics tutorial home