EZ Study



Use FSList to Read Raw data into SAS
Always remember to double-check the size of the file


Question: As a statistician, you may have run into a situation like the following. You read a dataset with the proc import procedure, e.g.

 proc import datafile="directory\file.dat"
     out=out1  dbms=dlm
     replace;
     getnames=no;
     delimiter='07'x;   /* use the bell character as the delimiter */
     datarow=2;
     guessingrows=32767;
 run;
The log window shows that the dataset was imported successfully, with no errors and no warnings. However, the size of the resulting dataset is not in the right range. Say the original raw file was more than 200MB and the dataset you just read into SAS is less than 1MB: something must be wrong.

Most of the time there is no problem when you read CSV or similarly simple files with proc import. But when you read files that use other delimiters, especially data from a party you are dealing with for the first time, it pays to double-check the result.
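One quick way to double-check is to compare the number of records in the raw file with the number of observations that ended up in the imported dataset. The following is only a minimal sketch: it assumes the file path and the dataset name out1 from the proc import example, and the macro variable names raw_records and sas_rows are made up for illustration.

```sas
/* Count the physical records SAS sees in the raw file. */
data _null_;
    infile "directory\file.dat" lrecl=32767 truncover end=eof;
    input;                        /* load one record, create no variables */
    if eof then call symputx('raw_records', _n_);
run;

/* Fetch the observation count of the imported dataset without reading it. */
data _null_;
    if 0 then set out1 nobs=n;    /* nobs= is filled in at compile time */
    call symputx('sas_rows', n);
    stop;
run;

%put NOTE: raw records=&raw_records, imported rows=&sas_rows;
```

With datarow=2 you would expect the imported row count to be about one less than the raw record count. A much larger gap suggests that SAS and the data provider disagree about where records or fields end.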

To get a quick view of the external raw data, you can use proc fslist. The FSLIST procedure enables you to browse, within a SAS session, external files that are not SAS data sets. Because the files are displayed in an interactive window, the procedure provides a highly convenient mechanism for examining file contents. In addition, you can copy text from the FSLIST window into any window that uses the SAS Text Editor.

Proc Fslist Fileref="directory\file.dat" < options >;
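As a concrete illustration of the syntax above, here is a minimal sketch. The fileref name rawfile is hypothetical, and the second step is an alternative (not part of FSLIST) for sessions where no interactive window is available: the LIST statement writes each input record to the log and shows unprintable characters, such as the bell, in hexadecimal.

```sas
/* rawfile is a hypothetical fileref name */
filename rawfile "directory\file.dat";

/* Interactive: browse the raw file in the FSLIST window */
proc fslist fileref=rawfile;
run;

/* Non-interactive alternative: dump the first 5 records to the log */
data _null_;
    infile rawfile obs=5 lrecl=32767 truncover;
    input;     /* load the record into the input buffer */
    list;      /* write the buffer to the log, hex for unprintable bytes */
run;
```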

data example1;
        infile "directory\file.dat" lrecl=1 recfm=f;
        length line $8000.;
        retain line "" length numbell rownum 0;
        length inchar $1.;
        input inchar $char1.;
        if inchar='07'x then numbell+1;   /* count the bell delimiters */

        /* line feed = '0A'x; accept it as end of record only if the bell count is plausible */
        if (inchar='0A'x and 34 <= numbell <= 45) then flag=1;
        else flag=0;

        if flag=1 then do;
                rownum+1;
                /* drop the last character (the carriage return if lines end in CR+LF) */
                line=substr(line,1,length(line)-1);
                output;
                line="";
                len123=0;
                numbell=0;
                flag=0;
        end;
        else do;
                if line="" then line=inchar;
                else line=cat(substr(line,1,len123),inchar);
                len123+1;
        end;
        lastchar=inchar;
        drop lastchar inchar length len123 flag;
        *if _n_>2000 then stop;
run;

data trouble;
        set example1;          /* the records rebuilt above, which carry numbell */
        where numbell^=37;     /* 37 = the number of delimiters expected in a record */
run;

The traditional way to read in a raw dataset:
data example2;
        infile "directory\file.dat" lrecl=1 truncover recfm=f unbuffered;
        length line $2000.;
        retain line "" lastchar "" length 0;
        length inchar lastchar $1.;
        input inchar $;
        /* line feed = '0A'x, carriage return = '0D'x */
        if (lastchar='0D'x and inchar='0A'x) then flag=1;
        else flag=0;
        if flag=1 then do;
                line=substr(line,1,length(line)-1);
/*              output;*/
                line="";
                len123=0;
        end;
        else do;
                if line="" then line=inchar;
                else line=cat(substr(line,1,len123),inchar);
                len123+1;
        end;
        lastchar=inchar;
        if _n_>=10000 then stop;
/*      drop lastchar inchar length len123 flag;*/
run;

Note: When you read a file and a CHAR column contains nothing but a single dot/period '.', SAS treats it as a missing value and converts it to a SPACE. Because the code above reads one character at a time, every "." in an email address or IP address arrives as such a single-period field, so the dots go missing and show up as empty spaces. To get the dot/period "." back, you need to use a different input statement:
input inchar $char1. instead of input inchar $1.
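
The difference between the two informats can be seen in a small self-contained test. This is only a sketch: the dataset name dot_demo and the variable names with_w and with_char are made up for illustration. Both variables are read from the same column.

```sas
data dot_demo;
    /* read column 1 twice: once with $1., once with $char1. */
    input @1 with_w $1. @1 with_char $char1.;
    datalines;
.
a
;
run;

proc print data=dot_demo;
run;
```

For the first record, with_w should come out blank while with_char keeps the period. For the second record both variables hold "a".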

