EZ Study


How to read a large csv file
Error: Number of names found is less than number of variables found

Question: How do we read a large csv file into SAS when each record (row) is longer than 32,767 characters?

Answer: The fundamental problem is that SAS will not read a line longer than 32,767 characters; you may see an error message in the log like:

Number of names found is less than number of variables found
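Before picking a workaround, it can help to confirm that the file really contains rows longer than the limit. The following is a minimal Python sketch, run outside SAS; the function name longest_record_length and the sample data are made up for illustration, and it assumes no field contains an embedded line break.

```python
import io

SAS_LINE_LIMIT = 32767  # the limit quoted above


def longest_record_length(lines):
    """Return the length in bytes of the longest record, not counting line endings."""
    longest = 0
    for raw in lines:
        longest = max(longest, len(raw.rstrip(b"\r\n")))
    return longest


# In-memory stand-in for a real file; with a file on disk you would
# pass open(path, "rb") instead (path is whatever your csv file is called).
sample = io.BytesIO(b"id,a,b\r\n1,2,3\r\n" + b"x" * 40000 + b"\n")
length = longest_record_length(sample)
print(length, length > SAS_LINE_LIMIT)
```

If the second value printed is True, at least one row exceeds the limit and one of the methods below is needed.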

There are several ways to get around this issue:

Method I: If you have Enterprise Guide installed, you can use its import wizard to import the large csv file. It may take 1-2 minutes, but the file loads completely without error.

Method II: If you have access to the large csv file, try to split it into several smaller csv files, each with rows shorter than 32,767 characters. In other words, you split the large set of columns into several parts, so that each csv file contains only a small number of columns.

In this case, you will need to create a unique record id for each row (if you don't have one in the original data); after you read the separate csv files into SAS, you need to combine them by the key variables, such as the record id.
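The column split in Method II can be done outside SAS. The following is a rough Python sketch under stated assumptions, not a definitive tool: the function name split_csv_by_columns and the column name record_id are hypothetical, the first row is assumed to be a header, and the generated id is simply the data row number. You still have to choose enough parts that every row in every part stays under the limit.

```python
import csv
import io


def split_csv_by_columns(src, outputs, id_name="record_id"):
    """Spread the columns of src evenly over the open text files in outputs.

    Every output row starts with a generated record id (the 1-based data
    row number) so that the parts can be combined again later.
    """
    reader = csv.reader(src)
    writers = [csv.writer(out, lineterminator="\n") for out in outputs]
    header = next(reader)
    per_part = -(-len(header) // len(writers))  # ceiling division

    def write_row(key, row):
        # Part i receives columns i*per_part .. (i+1)*per_part - 1.
        for i, writer in enumerate(writers):
            writer.writerow([key] + row[i * per_part:(i + 1) * per_part])

    write_row(id_name, header)
    for record_id, row in enumerate(reader, start=1):
        write_row(record_id, row)


# Tiny in-memory demonstration: five columns split into two parts.
source = io.StringIO("a,b,c,d,e\n1,2,3,4,5\n6,7,8,9,10\n")
parts = [io.StringIO(), io.StringIO()]
split_csv_by_columns(source, parts)
for part in parts:
    print(part.getvalue())
```

With real files, open the source and each output with newline="" as the csv module recommends, and pass the file objects in place of the StringIO buffers. Rows are processed one at a time, so the whole file never has to fit in memory.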

Method III: Use the options RECFM and UNBUF

This method is not recommended since it involves a lot of coding; don't use it unless the first two methods do not work for you.

RECFM= specifies the record format. Values for the RECFM= option are

D default format (same as variable).

F fixed format. That is, each record has the same length. Do not use RECFM=F for external files that contain carriage-control characters.

N binary format. The file consists of a stream of bytes with no record boundaries.

P print format. On output, SAS writes carriage-control characters.

V variable format. Each record ends with a newline character.

S370V variable S370 record format (V).

S370VB variable block S370 record format (VB).

S370VBS variable block with spanned records S370 record format (VBS). If you specify RECFM=S370VBS, then you should specify BLKSIZE=32760 and LRECL=32760 in order to avoid errors with records longer than 255 characters.

The RECFM= option is used for both input and output.

UNBUF (short for UNBUFFERED): tells SAS not to perform buffered writes to the file on any subsequent FILE statement. This option applies especially when you are reading from or writing to a data collection device. As explained in SAS Language Reference: Dictionary, it also prevents buffered reads on INFILE statements.

Interaction: When you use UNBUFFERED, SAS never sets the END= variable to 1.

Tip: When you read instream data with a DATALINES statement, UNBUFFERED is in effect.
