EZ Study

Actuarial Biology Chemistry Economics Calculators Confucius Engineer
Physics
C.S.

Why 0.1 + 0.1 is not equal to 0.2
Never compare two "double precision" numerical numbers directly!
Very Tricky when comparing two numerical numbers in SAS/SQL/Netezza

Question: Sometimes you may feel confused for the results when you want to compare two numerical numbers in SAS. As you can see from the following example, considering the SAS statement "if 0.3=3*0.1 then equal='Y'; else equal='N';"
Most of time we may believe the results should be: equal='Y'; But the result is actually: equal='N';

Why? NOT ALL NUMBERS CAN BE REPRESENTED EXACTLY ON THE COMPUTER!
All numerical numbers are actually stored in floating-point representation. Due to the dumb computer binary(0/1) expression, 2 is expressed as 0000001, since 2^1=2; 0.75 is expressed as 00000.11 since 0.75=2^(-1)+2^(-2). The following is another tricky example.

Another Example : download two papers for reference: Round in SAS, Why .1 + .1 Might Not Equal .2.

Even More tricky: say, you want to join two relatively large datasets(each over a few million records) by using some common numerical variable. Even in Netezza, say, if you have waited more than half an hour, but still no output, then most likely you need to revise something before you do the join.

/***************************************************/
proc sql;
create table Dataout1 as
select *
from data1 as a left join data2 as b
on a.num_var1=b.num_var2
;quit;

/***************************************************/
proc sql;
create table Dataout2 as
select *
from data1 as a left join data2 as b
on abs(a.num_var1-b.num_var2)< 0.01
;quit;

The results from those output datasets might not be the same! The more tricky part is that, the 1st SQL takes much longer than the 2nd SQL. Little Dumb Doctor used to do a similar task in Netezza SQL, it took 30 seconds for 2nd SQL, and it took more than 2 hours from the 1st SQL! More than 2 hours! I'll be very scared to tell others that I'm using netezza to do SQL and It takes more than 2 hours!

Another way to get around this time-consuming task is using integers; sometimes in SQL/Netezza database, some integers will be expressed as double precision numerical values. If we write code to compare those two double precision numerical variables directly, then we will blow up! Waiting hours for the results! Instead, you can use some function e.g. cast(num_var1 as int) as Int_var1, then compare those directly; then you can get the results in seconds.(watch out for some very big integers!)

The Rule of Thumb:
1) Use integer data type wherever possible(smallint,bigint) instead of floating point.
2) Avoiding floating point wherever possible(real, double precision).
3) Consider Char(N) instead of Varchar(N).
4) Avoid Number --> Character Conversion.