In statistical data analysis, we often have to deal with missing data, which raises several questions:

-- What should be done about missing data?

-- What is the risk of doing nothing (simply deleting or ignoring the incomplete records)?

-- What type of bias is introduced by imputation?

Before doing any imputation, we recommend some exploratory analysis of the data. For instance, examine the records with missing values and check whether the missingness of a variable can be predicted from other variables in the dataset; if so, the problem becomes much simpler. Also check whether some variables tend to be missing at the same time as others.

Here are highly recommended books on missing data:

If we don't handle missing data carefully, we may face the following consequences:

-- Introducing bias, due to systematic differences between records with complete data and records with missing data.

-- Overlooking the unbalanced data patterns caused by incomplete data.

-- Limiting our choices for data analysis and data management.

In a monotone missing pattern, if we arrange the variables in the order var1, var2, var3, ..., there are more and more missing values toward the right, so the observed data look like an upper triangle. If var2 is missing in some rows, then var3 is also missing in those rows. In general, if a variable Yj is missing for a particular individual, then all subsequent variables Yk, k > j, are also missing for that individual.
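One way to inspect the missing-data pattern in SAS before imputing is sketched below. It assumes the same placeholder names used in this post (a dataset called data with variables var1, var2, var3). PROC MEANS counts the missing values per variable, and PROC MI with NIMPUTE=0 prints the "Missing Data Patterns" table without creating any imputations.

```sas
/* Count non-missing (N) and missing (NMISS) values for each variable. */
proc means data=data n nmiss;
  var var1 var2 var3;
run;

/* NIMPUTE=0 skips the imputation step, so only the descriptive output,
   including the "Missing Data Patterns" table, is produced.
   The columns follow the order of the VAR statement. */
proc mi data=data nimpute=0;
  var var1 var2 var3;
run;
```

In that table an X marks an observed variable and a period marks a missing one. The pattern is monotone for the chosen variable order if, in every row, no X appears to the right of a period.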

-- For interval variables, replace missing values with the mean of the non-missing values.

-- For categorical variables, replace missing values with the most frequent category.
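For the interval-variable case, one possible way to do mean replacement in SAS is PROC STDIZE with the REPONLY option, which replaces missing values with the chosen location measure without standardizing the data. This is a minimal sketch: the dataset names (data, data_imputed) and variable names are placeholders, and it does not cover the categorical (most frequent category) case.

```sas
/* Replace missing values of var1 and var2 with each variable's mean.
   REPONLY: only fill in missing values, do not standardize.
   METHOD=MEAN: use the mean of the non-missing values as the fill value. */
proc stdize data=data out=data_imputed method=mean reponly;
  var var1 var2;
run;
```

Keep in mind that filling in a single value this way shrinks the variance of the variable, which is one reason multiple imputation is often preferred.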

proc mi data=data out=mi_out seed=12345 nimpute=5; var var1 var2 var3; run; /* the order of the variables matters */
proc reg data=mi_out outest=reg1 covout; model var3=var1 var2; by _imputation_; run; /* COVOUT saves the covariance matrices that PROC MIANALYZE needs */
proc mianalyze data=reg1; modeleffects intercept var1 var2; run; /* one combined set of estimates */
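For reference, the combining step performed by PROC MIANALYZE follows the standard rules for multiple imputation (Rubin's rules). The notation below is introduced here for illustration: m is the number of imputations (5 in the code above), and for the i-th imputed dataset, Q_i is the estimate of a parameter (for example, a regression coefficient) and U_i is its estimated variance.

```latex
\bar{Q} = \frac{1}{m}\sum_{i=1}^{m} \hat{Q}_i
\qquad \text{(combined point estimate)}

\bar{U} = \frac{1}{m}\sum_{i=1}^{m} \hat{U}_i
\qquad \text{(within-imputation variance)}

B = \frac{1}{m-1}\sum_{i=1}^{m} \left(\hat{Q}_i - \bar{Q}\right)^2
\qquad \text{(between-imputation variance)}

T = \bar{U} + \left(1 + \frac{1}{m}\right) B
\qquad \text{(total variance of } \bar{Q}\text{)}
```

The between-imputation term B is what carries the extra uncertainty due to the missing data; a single imputation has no way to estimate it.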

For example, consider a trivariate data set with variables Y1 and Y2 fully observed, and a variable Y3 that has missing values. MAR (missing at random) assumes that the probability that Y3 is missing for an individual can be related to the individual's values of Y1 and Y2, but not to its value of Y3. On the other hand, if a complete case and an incomplete case for Y3 with exactly the same values of Y1 and Y2 have systematically different values of Y3, then there is a response bias for Y3, and MAR is violated.
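The MAR assumption in this example can be written compactly. The indicator R_3 is notation introduced here for illustration: R_3 = 1 if Y3 is missing for an individual and R_3 = 0 otherwise.

```latex
\Pr\left(R_3 = 1 \mid Y_1, Y_2, Y_3\right) = \Pr\left(R_3 = 1 \mid Y_1, Y_2\right)
```

In words: once Y1 and Y2 are known, the value of Y3 itself carries no further information about whether Y3 is missing.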

For more details, and to understand the methods better, we recommend working through these examples yourself. You can also read Multiple Imputation for Missing Data, and the paper Multiple Imputation for Missing Data, which discusses the MI and MIANALYZE procedures in detail.