This weight is useful for estimating population totals from the sample.

**Example-1:** In a district of 10,000 households (N=10,000), 500 households are selected to study the out-of-school children aged 6 to 14 (n=500). The survey reveals that there are 850 children aged 6-14 in the sample households, and 34 of them are out-of-school.

These raw sample counts, however, are of little use to the district authorities, policy makers and education planners.

The sampling fraction is n / N = 500 / 10,000 = 0.05 (or 5.0%), and the inverse of the sampling fraction is 1 / 0.05 = 20. If simple random sampling is applied, the sampling fraction is the probability that a household is selected in the sample (*p = 0.05*), and its inverse, 20, is the 'weight' for a sample household (that is, each sample household represents 20 households in the district).

Therefore, the total number of children aged 6-14 in the district can be estimated as:

850 x 20 = 17,000, and

the estimated number of out-of-school children in the district is: 34 x 20 = 680.
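The arithmetic above can be reproduced in a few lines. The following is a minimal Python sketch using only the Example-1 figures; the variable names are illustrative and not part of the original tutorial.

```python
# Example-1 figures taken from the text above
N = 10_000                    # households in the district
n = 500                       # households in the sample
children_in_sample = 850      # children aged 6-14 in the sample households
out_of_school_in_sample = 34  # out-of-school children in the sample households

sampling_fraction = n / N     # probability that a household is selected
weight = N / n                # inverse of the sampling fraction

# Weighted totals: each sample household stands for `weight` households
est_children = children_in_sample * weight
est_out_of_school = out_of_school_in_sample * weight

print(f"sampling fraction = {sampling_fraction:.2f}, weight = {weight:.0f}")
print(f"estimated children aged 6-14: {est_children:,.0f}")
print(f"estimated out-of-school children: {est_out_of_school:,.0f}")
```

Computing the weight as N / n rather than 1 / 0.05 avoids rounding the sampling fraction before inverting it.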

These population estimates, 17,000 children and 680 out-of-school children in the district, are more relevant and useful than knowing that there were 850 children and 34 out-of-school children in the sample. This is the first reason for using the sample weight in survey data analysis.

From the above figures in the sample, the percentage of out-of-school children in the district is: 34 / 850 x 100 = 4.0%, if calculated based on the sample households.

The same result will be obtained if it is calculated from the district estimates, that is: 680 / 17,000 x 100 = 4.0%.

Therefore, no weighting is necessary when a percentage (rate, ratio or proportion) is calculated from the sample if simple random sampling, or a sampling method in which each and every household in the study area has equal possibility (or probability) of being selected, is used. Such a sampling method is known as a 'self-weighting' sampling method.
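The reason is that a common weight appears in both the numerator and the denominator and cancels out. A small Python sketch of this, using the Example-1 counts, is given here; the function name and the weight 37.5 are arbitrary choices made only for illustration.

```python
def pct_out_of_school(out_of_school, children, weight):
    """Percentage computed from weighted totals with one common weight."""
    return (out_of_school * weight) / (children * weight) * 100

# Weight 1 is the unweighted sample, 20 is the Example-1 weight,
# and 37.5 is an arbitrary value used only to show the cancellation.
results = {w: round(pct_out_of_school(34, 850, w), 2) for w in (1, 20, 37.5)}

for w, pct in results.items():
    print(f"weight = {w}: {pct}%")
```

Whatever common weight is used, the percentage is the same, which is why a self-weighting design needs no weights for rates, ratios and proportions.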

In some situations, however, households in the study area have unequal probabilities of being selected in the sample. In these situations, weighting is essential. The following example illustrates such a situation.

**Example-2:** The above district contains two types of households: stable/settled ('regular') households and moving/migrated households. Of the 10,000 households (N) in the district, 9,000 are regular households and the remaining 1,000 are moving ones (N_{1} = 9,000 and N_{2} = 1,000).

One of the objectives of the study is to explore the schooling status of children aged 6-14 among the moving households, and thus 300 regular households (n_{1} = 300) and 200 moving households (n_{2} = 200) are selected in the sample. The same results are obtained as in Example-1: 850 children aged 6-14, 34 of whom are out-of-school, in the 500 sample households.

When reviewing the results of the survey for regular and moving household groups, however, it is found that 500 children aged 6-14 are from the regular households and 350 are from the moving households. The survey also found that just 6 out-of-school children are from regular households while the remaining 28 are from the moving households.

The simple (unweighted) calculation of the percentage of out-of-school children in the district is: 34 / 850 x 100 = 4.0%.

However, this is not representative of the district population, since the sampling method is not 'self-weighting' (that is, households in the district do not all have the same chance / probability of being in the sample).

The probability of selecting a household in the sample for the regular household group is: 300 / 9,000 = 0.03333;

while the probability of selecting a household for the moving household group is: 200 / 1,000 = 0.20.

That is, one sample household represents 30 households (1 / 0.03333) in the regular group, but only 5 (1 / 0.20) in the moving household group. In other words, the sample 'weight' to be used in estimating the totals is 30 for the regular group and just 5 for the moving group.

By using the weights, there would be: 500 x 30 + 350 x 5 = 15,000 + 1,750 = 16,750 children aged 6-14 and 6 x 30 + 28 x 5 = 180 + 140 = 320 out-of-school children.

Then, the estimated percentage of out-of-school children aged 6-14 for the district is: 320 / 16,750 x 100 = 1.91%.

This 1.91% is a closer (more appropriate) estimate of the percentage of out-of-school children aged 6-14 in the entire district than the unweighted 4.0%.
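The Example-2 calculation can be sketched in Python as follows. This is only an illustration of the arithmetic in the text; the dictionary layout and variable names are assumptions, not part of the original tutorial.

```python
# Example-2 figures from the text, one entry per household group (stratum)
strata = {
    "regular": {"N": 9_000, "n": 300, "children": 500, "out_of_school": 6},
    "moving":  {"N": 1_000, "n": 200, "children": 350, "out_of_school": 28},
}

est_children = 0.0
est_out_of_school = 0.0
weights = {}
for name, s in strata.items():
    # Weight = inverse of the selection probability n / N within the group
    weights[name] = s["N"] / s["n"]
    est_children += s["children"] * weights[name]
    est_out_of_school += s["out_of_school"] * weights[name]

# Unweighted percentage straight from the pooled sample counts
sample_children = sum(s["children"] for s in strata.values())
sample_out = sum(s["out_of_school"] for s in strata.values())
unweighted_pct = sample_out / sample_children * 100

# Weighted percentage from the estimated district totals
weighted_pct = est_out_of_school / est_children * 100

print(f"weights: {weights}")
print(f"estimated children: {est_children:,.0f}")
print(f"estimated out-of-school: {est_out_of_school:,.0f}")
print(f"unweighted: {unweighted_pct:.2f}%, weighted: {weighted_pct:.2f}%")
```

The moving households are over-sampled relative to their share of the district, so the unweighted figure overstates the district-wide percentage; the weights restore each group to its true share.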

From the above examples, it is obvious that the sampling method applied is very important in deciding whether the 'weighting' should be used or not in analysing sample survey data sets. In national surveys, weighting is necessary even if a self-weighting sampling method is applied because the 'response rates' vary among the different population groups or secondary sampling units (thus, the representations are different). As such, sample weights are necessary for analysing all common household survey data sets.
