Sunday, July 8, 2012

What is Bias?


Definition of Bias
When the sampling process is poorly designed or wrongly executed, a systematic error occurs that makes the estimates of population parameters deviate from their true values. This systematic error is termed bias. The bias of an estimator is the difference between the value the estimator is expected to give (its average over repeated samples) and the true value of the parameter.
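The same definition can be written as a formula. The notation here is ours, not the post's: θ is the true population parameter and θ̂ is the estimator computed from a sample.

```latex
\operatorname{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta
```

A positive bias means the estimator overestimates the parameter on average, a negative bias means it underestimates it, and an unbiased estimator has a bias of zero.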

Bias versus Sampling Error
To estimate some characteristic of a population, a sample is taken from it and analyzed. Because only part of the population is observed, the sample estimate usually differs from the true population value, and this difference is called sampling error. It is large when the sample is too small relative to the population, and it can be reduced by increasing the sample size. For example, suppose we want to evaluate how 68 students performed in the English paper of their recent exam. If we take a sample of only 2 students and both of them scored above 95 marks, the analysis will suggest that the whole class scored above 95. This is incorrect, and the error is sampling error. Here 2 students are not enough, so more students have to be sampled, say around 15 to 20.
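The effect of sample size on sampling error can be checked with a small simulation. This is a sketch under assumptions: the 68 scores are randomly generated, hypothetical marks (the post gives no actual data), and the samples are simple random samples drawn without replacement. The function `average_sampling_error` is our own illustrative helper.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical English-paper scores for a class of 68 students (illustrative, not real data).
population = [rng.randint(35, 98) for _ in range(68)]
true_mean = statistics.mean(population)


def average_sampling_error(sample_size: int, trials: int = 2000) -> float:
    """Average absolute gap between a simple random sample's mean and the class mean."""
    total = 0.0
    for _ in range(trials):
        sample = rng.sample(population, sample_size)  # random sample without replacement
        total += abs(statistics.mean(sample) - true_mean)
    return total / trials


for n in (2, 5, 15, 20, 40, 68):
    print(f"n={n:2d}: average |sample mean - class mean| = {average_sampling_error(n):.2f}")
```

The error falls steadily as the sample grows, and at n = 68 the "sample" is the whole class, so the sampling error is exactly zero.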

Bias differs from sampling error in that it has nothing to do with the sample size. It arises from a flaw in the design of the sample, from a fault that occurs while the sample is actually being collected or processed, or from both, so increasing the sample size does not remove it. Although bias and sampling error are different, both make the estimate of the population parameter inaccurate, and together they make up the total error of the estimate.
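The difference between the two kinds of error, and how they combine, can be illustrated by extending the classroom example. The data are again hypothetical, and the design flaw is an assumption made for illustration: only students at or above the median mark can be reached (for instance, because only they volunteered their marks). The helper `error_breakdown` splits the error of the sample mean into its bias, its variance (the sampling-error part) and the mean squared error (the total), using the standard identity MSE = bias² + variance.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical English-paper scores for a class of 68 students (illustrative, not real data).
population = [rng.randint(35, 98) for _ in range(68)]
true_mean = statistics.mean(population)

# Assumed flaw in the sample design: only students at or above the median mark
# are ever reachable. No sample size can correct this.
median = statistics.median(population)
biased_frame = [s for s in population if s >= median]


def error_breakdown(frame, sample_size, trials=2000):
    """Split the error of the sample mean, over repeated samples from `frame`, into
    bias (average signed error), variance (spread of the error) and MSE (total)."""
    errors = [statistics.mean(rng.sample(frame, sample_size)) - true_mean
              for _ in range(trials)]
    bias = statistics.mean(errors)                 # systematic part
    variance = statistics.pvariance(errors)        # sampling-error part
    mse = statistics.mean(e * e for e in errors)   # total error = bias**2 + variance
    return bias, variance, mse


for n in (2, 10, 30):
    for label, frame in (("random sample", population), ("biased frame", biased_frame)):
        bias, variance, mse = error_breakdown(frame, n)
        print(f"n={n:2d}  {label:<13}  bias={bias:6.2f}  "
              f"variance={variance:6.2f}  MSE={mse:7.2f}")
```

For both designs the variance term shrinks as n grows, but the bias of the flawed design stays large at every sample size, while the random sample's bias stays close to zero.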

Types of Bias
Bias is classified into different types according to the factor that causes it. Measurement bias, selection bias, exclusion bias, reporting bias and detection bias are a few of the common types.

Let us discuss one type of bias, called spectrum bias, which is a form of selection bias. It occurs when a diagnostic test is evaluated on a biased sample of patients, one that does not represent the range (spectrum) of patients the test will meet in practice, for example only clearly ill patients and clearly healthy volunteers. This typically leads to overestimation of the test's two key ratios, sensitivity and specificity.
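A toy example shows how a narrow patient spectrum inflates both ratios. Everything here is a hypothetical assumption for illustration: the test reads a numeric marker, a reading of 50 or more counts as positive, and the marker values are made-up numbers, not clinical data. The functions `observed_sensitivity` and `observed_specificity` are our own illustrative helpers.

```python
THRESHOLD = 50  # hypothetical cut-off: a marker reading >= 50 counts as a positive test

# Hypothetical marker readings (illustrative numbers only, not clinical data).
severe_cases = [72, 80, 65, 90, 58, 77]        # obviously ill patients
mild_cases = [45, 52, 38, 60, 41, 48]          # early or mild disease
clearly_healthy = [10, 15, 22, 18, 12, 20]     # healthy volunteers
similar_symptoms = [35, 55, 42, 47, 30, 51]    # no disease, but symptoms that resemble it


def observed_sensitivity(diseased):
    """Fraction of people with the disease whose reading is positive."""
    return sum(v >= THRESHOLD for v in diseased) / len(diseased)


def observed_specificity(healthy):
    """Fraction of people without the disease whose reading is negative."""
    return sum(v < THRESHOLD for v in healthy) / len(healthy)


# Narrow spectrum: only clear-cut cases are studied.
print(f"narrow:   sens={observed_sensitivity(severe_cases):.2f} "
      f"spec={observed_specificity(clearly_healthy):.2f}")
# prints "narrow:   sens=1.00 spec=1.00"

# Broader spectrum: mild cases and look-alike healthy patients are included.
print(f"clinical: sens={observed_sensitivity(severe_cases + mild_cases):.2f} "
      f"spec={observed_specificity(clearly_healthy + similar_symptoms):.2f}")
# prints "clinical: sens=0.67 spec=0.83"
```

The test itself is identical in both runs. Only the choice of patients changes, and the narrow sample makes the test look perfect.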

Sensitivity is calculated by dividing the number of people with the disease who test positive (true positives) by the total number of people with the disease who were tested. Specificity is calculated by dividing the number of people without the disease who test negative (true negatives) by the total number of people without the disease who were tested.

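For example, if 27 people who have the disease undergo the diagnostic test and 18 of them test positive, the sensitivity is 18 divided by 27, or about 0.67. The remaining 9 are false negatives, so 9 divided by 27 (about 0.33) is the false-negative rate, which equals 1 minus the sensitivity. It is not the specificity: specificity can only be measured on people who do not have the disease.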
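The two ratios can be computed directly from counts of true and false results. This sketch uses exact fractions so the ratios print as simple fractions. The 27-patient figures come from the example above, while the counts passed to `specificity` are hypothetical, since the example includes no people without the disease.

```python
from fractions import Fraction


def sensitivity(true_positives: int, false_negatives: int) -> Fraction:
    """Share of people WITH the disease who test positive: TP / (TP + FN)."""
    return Fraction(true_positives, true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> Fraction:
    """Share of people WITHOUT the disease who test negative: TN / (TN + FP)."""
    return Fraction(true_negatives, true_negatives + false_positives)


# Example from the text: 27 sick people tested, 18 positive, so 9 false negatives.
sens = sensitivity(true_positives=18, false_negatives=9)
print(sens, float(sens))  # 2/3 0.6666666666666666
print(1 - sens)           # 1/3, the false-negative rate

# Specificity needs people without the disease; these counts are hypothetical.
spec = specificity(true_negatives=40, false_positives=10)
print(spec)               # 4/5
```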
