Chapter 3: Probability

3.3 You might begin by computing the median for all scores, irrespective of group, and then coding all of the scores below the median as 0 and all of the scores above the median as 1. Then create a 2 × 2 table. The columns can be labelled group 1 and group 2. The rows can be labelled 0 and 1. For score (being above the median) to be independent of group, the percentage of 1s must be the same in the two columns (groups).
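The median split described above can be sketched in a few lines of Python. The two score lists are made-up illustrative data, the function names are hypothetical, and dropping scores that fall exactly on the median is an assumption, since the text only describes coding scores below and above it.

```python
from statistics import median

# Hypothetical scores for two groups (illustrative data, not from the text).
group1 = [12, 15, 9, 20, 17, 11]
group2 = [14, 8, 19, 10, 16, 13]

def median_split_table(g1, g2):
    """Code scores against the overall median and count 0s and 1s per group."""
    cut = median(g1 + g2)  # median of all scores, irrespective of group
    table = {}
    for name, scores in (("group 1", g1), ("group 2", g2)):
        # Assumption: scores exactly equal to the median are left uncoded.
        table[name] = {
            0: sum(s < cut for s in scores),
            1: sum(s > cut for s in scores),
        }
    return table

def percent_above(table):
    """Percentage of 1s (scores above the median) in each column of the table."""
    return {g: 100 * c[1] / (c[0] + c[1]) for g, c in table.items()}

table = median_split_table(group1, group2)
print(table)
print(percent_above(table))
```

If the two percentages printed on the last line are equal, score and group are independent in the sense described above.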

3.4 The only way not to toss at least one head in three tosses is to toss three tails. We assume that the probability of tossing a head is 0.5. Thus, the probability of tossing a tail is also 0.5. The probability of tossing three tails is 0.5(0.5)(0.5) or 0.125. If we subtract the probability of tossing three tails from 1.0, we have the probability of tossing at least one head: 1.0 − 0.125 = 0.875.
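As a quick check of the complement argument, the following sketch computes the probability directly and then cross-checks it by listing every sequence of three tosses. It assumes a fair coin, as the text does; the variable names are illustrative.

```python
from itertools import product

p_head = 0.5  # fair coin, as assumed in the text
p_three_tails = (1 - p_head) ** 3
p_at_least_one_head = 1 - p_three_tails

# Cross-check: list all 2^3 equally likely sequences and count those with a head.
sequences = list(product("HT", repeat=3))
with_head = [s for s in sequences if "H" in s]

print(p_three_tails, p_at_least_one_head, len(with_head) / len(sequences))
```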

3.5 Again, the key is to know how many possible outcomes there are on one roll of a pair of dice. You might start with rolling a 1 on the first die and a 1 on the second die (1,1), then (1,2), then (1,3), and so on through to (6,6). As we saw, there is an easier method for determining the number of such outcomes: n^r. The method will not enumerate the outcomes, but it does give the number of possible outcomes. In this case, n is the number of possible outcomes on one die (6) and r is the number of dice rolled (2). Thus, when you try to enumerate all possible outcomes for one roll of two dice, you will find 36 possible outcomes. The probability of any one of the outcomes is 1/36 or 0.0278. The probability of rolling one of the outcomes other than a pair of ‘6’s is 0.0278(35) or 0.973. (Because there is only one way to roll two ‘6’s, 0.0278 is multiplied by 35 rather than 36.)

Assuming each roll of the dice is independent of the others, the three probabilities can be multiplied to determine the probability of not rolling at least one pair of ‘6’s in three rolls of the dice. This can be written in exponent form: (0.973)³. Thus the probability of not rolling at least one pair of ‘6’s in three rolls is 0.921. To answer our question, the problem needs to be turned back around. The probability of not rolling at least one pair of ‘6’s is subtracted from 1.000. Thus, the probability of rolling at least one pair of ‘6’s in three rolls of the dice is (1.000 − 0.921) or 0.079. (Carrying the exact fraction 35/36 through the calculation gives about 0.081; the small difference comes from rounding 1/36 to 0.0278.)
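The enumeration and the complement step can be checked with a short sketch. It uses exact fractions rather than rounded decimals, so its final value (about 0.081) differs slightly from the 0.079 obtained above by rounding at each step. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Enumerate every outcome of one roll of two dice: (1,1), (1,2), ..., (6,6).
outcomes = list(product(range(1, 7), repeat=2))
n_outcomes = len(outcomes)  # equals n^r = 6^2

# Probability that a single roll is anything other than a pair of sixes.
p_not_double_six = Fraction(sum(o != (6, 6) for o in outcomes), n_outcomes)

# Three independent rolls: multiply, then take the complement.
p_none_in_three = p_not_double_six ** 3
p_at_least_one = 1 - p_none_in_three

print(n_outcomes, float(p_not_double_six), float(p_none_in_three), float(p_at_least_one))
```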

3.6 What is the probability of being infected with skewed leptokurtosis, given a positive test, if we find that 10% of the population has the dreaded disease, 75% of those with the disease will test positive, and 1.8% of those who do not have the disease will test positive? (Answer = 0.82)

What is the probability of being infected with skewed leptokurtosis, given a positive test, if we find that 4% of the population has the dreaded disease, 90% of those with the disease will test positive, and 1.8% of those who do not have the disease will test positive? (Answer = 0.68)
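A minimal sketch of the Bayes' theorem calculation behind these answers is given below, applied to the problem with a 4% base rate. The function name and parameter names are illustrative choices, not taken from the text.

```python
def posterior(base_rate, sensitivity, false_positive_rate):
    """P(disease | positive test) by Bayes' theorem."""
    true_pos = base_rate * sensitivity                 # diseased and test positive
    false_pos = (1 - base_rate) * false_positive_rate  # healthy but test positive
    return true_pos / (true_pos + false_pos)

# 4% base rate, 90% of the diseased test positive, 1.8% of the healthy test positive.
answer = posterior(0.04, 0.90, 0.018)
print(round(answer, 2))
```

The denominator is the overall probability of a positive test, so the result is the share of all positive tests that come from people who actually have the disease.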

The interactive demonstration allows the student to modify the sensitivity and specificity as well as the base rate. It quickly becomes clear that if sensitivity and specificity are held constant, the base rate greatly influences the probability of having the disorder given a positive test. Lower base rates (rarer disorders) result in lower probabilities of having the disorder. Conversely, higher specificity (fewer false positives) results in higher probabilities of having the disorder. Once the student is familiar enough with how the matrix behaves, a few additional problems that require the student to work backwards are posed. For example, if the probability of having the disease is 0.95 if you test positive, and if the sensitivity is 0.80, then what is the specificity? Answers can be checked by working in the typical order, from information to final probability.
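The behaviour described above can be explored without the interactive demonstration using a sketch like the following. The function names are hypothetical, the fixed sensitivity (0.80) and specificity (0.95) in the loop are illustrative values, and the base rate of 0.10 used in the backwards problem is an assumption, since the example in the text does not state one.

```python
def ppv(base_rate, sensitivity, specificity):
    """Probability of having the disorder given a positive test."""
    true_pos = base_rate * sensitivity
    false_pos = (1 - base_rate) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

def specificity_for(target, base_rate, sensitivity):
    """Work backwards: the specificity that makes ppv(...) equal to target."""
    false_pos_rate = base_rate * sensitivity * (1 - target) / (target * (1 - base_rate))
    return 1 - false_pos_rate

# Sensitivity and specificity held constant: rarer disorders give lower probabilities.
for base_rate in (0.01, 0.05, 0.20):
    print(base_rate, round(ppv(base_rate, 0.80, 0.95), 3))

# Backwards problem; the 0.10 base rate is an assumed value, not given in the text.
spec = specificity_for(0.95, 0.10, 0.80)
check = ppv(0.10, 0.80, spec)  # working in the typical order to check the answer
print(round(spec, 4), round(check, 2))
```

The last two lines mirror the advice above: the backwards answer is checked by running the calculation forwards again.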

3.8 Can you create two skewed distributions that will give the appearance of normality when they are combined? Use SPSS to create the two separate data sets and view the resulting histograms.

Begin by combining the data sets as they are. Next, change the data sets so that both are positively skewed or both are negatively skewed. Finally, reverse the direction of the skewness of one of the data sets. The answer is revealed when two distributions that are equally skewed, but in opposite directions, are combined.
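The exercise asks for SPSS, but the same idea can be sketched in Python as a rough check. The sketch below draws a positively skewed sample, mirrors it about its mean to obtain an equally but negatively skewed sample, and compares skewness before and after combining. The gamma distribution, sample size, seed, and function name are all arbitrary assumptions. Zero skewness only shows that the combined data are symmetric, so a histogram is still needed to judge how normal they look.

```python
import random

def skewness(xs):
    """Moment-based skewness: mean cubed deviation over the cubed (population) SD."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

random.seed(1)  # arbitrary seed so the run is repeatable
positive = [random.gammavariate(2.0, 1.0) for _ in range(2000)]  # positively skewed

# Mirror every score about the mean: same amount of skew, opposite direction.
centre = sum(positive) / len(positive)
negative = [2 * centre - x for x in positive]

combined = positive + negative
print(skewness(positive), skewness(negative), skewness(combined))
```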

3.9 For the purpose of assigning marks, the distribution that maximizes fairness is one that is symmetrically distributed and where the scores are spread from nearly minimum performance to maximum performance. The spread ensures that one or two lucky or unlucky guesses on the part of a student will not change his or her position relative to the other students. This is one way to describe the reliability of the scores. When the test is negatively skewed (ceiling effect), too many scores are clustered at the top. Small variations in performance can result in meaningful changes in a student’s relative position in a class. This is unfair. Some students deserving of top marks might not appear to be at the top of the distribution. When the test is positively skewed (floor effect), too many scores are clustered at the bottom of the distribution. Small accidents in performance can result in meaningful changes as well. This also is unfair. Some students deserving of passing marks might accidentally appear to be less knowledgeable than they are. In both cases, the more the skewness reduces the reliability, the less fair the test is.

3.10 Hint: We can avoid any trial and error by beginning with a set of standardized scores. It is easy to create a distribution with a mean of 30. Begin with three scores: −1, 0, 1. The mean and the variance (computed with n − 1 in the denominator) of these scores are 0 and 1, respectively. Using rules that we have covered earlier, we can change the variance from 1 to the desired 36 by multiplying all three scores by the square root of the desired variance. Because we desire a variance of 36, we multiply the three scores by 6: −6, 0, 6. The mean is unchanged but the variance is now 36. Adding 30 to all of the scores moves the mean to 30 while retaining the variance of 36: 24, 30, 36. It is then a straightforward matter to change the distribution to one with a mean of 50 and a variance of 9. We add 20 to all three scores to create a mean of 50: 44, 50, 56. To obtain a variance of 9 we reverse the process used to create the variance of 36: the standard deviation must shrink from 6 to 3, so each score's deviation from the mean of 50 is halved. The resulting scores are 47, 50, and 53. Try the linear transformations with ns of 5 and 7.
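The linear transformations in this hint can be sketched as a single helper that first standardizes the scores and then rescales them. The function name is hypothetical, and the sketch assumes the sample variance (n − 1 denominator), which is what Python's statistics.variance computes.

```python
from statistics import mean, variance  # variance() uses the n - 1 denominator

def rescale(scores, new_mean, new_variance):
    """Linearly transform scores to a chosen mean and sample variance."""
    m = mean(scores)
    sd = variance(scores) ** 0.5
    target_sd = new_variance ** 0.5
    # Standardize each score, stretch to the target SD, then shift to the new mean.
    return [new_mean + (x - m) / sd * target_sd for x in scores]

first = rescale([-1, 0, 1], 30, 36)   # mean 30, variance 36
second = rescale(first, 50, 9)        # mean 50, variance 9
print(first, second)

# The same helper handles n = 5, where the starting variance is not 1.
five = rescale([-2, -1, 0, 1, 2], 30, 36)
print(mean(five), variance(five))
```

Because the helper standardizes first, it also works when the starting scores do not already have a variance of 1, which is the situation that arises with ns of 5 and 7.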

Data Analysis for the Social Sciences: Integrating Theory and Practice

Student Resources
