5. Introduction to Hypothesis Tests

Recall that statistical inference is the process of making a guess about a population parameter using the information from a sample. One important type of statistical inference is the testing of a hypothesis about a population parameter, and in this chapter we will explore some important statistical concepts related to hypothesis tests. We do this by considering a number of examples that use simulation to determine whether there is strong evidence of a difference from the ideal situation.

In the first example, we investigate a claim about hearing loss, trying to determine whether the claimant is being purposefully dishonest. This example is introduced below and will be explored in a number of in-class activities.

5.1. Evaluating Studies 2

In class, we conducted some simulations to identify the effects of two forms of randomization: random selection of individual units and random assignment of treatments in an experiment. Based on these activities, we identified two main effects of randomization.

The effect of random selection of individuals in sampling

Using a random sampling technique to select individuals resulted in unbiased estimates of parameters. In other words, our statistics did not tend to over- or under-estimate the parameter, but balanced these errors instead. (This assumes no other source of bias.)
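The balancing of over- and under-estimates can be illustrated with a short simulation. The following is a minimal Python sketch and not one of the in-class activities: the population values, the sample size of 25, and the function name mean_of_sample_means are all made up for illustration.

```python
import random
import statistics

def mean_of_sample_means(population, sample_size, repetitions, seed=0):
    """Draw many simple random samples and average their sample means."""
    rng = random.Random(seed)
    sample_means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(repetitions)
    ]
    return statistics.mean(sample_means)

# Hypothetical population of 10,000 made-up measurements.
population_rng = random.Random(1)
population = [population_rng.gauss(50, 10) for _ in range(10_000)]

# The parameter is the population mean; each sample mean is a statistic.
parameter = statistics.mean(population)
center_of_estimates = mean_of_sample_means(
    population, sample_size=25, repetitions=2000
)

print("population mean (parameter):", round(parameter, 2))
print("average of the sample means:", round(center_of_estimates, 2))
```

Individual sample means land above or below the parameter, but their average should sit very close to it, which is what "unbiased" means here.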

The effect of using random assignment of treatments in an experiment

Using random assignment of treatments tended to balance out other factors between the groups. Therefore, other factors are not a likely cause of differences between the groups, and the most likely cause of any difference between the groups is the difference in treatments.
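To see this balancing at work, consider a background factor that the researcher does not control. The sketch below is a hypothetical Python illustration: the ages are invented, the group size of 500 is an arbitrary choice, and assign_and_compare is a made-up helper name.

```python
import random
import statistics

def assign_and_compare(background_values, seed=0):
    """Randomly split units into two equal groups and return each group's mean."""
    rng = random.Random(seed)
    shuffled = background_values[:]   # copy so the original order is untouched
    rng.shuffle(shuffled)             # the random assignment step
    half = len(shuffled) // 2
    group_a, group_b = shuffled[:half], shuffled[half:]
    return statistics.mean(group_a), statistics.mean(group_b)

# Hypothetical background factor: ages of 500 volunteers (made-up values).
age_rng = random.Random(2)
ages = [age_rng.randint(18, 80) for _ in range(500)]

mean_a, mean_b = assign_and_compare(ages)
print("mean age, group A:", round(mean_a, 1))
print("mean age, group B:", round(mean_b, 1))
```

The two group means should come out close to each other even though age was never used in forming the groups. The same reasoning applies to factors we cannot measure at all.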

When evaluating a statistical study, the first thing to consider is the appropriate scope of the conclusions, that is, the types of conclusions that are justified for this type of study. The following table gives some advice on what conclusions can be drawn based on:

  1. Whether or not there was random sampling.
  2. Whether or not it was an experiment with random assignment of treatments.

Figure: Evaluating studies cheat sheet

A study on flossing and gum disease (experiment)

A recent study was conducted to determine the effectiveness of flossing on gum disease. A group of 500 volunteers was randomly split into two groups. The first group was asked to floss once a day and the second group was asked to floss after each meal. Suppose that the group that flossed after every meal was much less likely to have gum disease than the group that flossed once per day.

    Q-63: Did this study use random assignment of treatments?
  • (A) Yes
  • (B) No
  • Note that the text mentions that the participants were randomly split into groups.
    Q-64: Based on the fact that there was a large difference between the groups, is it safe to say that the difference in flossing frequency was the most likely cause of these differences?
  • (A) Yes, the treatments are the only *likely* explanation for the differences.
  • Random assignment of the treatments will likely balance all other factors leading to gum disease.
  • (B) No, other factors are likely to have contributed to the difference.
  • Consider the effect of randomly assigning the flossing treatments on the other likely factors.

Another study on flossing and gum disease (observational study)

In another study on flossing and gum disease, a random sample of 500 people was surveyed on their flossing habits and whether or not they suffered from gum disease. Suppose that it was estimated that the people who flossed after every meal were much less likely to have gum disease than the people who flossed once per day.

    Q-65: Did this study use random assignment of treatments?
  • (A) Yes
  • Note that the participants were not randomly assigned a flossing treatment, but were able to decide on their own flossing habits.
  • (B) No
    Q-66: Based on the fact that there was a large difference between the groups, is it safe to say that the difference in flossing frequency was the most likely cause of these differences?
  • (A) Yes, the differences in flossing habits of the participants are the only *likely* explanation for the differences.
  • The lack of random assignment of the treatments gives us no protection against other factors (like genetics) that affect gum disease.
  • (B) No, other factors are likely to have contributed to the difference.
  • The lack of random assignment of the treatments gives us no protection against other factors (like genetics) that affect gum disease.

5.2. Example 1.1: Insurance Fraud - Deafness

Consider the following case study centered on potential insurance fraud regarding deafness. This case study was presented in an article by Pankratz, Fausti, and Peed titled “A Forced-Choice Technique to Evaluate Deafness in the Hysterical or Malingering Patient.” Source: Journal of Consulting and Clinical Psychology, 1975, Vol. 43, pg. 421-422. The following is an excerpt from the article:

The patient was a 27-year-old male with a history of multiple hospitalizations for idiopathic convulsive disorder, functional disabilities, accidents, and personality problems. His hospital records indicated that he was manipulative, exaggerated his symptoms to his advantage, and that he was a generally disruptive patient. He made repeated attempts to obtain compensation for his disabilities. During his present hospitalization he complained of bilateral hearing loss, left-sided weakness, left-sided numbness, intermittent speech difficulty, and memory deficit. There were few consistent or objective findings for these complaints. All of his symptoms disappeared quickly with the exception of the alleged hearing loss.

5.3. Simulating the Guessing Distribution

In each of the examples in this chapter, we will use TinkerPlots to simulate a distribution that stands in contrast to our research hypothesis. In the case of evaluating the truthfulness of the claimant in the hearing loss example, our hypothesis is that the claimant was being purposefully dishonest when answering the questions. To test this claim, we will simulate the opposite distribution, that is, the distribution for someone who cannot hear and is simply guessing which color is associated with the sound. We are looking for evidence that the claimant missed an unusual number of questions, compared to the number of correct answers from someone who was just guessing.

Note

Recall that we use a p-value to determine if a specified value is unusual. You might wish to review the section on p-values!

In the following video, we will illustrate simulating the guessing distribution and calculating a p-value using TinkerPlots.
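For readers without TinkerPlots, the same idea can be sketched in Python. This is a stand-in for the TinkerPlots simulation, not a reproduction of it: the function names, the 1000 repetitions, and the random seed are assumptions chosen for illustration, and the observed count of 36 is the claimant's score reported in the questions that follow.

```python
import random

def simulate_guessing(num_trials=100, num_sets=1000, seed=0):
    """Return the number of correct answers in each simulated set of guesses."""
    rng = random.Random(seed)
    return [
        # Each guess is correct with probability 1/2 (red versus blue).
        sum(rng.random() < 0.5 for _ in range(num_trials))
        for _ in range(num_sets)
    ]

def lower_tail_p_value(simulated_counts, observed):
    """Proportion of simulated sets with a count at or below the observed one."""
    at_or_below = sum(count <= observed for count in simulated_counts)
    return at_or_below / len(simulated_counts)

counts = simulate_guessing()
p_value = lower_tail_p_value(counts, observed=36)

print("smallest simulated count:", min(counts))
print("largest simulated count:", max(counts))
print("estimated p-value for 36 or fewer correct:", p_value)
```

The list counts plays the role of the dot plot: each entry is one set of 100 guesses. A small p-value means a guesser would rarely score as low as the observed value.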

5.4. Questions about Example 1.1

Recall that the subject was correct on 36 out of 100 trials when he was asked to identify whether the tone was played with the red or the blue light bulb.

Check your understanding

    Q-67: What is the population of interest?
  • (A) All trials of the experiment for a hearing-impaired individual.
  • Here we are imagining giving the hearing test to a claimant many, many times. This allows us to imagine what the number of correct answers should be if a person were truly guessing.
  • (B) The 100 answers of a hearing-impaired person that were observed.
  • (C) Whether or not the hearing-impaired person answered correctly.
    Q-68: Which of the following best describes the sample?
  • (A) All trials of the experiment for a hearing-impaired individual.
  • (B) The 100 answers of a hearing-impaired person that were observed.
  • We are thinking of the 100 answers as the sample for this experiment. In particular, we would note whether or not each question was answered correctly.
  • (C) Whether or not the hearing-impaired person answered correctly.
    Q-69: Which of the following best describes the variable?
  • (A) All trials of the experiment for a hearing-impaired individual.
  • (B) The 100 answers of a hearing-impaired person that were observed.
  • (C) Whether or not the hearing-impaired person answered correctly.
  • The 100 answers are the sample for this experiment. The variable is what we record for each answer: whether or not the question was answered correctly.

Recall that we carried out a simulation study to determine whether this patient who was suspected of malingering had obtained too few correct answers. The results of the simulation study indicate what outcomes we expect from a guessing subject:

Figure: simulated numbers of correct answers for a guessing subject
    Q-70: What does each dot/star represent?
  • (A) One set of 100 trials
  • HINT: Last class we recorded the number of correct guesses in 12 trials, but now we are doing 100 trials. What should we record?
  • (B) One correct guess
  • HINT: Last class we recorded the number of correct guesses in 12 trials, but now we are doing 100 trials. What should we record?
  • (C) One trial
  • HINT: Last class we recorded the number of correct guesses in 12 trials, but now we are doing 100 trials. What should we record?
  • (D) The number of correct guesses in 100 trials
  • Each dot represents the number of correct guesses in 100 trials.
    Q-71: Based on the results of this simulation study, do you believe the patient’s outcome of 36 correct out of 100 was consistent with guessing, or do these results indicate that he may have been answering incorrectly on purpose in order to mislead the researchers into believing he was hearing impaired?
  • (A) Answering incorrectly on purpose
  • This result would not be consistent with guessing, as a hearing-impaired person who is guessing would very rarely get 36 or fewer correct.
  • (B) Consistent with guessing
  • Guessing 36 correct is not a typical result. Notice that it is very rare to guess 36 or fewer correct. If someone is guessing, it would be much more likely to get between 40 and 60 answers correct.
    Q-72: Now suppose that another person was tested in the same way, and this person answered 48 out of the 100 trials correctly. Can we confidently conclude that this person was purposefully answering the questions incorrectly?
  • (A) Yes
  • To confidently establish that the person was answering incorrectly on purpose, we would need the person to get an unusually small number of correct answers. In this case, 48 correct would **not** be unusual, as it happened fairly frequently in the simulation.
  • (B) No
  • While the person may have been answering incorrectly on purpose, their results were not unusual when compared to someone who was guessing. Therefore we cannot confidently conclude that this person was making mistakes on purpose.
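As a cross-check on the simulation, the chance that a guesser gets a given number or fewer correct can also be computed exactly. The sketch below is an alternative to the simulation approach used in this chapter, not a replacement for it. It assumes each of the 100 answers is an independent guess that is correct with probability 1/2, and the function name exact_lower_tail is hypothetical.

```python
from math import comb

def exact_lower_tail(observed, num_trials=100, p_correct=0.5):
    """Exact probability of getting `observed` or fewer correct when guessing."""
    return sum(
        comb(num_trials, k) * p_correct**k * (1 - p_correct) ** (num_trials - k)
        for k in range(observed + 1)
    )

p_36 = exact_lower_tail(36)   # the claimant's result
p_48 = exact_lower_tail(48)   # the second person's result

print("P(36 or fewer correct when guessing):", round(p_36, 4))
print("P(48 or fewer correct when guessing):", round(p_48, 4))
```

The first probability is far smaller than the second, which matches the pattern seen in the simulated dot plot: 36 correct is unusual for a guesser, while 48 correct is not.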

Next, we will work through a similar example and again use a simulation to determine how much evidence we have about a research claim.

5.5. Example 1.2: Helper vs. Hinderer?

In a study reported in a November 2007 issue of Nature, researchers investigated whether infants take into account an individual’s actions towards others in evaluating that individual as appealing or aversive, perhaps laying the foundation for social interaction (Hamlin, Wynn, and Bloom, 2007). In one component of the study, sixteen 10-month-old infants were shown a “climber” character (a piece of wood with “google” eyes glued onto it) that could not make it up a hill in two tries. Then they were shown two scenarios for the climber’s next try, one where the climber was pushed to the top of the hill by another character (“helper”) and one where the climber was pushed back down the hill by another character (“hinderer”). The infant was alternately shown these two scenarios several times. Then the child was presented with both pieces of wood (the helper and the hinderer) and asked to pick one to play with. The color and shape and order (left/right) of the toys were varied and balanced out among the 16 infants.

References

Hamlin, J. Kiley, Karen Wynn, and Paul Bloom. “Social evaluation by preverbal infants.” Nature, Volume 450, November 22, 2007.

Rossman, Chance, Cobb, and Holcomb. Introducing Concepts of Statistical Inference. NSF/DUE/CCLI # 0633349.

Questions

  1. Why was it important for the researchers to balance out the color, shape, and order of the toys across the study? For example, how would the study results have been affected if the researchers always made the helper toy a blue circle and the hinderer a yellow triangle?
  2. Identify the following in the context of this example:
    • Population of interest:
    • Sample:
    • Variable of interest:
    • Data type:
  3. Recall that this study involves 16 infants. If the population of all 10-month-old infants has no real preference for one toy over the other, how many infants do you expect to choose the helper toy? Explain.
  4. Suppose that 10 out of 16 infants choose the helper toy (62.5%). Since this value is higher than 50%, a researcher argues that these data show that the majority of all 10-month-old infants would choose the helper toy. What is wrong with their reasoning?

Once again, the key question is how to determine whether the study’s result is surprising under the assumption that there is no real preference for one toy over the other in the population of all 10-month-old infants. To answer this, we will simulate the process of 16 infants simply choosing a toy at random, over and over again. Each time we simulate the process, we’ll keep track of how many infants out of the 16 chose the *helper toy* (note that you could also keep track of the number that chose the hinderer toy). Once we’ve repeated this process a large number of times, we’ll have a pretty good sense for what outcomes would be very surprising, somewhat surprising, or not so surprising if the population of all 10-month-old infants has no real preference.

Simulation

Carry out the TinkerPlots simulation. Consider the following questions when designing your simulation study:

  • What are the two possible outcomes on each of the trials? Change the values on your spinner accordingly.
  • What is the probability that each outcome occurs under the assumption that the population of all 10-month-old infants has no real preference for either toy? Change your spinner accordingly.
  • Be sure to change the Draw value to 1 since only one infant is choosing a toy at a time.
  • How many infants were used in this study? Keep this value in mind when setting the Repeat value.

Carry out the simulation study 1000 times overall, keeping track of the number of infants that choose the helper toy in each of the simulated experiments.
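The design above can also be sketched in Python as a rough stand-in for the TinkerPlots spinner. The function name simulate_helper_counts, the seed, and the text-based plot are assumptions made for illustration. The equal chance for each toy reflects the assumption that infants have no real preference.

```python
import random
from collections import Counter

def simulate_helper_counts(num_infants=16, num_runs=1000, seed=0):
    """Simulate infants choosing a toy at random; return helper counts per run."""
    rng = random.Random(seed)
    outcomes = ["helper", "hinderer"]   # the two spinner outcomes
    counts = []
    for _ in range(num_runs):
        # One draw per infant, repeated for all infants in the study.
        choices = [rng.choice(outcomes) for _ in range(num_infants)]
        counts.append(choices.count("helper"))
    return counts

helper_counts = simulate_helper_counts()
tally = Counter(helper_counts)

# Rough text version of the dot plot: one star per 5 simulated experiments.
for value in range(17):
    print(f"{value:2d} {'*' * (tally[value] // 5)}")
```

Each entry in helper_counts corresponds to one simulated experiment with 16 infants, so the printed rows show which outcomes are common and which are rare when there is no real preference.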

Sketch in your results below:

Figure: blank axis for sketching the simulation results

Questions

  1. What does each dot on this plot represent?
  2. Suppose that in the actual study 10 out of 16 infants chose the helper toy. Would this convince you that the majority of the population of all 10-month-old infants had a preference for the helper toy? Why or why not?
  3. The actual study results are as follows: 14 out of 16 infants chose the helper toy. Mark this on the axis above the results of your simulation study. Based on this statistical investigation, what should the researchers conclude? Recall that their research question was stated as follows: Do 10-month-old infants tend to prefer the helper toy over the hinderer toy?
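One way to check the answers you read off your own dot plot is to compute the corresponding probabilities exactly. The sketch below is a hypothetical Python helper, not part of the TinkerPlots activity. It assumes that each of the 16 infants independently picks either toy with probability 1/2, so that all 2^16 sequences of choices are equally likely.

```python
from fractions import Fraction
from math import comb

def exact_upper_tail(observed, num_infants=16):
    """Exact chance that `observed` or more infants pick the helper toy at random."""
    favorable = sum(comb(num_infants, k) for k in range(observed, num_infants + 1))
    return Fraction(favorable, 2**num_infants)

p_10_or_more = exact_upper_tail(10)
p_14_or_more = exact_upper_tail(14)

print("P(10 or more choose helper):", float(p_10_or_more))
print("P(14 or more choose helper):", float(p_14_or_more))
```

Compare each value with the proportion of dots at or above 10 and at or above 14 in your simulation. The simulated proportions should be close to these exact values, but will not match them exactly.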