Torture the data long enough and eventually it will confess the truth.
-- A.T. Goode
This paper will report on my tangles with quantitative research so far in my program. I learned in high school that my reading vocabulary far exceeded my written vocabulary, and the same is true for statistics: I understand far more than I can manufacture. My goal was to learn enough about statistics to work professionally with statisticians. I had two opportunities to talk with statisticians: once when I consulted them about an article which confused me, and once about my own final project. I felt I understood their suggestions and certainly knew about the various tests they mentioned. I have named the sections of this report Knowledge Building; First Lessons; Statistics, Risks, and Policy; Meta-Analysis; and Speaking About Numbers (my encounter with professional statisticians).
Knowledge building. Conjecture, testing, and refutation build Knowledge, with a capital "K," at least according to Karl Popper (1965). This means any person with a conjecture, theory, or finding must be prepared to discuss it either in the literature of the field, such as journals, or at professional conferences.
Research seeks to help us understand what is going on in our world. Researchers begin with some question about an aspect of human experience, and attempt to use approaches or methods which their colleagues will accept as valid means of answering the question.
Here we have the first two opportunities for refutation. Is the phenomenon the researcher proposes to investigate, or does investigate, worthy of study? This question of worth could swing in either direction. On one side, the project may be concerned with issues of extremely minor importance.
On the other side, though the research has brought to light new information, has the researcher overstepped a moral or social taboo to get it? I am thinking about the Public Health Service's Tuskegee Syphilis Study and the experiments performed by the Nazis during World War II, in which they used Jews to discover the extremes of temperature to which a human can be subjected and still survive. Also in this domain of concern, though perhaps in a gray area, would be Stanley Milgram's experiments on the "Perils of Obedience" (1974, Internet). They reside in a gray area because Milgram and his supporters have never been convinced that he stepped over the line. But because of situations like these we have Human Subject Review panels established for all research involving humans. We also have to get informed consent statements signed by people who participate in experiments. Today, under these constraints, which resulted in part from his research, I doubt that Milgram could get permission to conduct his experiments again.
First Lessons. For the duration of my professional life I will need to partner with professional statisticians for support with the quantitative elements of my research projects. Turning to a random problem at the back of Langley's book, Practical Statistics (1970:348): a salesperson offers the secretary of a recreation center table tennis balls which the manufacturer claims can withstand eleven pounds of steady weight, on average, without breaking. The secretary draws six out of the salesperson's bag at random. He privately decides to purchase the balls if the "average of his sample implies that the salesman's statement is true, but not if the probability of this being the case is only 5% or less." The breaking points were: 9.5, 8, 11, 11.5, 8.5, and 9.0 pounds.
The mean for this sample is the sum of the results divided by the number sampled, or 57.5/6 = 9.583. The standard deviation is the square root of the sum of squared deviations from the mean, divided by one less than the sample size.
Value of x      d = x - mean      d * d
9.5             -0.083            0.007
8.0             -1.583            2.507
11.0             1.417            2.007
11.5             1.917            3.674
8.5             -1.083            1.174
9.0             -0.583            0.340
mean = 9.583                      total = 9.709
The standard deviation is s = SQRT(total of d * d / (n - 1)) = SQRT(9.709/5) = SQRT(1.942) = 1.39. So far, so good. Now I need to know which test of significance to use. Since I do not know the standard deviation of the larger population, I must use Student's t test, which works with small sample sizes and known population and sample means (symbolized as M and m, respectively) (p. 398).
t = (SQRT(n) * |M - m|) / s = (SQRT(6) * |11 - 9.583|) / 1.39 = (2.449 * 1.417) / 1.39
t = 2.50
Looking at a table of t values (with n - 1 = 5 degrees of freedom), 2.50 falls between the 5% and 10% columns, meaning there is a greater than five percent chance that this sample mean is representative of the claimed population mean. The secretary bought the balls.
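The whole calculation above can be sketched in a few lines of Python, using only the standard library (the variable names are mine; note that the text's t = 2.50 comes from carrying the rounded s = 1.39 forward, while full precision gives about 2.49):

```python
import math
import statistics

# Breaking points (pounds) of the six sampled balls
weights = [9.5, 8.0, 11.0, 11.5, 8.5, 9.0]
claimed_mean = 11.0  # the manufacturer's claimed population mean, M

n = len(weights)
m = statistics.mean(weights)   # sample mean
s = statistics.stdev(weights)  # sample standard deviation (n - 1 in the denominator)

# t = SQRT(n) * |M - m| / s
t = math.sqrt(n) * abs(claimed_mean - m) / s

print(round(m, 3))  # 9.583
print(round(s, 2))  # 1.39
print(round(t, 2))  # 2.49 at full precision (2.50 when s is rounded to 1.39 first)
```

The same predetermined 5% threshold then decides the purchase: compare t against the critical value from a t table with n - 1 degrees of freedom.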
The important elements for solving this problem highlight the process of making statistical inferences: what is the sample size, what is known about the mean and standard deviation of the population, what test of significance applies, and how confident do we need to be about the significance of the difference between the sample mean and the population mean? We predetermine the level of significance, look up a figure from the appropriate table, and voilà! our choice is made.
Doing this little problem just now turned up some interesting facets of my learning about statistics. I immediately knew to find the mean and standard deviation. I knew where to look for which test to run, and how to fit all of the known measures into the formulas. But I lost track of n (using 8 for a while) and had to pay careful attention to when to take square roots.
Though in the example the secretary bought the balls, his decision presumes that they were competitively priced with the balls in current use. We do not know whether there is an industry standard for the pressure balls must sustain, or whether the balls currently in use met or missed the 11-pound test. We also do not know whether one brand of balls offered qualities players liked over other brands (for example, one brand may color and number its balls to make identification easier). So though the statistical test reduced the need for judgment in this example, it did so at the cost of making many presumptions.
Statistics, risks, and policy. From watching the 26-part Against All Odds series, which explored how statistics play a vital role in the world we live in, I saw how researchers used statistics well. I am a consultant with the Department of Health on projects to reduce tobacco use, so I watched with riveted fascination Episode 11, which presented the research supporting Surgeon General Luther Terry's 1964 report on smoking and lung disease. I learned more about the background planning from Lee Fritschler's (1975) Smoking and Politics. Because he anticipated a heated public discussion of the findings, Terry understood he had to pay extreme attention to the details of the meta-analysis of the research, including panel selection, studies reviewed, and public dissemination of findings.
To develop a list from which to build the study panel, Terry invited the tobacco industry and the research-supporting foundations (the so-called tri-agencies: the American Cancer Society, the American Lung Association, and the American Heart Association) to suggest researchers and scholars. From the interim list these same groups could reject any names, the way a lawyer might reject a prospective juror. In addition, anyone who had made a public statement about the links between smoking and cancer was rejected from the panel. In fact, after the invitations had been offered, one person was removed for making a comment to the media which seemed to imply that he already believed there was a link.
On the day that Terry held the press conference to release the panel's findings, his office stage-managed the event precisely. Reporters were ushered into the conference room, and the doors were locked. They were told that the doors would be opened at the end of the presentation, at which point they could use a bank of phones to file their stories.
The care with which the Surgeon General conducted the panel, combined with the dramatic presentation of the results, persuaded the media and legitimate researchers that, based on the 6,000 studies reviewed, there is a link between smoking and lung disease. Nothing has shaken that conclusion, and most of the more than 45,000 research studies on tobacco substantiate it. Yet the tobacco industry continues to "refute" the claims, often with the help of hired academic researchers.
The tobacco industry understands that many people lack great sophistication about statistics. For thirty years, the most exasperating refutation it has made is to simply say, "You say there's a link, but that is only statistics." I began to understand this tactic when I understood whom the tobacco industry meant to speak to: current and potential nicotine users. Though this type of argument is most often presented at tobacco-industry-funded "research symposiums" and then reprinted in paid advertisements, the industry was speaking past the media and academia to reach its customers or future customers. It was offering them arguments to deny and delay coming to terms with the harmfulness of tobacco. These refutations would be useful to the smoker in arguing with "well-meaning" friends and family, and internally to the smoker as well, as a barrier to any attempt at quitting.
A successful "refutation" of these types of "excuses" takes the resources of a clinical or experimental setting, at least according to a study by Janis and Mann (1977: 344-365). This study offered aid to 74 white middle- and lower-class men and women who responded to ads from the Yale Smokers' Clinic offering help in cutting down. Each subject was presented with eight typical rationalizations and pressed to acknowledge that he or she used it to continue smoking. Rationalizations (or "excuses") 1, 2, and 8 touched on perceptions of research and risk:
1. "It hasn't really been proven that cigarette smoking is a cause of lung cancer."
2. "The only possible health problem caused by cigarettes that one might face is lung cancer, and you don't really see a lot of that."
8. "So smoking may be a risk, big deal! So is most of life! I enjoy smoking too much to give it up" (p. 349).
To understand the impact of the experience, consider what happens to a young woman in our role-playing sessions. On arriving at the laboratory she is met by the experimenter, who tells her that the aim of the study is to examine two important problems about the human side of medical practice: how patients react to bad news and how they feel about a doctor's advice to quit an enjoyable habit like smoking. She is then asked to imagine that the experimenter is really a physician who is treating her for a bad cough that is not getting better. She is to assume that this is the third visit to his office, and this time she has come to learn the results of X-rays and other medical tests that were previously carried out. The experimenter outlines the scenario of a psychodrama consisting of five different scenes and asks the subject to act the scenes out, role-playing each as realistically as possible (pp. 350-351).
Cancer...oh, my God! I can't believe this...Oh, God, if it's only benign, that's all I ask for. One out of three [survive]! Holy Smokes, with my luck I'll be the-one of the fatalities...I've read all of the reports and I just wouldn't believe them...Why did I ever pick up that stupid habit? I know that it causes cancer. I'm not kidding anybody. I know it does. I just thought-I was hoping that it would never happen to me...(p. 351)
The [Surgeon General's] report did not have much effect on me. But I was in this other study [over a year ago]; a professor was doing this psychological thing and I was one of the volunteers. And that was what really affected me...He was the one that scared me, not the report...I got to thinking, what if it were really true and I had to go home and tell everyone that I had cancer. And right then I decided I would not go through this again, and if there were any way of preventing it, I would. And I stopped smoking. It was really that professor's study that made me quit (pp. 352-354).
Meta-analysis. Is this a clinically significant finding? The Interactive programs had an effect size of approximately .20 across all subsets of programs, compared to .02 for the Non-Interactive programs...this modest effect size is equal to a success rate of 9.5% and 1%, respectively. This is clearly a clinically significant finding, particularly when the mean delivery intensity was just ten hours.
In terms of policy decisions, the study of the effect of aspirin on heart attacks, which involved 22,000 doctors in a randomized double-blind study, was canceled because an r value of .035 (a success rate of 3.5%) indicated that it would be unethical not to offer the treatment to the control group. Currently, Non-Interactive programs are used by the overwhelming majority of schools. Replacing the present programs would increase the effectiveness of school-based programs by 8.5% (r = .085). These clinically significant findings for the Interactive programs were observed for all adolescents, including varied minority populations, and were equal for tobacco, alcohol, marijuana, and illicit drugs (references removed, p. 23).
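The translation of a small r into a difference in "success rates" in passages like this is usually Rosenthal and Rubin's binomial effect size display (BESD); that this is the mapping the quoted study used is my assumption, but it does reproduce the aspirin figure. A minimal sketch:

```python
# Binomial Effect Size Display (BESD): a correlation r is re-expressed as two
# hypothetical success rates, 0.5 + r/2 for the treatment group and 0.5 - r/2
# for the control group, so the treatment-vs-control gap equals r itself.
def besd(r):
    return 0.5 + r / 2, 0.5 - r / 2

treated, control = besd(0.035)        # the aspirin study's r = .035
print(round(treated - control, 3))    # 0.035, i.e. a 3.5 percentage-point gap
```

By the same mapping, r = .085 corresponds to the 8.5 percentage-point improvement claimed for the Interactive programs.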
Table 6. Logistic Regression Analysis of Receipt of Adequate Prenatal Care
(Likelihood ratio statistic with 7 degrees of freedom = 42.95, p = .0001)

Variable                                Odds Ratio    95% Confidence Interval
Spousal living situation                4.57          (2.14, 9.75)
Use of alcohol                          0.28          (0.51, 3.19)
Use of drugs                            0.17          (0.02, 1.52)
Health insurance at time of delivery
Initial reaction to pregnancy           1.34          (0.94, 1.91)
Being black (vs. Hispanic)              0.59          (0.24, 1.48)
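One common way to read a table like Table 6 is that an odds ratio is statistically significant at roughly the 5% level when its 95% confidence interval excludes 1.0 (the "no effect" value). A small sketch of that reading, with values copied from the table (the helper function and dictionary are my own illustration):

```python
# A subset of Table 6: variable -> (odds ratio, 95% confidence interval).
table6 = {
    "Spousal living situation": (4.57, (2.14, 9.75)),
    "Use of drugs": (0.17, (0.02, 1.52)),
    "Initial reaction to pregnancy": (1.34, (0.94, 1.91)),
}

def excludes_one(ci):
    """True when the confidence interval excludes an odds ratio of 1.0."""
    low, high = ci
    return not (low <= 1.0 <= high)

for variable, (odds_ratio, ci) in table6.items():
    verdict = "significant" if excludes_one(ci) else "not significant"
    print(f"{variable}: OR = {odds_ratio}, {verdict}")
```

On this reading, only spousal living situation reaches significance: its interval (2.14, 9.75) lies entirely above 1.0, while the other intervals straddle it.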