Torture the data long enough and eventually it will confess the truth.
-- A.T. Goode
This paper will report on my tangles with quantitative research so far in
my program. I learned in high school that my reading vocabulary far exceeded
my written vocabulary, and the same is true for statistics: I understand
far more than I can manufacture. My goal was to learn enough about statistics
so that I can work professionally with statisticians. I had two opportunities
to talk with statisticians. Once I consulted them about an article which
confused me, and once about my own final project. I felt I understood their
suggestions and certainly knew about the various tests they mentioned. I
have named the sections of this report Knowledge Building; First Lessons;
Statistics, Risks, and Policy; Meta-Analysis; and Speaking About Numbers
(my encounter with professional statisticians).
Knowledge building. Conjecture, testing, and refutation build Knowledge,
with a capital "K," at least according to Karl Popper (1965).
This means any person with a conjecture, theory, or finding must be prepared
to discuss it either in the literature of the field, such as journals, or
at professional conferences.
Research seeks to help us understand what is going on in our world. Researchers
begin with some question about an aspect of human experience, and attempt
to use approaches or methods which their colleagues will accept as valid
means of answering the question.
Here we have the first two opportunities for refutation. Is the phenomenon
the researcher proposes to investigate, or does investigate, worthy of study?
This question of worth could swing in either direction. On one side, the
project may be concerned with issues of extremely minor importance.
On the other side, though the research has brought to light new information,
has he or she overstepped a moral or social taboo to get it? I am thinking
about the Public Health Service's Tuskegee Syphilis Study and the experiments
performed by the Nazis during World War II where they used Jews to discover
the extremes of temperature to which a human can be subjected and still
survive. Also in this domain of concern, though perhaps in a gray area,
would be Stanley Milgram's experiments on the "Perils of Obedience"
(1974, Internet). They reside in a gray area because Milgram and his supporters
have not been convinced that he stepped over the line. But because of situations
like these we have Human Subject Review panels established for all research
involving humans. We also have to get informed consent statements signed
by people who participate in experiments. Today, under these constraints,
which resulted in part from his research, I doubt if Milgram could get permission
to conduct his experiments again.
First Lessons. For the duration of my professional life I will need
to partner with professional statisticians for support with quantitative
elements of my research projects. Turning to a random problem at the back
of Langley's book, Practical Statistics (1970:348): a salesperson offers
the secretary of a recreation center table tennis balls which the manufacturer
claims can withstand, on average, eleven pounds of steady weight without breaking.
The secretary draws six out of the salesperson's bag, at random. He privately
decides to purchase the balls if the "average of his sample implies
that the salesman's statement is true, but not if the probability of this
being the case is only 5% or less." The breaking points were: 9.5,
8, 11, 11.5, 8.5, and 9.0 pounds.
The mean for this sample is the sum of the results divided by the number
sampled, or 57.5/6 = 9.583. The standard deviation is the square root of
the sum of squared deviations from the mean, divided by n - 1.
Value of x | d = x - mean | d * d |
---|---|---|
9.5 | -.083 | .007 |
8 | -1.583 | 2.506 |
11 | 1.417 | 2.008 |
11.5 | 1.917 | 3.675 |
8.5 | -1.083 | 1.173 |
9.0 | -.583 | .340 |
mean = 9.58 | | total = 9.709 |
The standard deviation is the square root of the total divided by n - 1:
s = SQRT(9.709/(6 - 1)) = SQRT(1.942) = 1.39. So far, so good. Now I want
to know which test of significance to use. Since I do not know the standard deviation of the larger population
I must use Student's t-test, which works with small sample sizes and known
population and sample means (symbolized as M and m, respectively) (p. 398).
t = SQRT(n) * |M - m|/s = (SQRT(6) * |11 - 9.583|)/1.39 = 2.449 * 1.417/1.39
t = 2.50
Looking at a t table for 5 degrees of freedom, 2.50 falls between the 5%
and 10% columns, meaning there is a greater than five percent chance of a
sample mean this far from 11 pounds arising even if the manufacturer's claim
is true, so the claim cannot be rejected. The secretary bought the balls.
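The arithmetic above can be checked with a short script. This is a sketch using only the Python standard library; the critical value 2.571 (5 degrees of freedom, 5% two-tailed) is the kind of figure one would read off a printed t table:

```python
import math
import statistics

# Breaking points of the six sampled balls, in pounds
weights = [9.5, 8, 11, 11.5, 8.5, 9.0]
claimed_mean = 11.0  # the manufacturer's claim (M)

n = len(weights)
m = statistics.mean(weights)   # sample mean (m)
s = statistics.stdev(weights)  # sample s.d. -- divides by n - 1

# Student's t statistic for one sample against a claimed mean
t = math.sqrt(n) * abs(claimed_mean - m) / s
print(f"mean={m:.3f}  s={s:.2f}  t={t:.2f}")

# Critical value for df = 5 at the 5% (two-tailed) level, from a t table
t_crit = 2.571
print("reject the claim" if t > t_crit else "cannot reject the claim at 5%")
```

Computed this way, without intermediate rounding, t comes out near 2.49 rather than 2.50, because the hand calculation rounds s to 1.39 before dividing.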
The elements needed to solve this problem highlight the process of making
statistical inferences: what is the sample size, what is known about the
mean and standard deviation of the population, what test of significance
applies, and how confident do we need to be about the significance of the
difference between the sample mean and the population mean? We predetermine
the level of significance, look up a figure from the appropriate table,
and voilà! our choice is made.
Doing this little problem just now turned up some interesting facets about
my learning about statistics. I immediately knew to get the mean and standard
deviation. I knew where to look for which test to run, and how to fit all
of the known measures into the formulas. I lost track of the n (using 8
for a while) and had to pay careful attention to when to take square roots.
Though in the example the secretary bought the balls, the decision presumes
that they were competitively priced against the balls in current use. We do not
know if there exists an industry standard for the pressure balls must sustain,
or if the balls currently in use met or missed the 11 pound test. We also
do not know if one brand of balls offered qualities players liked over other
brands (for example, one brand may color and number its balls to make
identification easier). This means that though the statistical test reduced
the need for judgment in this example, it did so at the cost of making
many presumptions.
Statistics, risks, and policy. From watching the 26-part Against
All Odds series, which explored how statistics play a vital role in the
world we live in, I saw how researchers use statistics well. I am a consultant
with the Department of Health on projects to reduce tobacco use, so I watched
with riveted fascination Episode 11, which presented the research supporting
Surgeon General Luther Terry's 1964 report on smoking and lung disease.
I learned more about the background planning from Lee Fritschler's (1975)
Smoking and Politics. Because he anticipated a heated public discussion
of the findings, Terry understood he had to pay extreme attention to the
details of the meta-analysis of the research, including panel selection,
studies reviewed, and public dissemination of findings.
To develop a list from which to build the study panel, Terry invited the
tobacco industry and the research supporting foundations (the so-called
tri-agencies: American Cancer Society, American Lung Association, and the
American Heart Association) to suggest researchers and scholars. From the
interim list these same groups could reject any names the way a lawyer might
reject a prospective juror. In addition, anyone who had made a public statement
about the links between smoking and cancer was rejected from the panel.
In fact, after the invitations had been offered, one person was removed
for making a comment to the media which seemed to imply that he already
believed there was a link.
On the day that Terry held the press conference to release the panel's findings,
his office stage-managed the event precisely. Reporters were ushered into
the conference room, and the doors were locked. They were told that the
doors would be opened at the end of the presentation, at which point they
could use a bank of phones to file their stories.
The care with which the Surgeon General conducted the panel, when combined
with the dramatic presentation of the results, persuaded the media and legitimate
researchers that there is a link between smoking and lung disease based
on the 6,000 studies used. Nothing has shaken that belief, and most of the
over 45,000 research studies on tobacco substantiate it. Yet the tobacco
industry continues to "refute" the claims, often with the help
of hired academic researchers.
The tobacco industry understands that many people lack great sophistication
about statistics. For thirty years, the most exasperating refutation that
they have made is to simply say, "You say there's a link, but that
is only statistics." I began to understand this tactic when I understood
who the tobacco industry meant to speak to: current and potential nicotine
users. Though this type of argument is most often presented at tobacco-
industry-funded "research symposiums" and then reprinted in
paid advertisements, the industry was speaking past the media and academia
to reach its customers or future customers. It was offering them arguments to
deny and delay coming to terms with the harmfulness of tobacco. These refutations
would be useful to the smoker in arguing with "well meaning" friends
and family; and internally to the smoker as well, as a barrier to any attempts
at quitting.
A successful "refutation" of these types of "excuses"
takes the resources of a clinical or experimental setting, at least according
to a study by Janis and Mann (1977: 344-365). This study offered aid to
74 white middle- and lower-class men and women who responded to ads from
the Yale Smokers' Clinic offering help in cutting down. Each subject was
presented with eight typical rationalizations and pressed to acknowledge
that he or she used it to continue smoking. Rationalizations (or "excuses")
1, 2 and 8 touched on perceptions of research and risk:
1. "It hasn't really been proven that cigarette smoking is a cause of lung cancer."
2. "The only possible health problem caused by cigarettes that one might face is lung cancer, and you don't really see a lot of that."
8. "So smoking may be a risk, big deal! So is most of life! I enjoy smoking too much to give it up" (p. 349).
To understand the impact of the experience, consider what happens to a young woman in our role-playing sessions. On arriving at the laboratory she is met by the experimenter, who tells her that the aim of the study is to examine two important problems about the human side of medical practice: how patients react to bad news and how they feel about a doctor's advice to quit an enjoyable habit like smoking. She is then asked to imagine that the experimenter is really a physician who is treating her for a bad cough that is not getting better. She is to assume that this is the third visit to his office, and this time she has come to learn the results of X-rays and other medical tests that were previously carried out. The experimenter outlines the scenario of a psychodrama consisting of five different scenes and asks the subject to act the scenes out, role-playing each as realistically as possible (pp. 350-351).
Cancer...oh, my God! I can't believe this...Oh, God, if it's only benign, that's all I ask for. One out of three [survive]! Holy Smokes, with my luck I'll be the-one of the fatalities...I've read all of the reports and I just wouldn't believe them...Why did I ever pick up that stupid habit? I know that it causes cancer. I'm not kidding anybody. I know it does. I just thought-I was hoping that it would never happen to me...(p. 351)
The [Surgeon General's] report did not have much effect on me. But I was in this other study [over a year ago]; a professor was doing this psychological thing and I was one of the volunteers. And that was what really affected me...He was the one that scared me, not the report...I got to thinking, what if it were really true and I had to go home and tell everyone that I had cancer. And right then I decided I would not go through this again, and if there were any way of preventing it, I would. And I stopped smoking. It was really that professor's study that made me quit (352-354).
Is this a clinically significant finding? The Interactive programs had an effect size of approximately .20 across all subsets of programs compared to .02 for the Non-Interactive programs...this modest effect size is equal to a success rate of 9.5% and 1%, respectively. This is clearly a clinically significant finding, particularly when the mean delivery intensity was just ten hours.
In terms of policy decisions, the study of the effect of aspirin on heart attacks, which involved 22,000 doctors in a randomized double-blind study, was canceled because an r value of .035 (success rate of 3.5%) indicated that it would be unethical to not offer the treatment to the control group. Currently, Non-Interactive programs are used by the overwhelming majority of schools. Replacing the present programs would increase the effectiveness of school-based programs by 8.5% (r=.085). These clinically significant findings for the Interactive programs were observed for all adolescents, including varied minority populations, and were equal for tobacco, alcohol, marijuana, and illicit drugs (references removed, p. 23).
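The "r value equals success-rate difference" reading in the quoted passage appears to follow Rosenthal's binomial effect size display (BESD), in which a correlation r is presented as a treatment-versus-control success-rate split centered on 50%. A minimal sketch under that assumption (the mapping, not the study data, is what is illustrated):

```python
def besd(r):
    """Binomial effect size display: express a correlation r as
    treatment vs. control success rates centered on 50%.
    The treatment-minus-control difference equals r exactly."""
    return 0.5 + r / 2, 0.5 - r / 2

# r = .035 (the aspirin trial) and r = .085 (Interactive programs)
for r in (0.035, 0.085):
    treated, control = besd(r)
    print(f"r={r}: {treated:.2%} vs {control:.2%}, difference {r:.1%}")
```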
Table 6. Logistic Regression Analysis of Receipt of Adequate Prenatal Care (likelihood ratio statistic with 7 degrees of freedom = 42.95, p = .0001).

Variable | Odds Ratio | 95% Confidence Interval |
---|---|---|
Spousal living situation | 4.57 | (2.14,9.75) |
Planned pregnancy | 1.27 | (0.51,3.19) |
Use of alcohol | .28 | (0.51,3.19) |
Use of drugs | .17 | (0.02,1.52) |
Health insurance at time of delivery | .55 | (0.15,1.95) |
Initial reaction to pregnancy | 1.34 | (0.94,1.91) |
Being black (vs. Hispanic) | .59 | (0.24,1.48) |
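One way to read Table 6: a predictor is statistically significant at the 5% level exactly when its 95% confidence interval excludes an odds ratio of 1. A minimal sketch over two rows of the table, using the values as printed in the source:

```python
# (odds ratio, (lower, upper)) pairs from Table 6, as printed
rows = {
    "Spousal living situation": (4.57, (2.14, 9.75)),
    "Planned pregnancy": (1.27, (0.51, 3.19)),
}

for name, (odds_ratio, (lo, hi)) in rows.items():
    # A CI that excludes 1 means the odds differ significantly from even
    significant = not (lo <= 1.0 <= hi)
    verdict = "significant" if significant else "not significant"
    print(f"{name}: OR {odds_ratio}, 95% CI ({lo}, {hi}) -> {verdict}")
```

By this reading only the spousal living situation row, whose interval (2.14, 9.75) lies entirely above 1, is significant at the 5% level.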