I have divided this paper on research and methodology into three sections:

Speaking of Numbers
Excerpted Portions of Hawkins/Benard Discussion (w/ bibliography)

Speaking of Numbers

1999 John Perkins

Torture the data long enough and eventually it will confess the truth.
-- A.T. Goode

This paper will report on my tangles with quantitative research so far in my program. I learned in high school that my reading vocabulary far exceeded my written vocabulary, and the same is true for statistics: I understand far more than I can manufacture. My goal was to learn enough about statistics so that I could work professionally with statisticians. I had two opportunities to talk with statisticians: once I consulted them about an article which confused me, and once about my own final project. I felt I understood their suggestions and certainly knew about the various tests they mentioned. I have named the sections of this report Knowledge Building; First Lessons; Statistics, Risks, and Policy; Meta-Analysis; and Speaking About Numbers (my encounter with professional statisticians).

Knowledge building. Conjecture, testing, and refutation build Knowledge, with a capital "K," at least according to Karl Popper (1965). This means any person with a conjecture, theory, or finding must be prepared to discuss it either in the literature of the field, such as journals, or at professional conferences.

Research seeks to help us understand what is going on in our world. Researchers begin with some question about an aspect of human experience and attempt to use approaches or methods which their colleagues will accept as valid means of answering the question.

Here we have the first two opportunities for refutation. Is the phenomenon the researcher proposes to investigate, or does investigate, worthy of study? This question of worth could swing in either direction. On one side, the project may be concerned with issues of extremely minor importance.

On the other side, though the research has brought to light new information, has the researcher overstepped a moral or social taboo to get it? I am thinking about the Public Health Service's Tuskegee Syphilis Study and the experiments performed by the Nazis during World War II, in which they used Jews to discover the extremes of temperature to which a human can be subjected and still survive. Also in this domain of concern, though perhaps in a gray area, would be Stanley Milgram's experiments on the "Perils of Obedience" (1974, Internet). They reside in a gray area because Milgram and his supporters have not been convinced that he stepped over the line. But because of situations like these we have Human Subject Review panels established for all research involving humans. We also have to get informed consent statements signed by people who participate in experiments. Today, under these constraints, which resulted in part from his research, I doubt that Milgram could get permission to conduct his experiments again.

First Lessons. For the duration of my professional life I will need to partner with professional statisticians for support with quantitative elements of my research projects. Turning to a random problem at the back of Langley's book, Practical Statistics (1970:348): a salesperson offers the secretary of a recreation center table tennis balls which the manufacturer claims can withstand eleven pounds steady weight without breaking, on average. The secretary draws six out of the salesperson's bag, at random. He privately decides to purchase the balls if the "average of his sample implies that the salesman's statement is true, but not if the probability of this being the case is only 5% or less." The breaking points were: 9.5, 8, 11, 11.5, 8.5, and 9.0 pounds.

The mean for this sample is the sum of the results divided by the number sampled, or 57.5/6 = 9.583. The standard deviation is the square root of the sum of squared deviations from the mean divided by n - 1.

Value of x    d = x - mean    d * d
9.5           -0.083          0.007
8             -1.583          2.506
11             1.417          2.008
11.5           1.917          3.675
8.5           -1.083          1.173
9.0           -0.583          0.340
mean = 9.58
total of d * d = 9.709

The sum of squared deviations is 9.709. Dividing by n - 1 gives the variance, 9.709/(6 - 1) = 1.942, and the square root of the variance gives the standard deviation, 1.39. So far, so good. Now I want to know which test of significance to use. Since I do not know the standard deviation of the larger population I must use Student's t test, which works with small sample sizes and known population and sample means (symbolized as M and m, respectively) (p. 398).

t = SQRT(n) * |M - m| / s = (SQRT(6) * |11 - 9.583|) / 1.39 = (2.449 * 1.417) / 1.39

t = 2.50

Looking at a table of t values (with n - 1 = 5 degrees of freedom), 2.50 falls between the 5% and 10% columns, meaning that there is a greater than five percent chance of drawing a sample mean this far from the claimed population mean. The secretary bought the balls.
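
The arithmetic in this problem can be checked with a short script. What follows is a minimal sketch in Python, not something taken from Langley's book; the variable names are illustrative, and the two critical values are copied from a standard two-sided t table for 5 degrees of freedom rather than computed.

```python
import math
import statistics

# Breaking points (pounds) of the six sampled balls, from the problem above.
breaking_points = [9.5, 8, 11, 11.5, 8.5, 9.0]
claimed_mean = 11.0  # the manufacturer's claimed average (M)

n = len(breaking_points)
sample_mean = statistics.mean(breaking_points)  # m
sample_sd = statistics.stdev(breaking_points)   # s, computed with n - 1

# One-sample t statistic: t = sqrt(n) * |M - m| / s
t_stat = math.sqrt(n) * abs(claimed_mean - sample_mean) / sample_sd

# Two-sided critical values for 5 degrees of freedom,
# copied from a standard t table (an assumption, not computed here).
T_CRIT_10_PERCENT = 2.015
T_CRIT_5_PERCENT = 2.571

between_5_and_10 = T_CRIT_10_PERCENT < t_stat < T_CRIT_5_PERCENT

print(f"mean = {sample_mean:.3f}, sd = {sample_sd:.3f}, t = {t_stat:.2f}")
print("t lies between the 10% and 5% critical values:", between_5_and_10)
```

Carried to full precision, the script gives t of about 2.49, a hair under the hand-calculated 2.50, because the hand calculation rounds s to 1.39 before dividing. The conclusion is the same either way.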

The important elements for solving this problem highlight the process of making statistical inferences: what is the sample size, what is known about the mean and standard deviation of the population, what test of significance applies, and how confident do we need to be about the significance of the difference between the sample mean and the population mean? We predetermine the level of significance, look up a figure from the appropriate table, and voilà! our choice is made.

Doing this little problem just now turned up some interesting facets about my learning about statistics. I immediately knew to get the mean and standard deviation. I knew where to look for which test to run, and how to fit all of the known measures into the formulas. I lost track of the n (using 8 for a while) and had to pay careful attention to when to take square roots.

That the secretary bought the balls in the example suggests that they were competitively priced with the balls in current use. We do not know if there exists an industry standard for the pressure balls must sustain, or if the balls currently in use met or missed the 11-pound test. We also do not know if one brand of balls offered qualities players liked over other brands (for example, one brand may color and number their balls to make identification easier). This means that though the statistical test reduced the need for judgment in this example, it did so at the cost of making many presumptions.

Statistics, risks, and policy. From watching the 26-part Against All Odds series, which explored how statistics play a vital role in the world we live in, I saw how researchers used statistics well. I am a consultant with the Department of Health on projects to reduce tobacco use, so I watched with riveted fascination Episode 11, which presented the research supporting Surgeon General Luther Terry's 1964 report on smoking and lung disease. I learned more about the background planning from Lee Fritschler's (1975) Smoking and Politics. Because he anticipated a heated public discussion of the findings, Terry understood he had to pay extreme attention to the details of the meta-analysis of the research, including panel selection, studies reviewed, and public dissemination of findings.

To develop a list from which to build the study panel, Terry invited the tobacco industry and the research-supporting foundations (the so-called tri-agencies: American Cancer Society, American Lung Association, and the American Heart Association) to suggest researchers and scholars. From the interim list these same groups could reject any names the way a lawyer might reject a prospective juror. In addition, anyone who had made a public statement about the links between smoking and cancer was rejected from the panel. In fact, after the invitations had been offered, one person was removed for making a comment to the media which seemed to imply that he already believed there was a link.

On the day that Terry held the press conference to release the panel's findings, his office stage-managed the event precisely. Reporters were ushered into the conference room, and the doors were locked. They were told that the doors would be opened at the end of the presentation, at which point they could use a bank of phones to file their stories.

The care with which the Surgeon General conducted the panel, when combined with the dramatic presentation of the results, persuaded the media and legitimate researchers that there is a link between smoking and lung disease based on the 6,000 studies used. Nothing has shaken that belief, and most of the over 45,000 research studies on tobacco substantiate it. Yet the tobacco industry continues to "refute" the claims, often with the help of hired academic researchers.

The tobacco industry understands that many people lack great sophistication about statistics. For thirty years, the most exasperating refutation that they have made is to simply say, "You say there's a link, but that is only statistics." I began to understand this tactic when I understood whom the tobacco industry meant to speak to: current and potential nicotine users. Though this type of argument is most often presented at tobacco-industry-funded "research symposiums" and then reprinted in their paid advertisements, they were speaking past the media and academia to reach their customers or future customers. They were offering them arguments to deny and delay coming to terms with the harmfulness of tobacco. These refutations would be useful to the smoker in arguing with "well meaning" friends and family, and internally to the smoker as well, as a barrier to any attempts at quitting.

A successful "refutation" of these types of "excuses" takes the resources of a clinical or experimental setting, at least according to a study by Janis and Mann (1977: 344-365). This study offered aid to 74 white middle- and lower-class men and women who responded to ads from the Yale Smokers' Clinic offering help in cutting down. Each subject was presented with eight typical rationalizations and pressed to acknowledge that he or she used it to continue smoking. Rationalizations (or "excuses") 1, 2 and 8 touched on perceptions of research and risk:

1. "It hasn't really been proven that cigarette smoking is a cause of lung cancer."

2. "The only possible health problem caused by cigarettes that one might face is lung cancer, and you don't really see a lot of that."

8. "So smoking may be a risk, big deal! So is most of life! I enjoy smoking too much to give it up" (p. 349).

For each excuse the interviewer played a short tape-recorded lecture in which he presented factual information designed to refute that "excuse." In the contrasting information-only procedure, subjects heard the lecture with the instructions to listen carefully and think about the rationalizations. Both groups then viewed two films from the American Cancer Society. After all of this the follow-up showed inconclusive results and raised doubts about whether this type of procedure could be used to induce a person to commit him or herself to a new decision and live up to it (p. 349).

Janis and Mann persisted, and investigated the role emotional confrontation might play in inducing a new decision. They tried a set of psychodramas with young female smokers around the age of 20. This time, the subjects were recruited without knowing in advance that the research involved their personal smoking habits.
To understand the impact of the experience, consider what happens to a young woman in our role-playing sessions. On arriving at the laboratory she is met by the experimenter who tells her that the aim of the study is to examine two important problems about the human side of medical practice: how patients react to bad news and how they feel about a doctor's advice to quit an enjoyable habit like smoking. She is then asked to imagine that the experimenter is really a physician who is treating her for a bad cough that is not getting better. She is to assume that this is the third visit to his office, and this time she has come to learn the results of X-rays and other medical tests that were previously carried out. The experimenter outlines the scenario of a psychodrama consisting of five different scenes and asks the subject to act the scenes out, role-playing each as realistically as possible (p. 350-351).

At this point, the experimenter puts on a white lab coat and a stethoscope around his neck and begins to assume the manner and tone of a physician. The scenes escalate in intensity in this order: the subject is asked to express out loud her thoughts while sitting in the waiting room; then after being ushered into his "office" she is told there is a malignant spot on her right lung and surgery is recommended as soon as possible. In the third scene the physician leaves his desk to make arrangements for a hospital bed, and the subject is left alone to speak out loud her thoughts. At this point, one subject said:
Cancer...oh, my God! I can't believe this...Oh, God, if it's only benign, that's all I ask for. One out of three [survive]! Holy Smokes, with my luck I'll be the-one of the fatalities...I've read all of the reports and I just wouldn't believe them...Why did I ever pick up that stupid habit? I know that it causes cancer. I'm not kidding anybody. I know it does. I just thought-I was hoping that it would never happen to me...(p. 351)

In scene four, the subject is told that surgery will require at least six weeks' hospitalization. Finally, the last scene has the physician raise questions about the patient's smoking history and, after pointing out the connection between smoking and cancer, urge her to quit immediately.

Janis and Mann report that this role-playing "ordeal" lasted just over an hour and produced great emotional involvement. As a comparison, another group was exposed to the same distressing information and listened to a tape of one of the subjects in the experimental group. Interestingly, this research occurred six months before the Surgeon General's report in 1964. Practically all of the subjects cut back their smoking right after the experiments, and again after the Surgeon General's report. Eighteen months later experimental subjects showed less cigarette consumption than the control group (p. 352). Many offered spontaneous comments like this one:
The [Surgeon General's] report did not have much effect on me. But I was in this other study [over a year ago]; a professor was doing this psychological thing and I was one of the volunteers. And that was what really affected me...He was the one that scared me, not the report...I got to thinking, what if it were really true and I had to go home and tell everyone that I had cancer. And right then I decided I would not go through this again, and if there were any way of preventing it, I would. And I stopped smoking. It was really that professor's study that made me quit (352-354).

Clearly, the decision to break an outworn decision and set a new course takes emotional engagement with what is already "known" and acceptance of its applicability to one's own life and expectations. In addition, Janis and Mann conducted more research and found that not everyone is responsive to emotional role-playing (p. 357). This emphasizes the importance of decision-making routines that use both research and emotional engagement early in the behavior adoption process, before a harmful habit has gelled. The next section will review a meta-analysis of school-based drug prevention efforts.

Meta-Analysis. A meta-analysis seeks to combine the results of many studies in order to reach a clearer understanding. Nancy Tobler and Howard Stratton (1995) did a meta-analysis of school-based drug prevention programs. I saw Tobler present this paper at the National Prevention Network Conference in 1995. In this paper, Tobler and Stratton sorted the drug prevention programs by whether they were interactive (which means they involved youth in small group discussions, role plays, and the like) or non-interactive. In summary, they wrote:
Is this a clinically significant finding? The Interactive programs had an effect size of approximately .20 across all subsets of programs compared to .02 for the Non-Interactive programs...this modest effect size is equal to a success rate of 9.5% and 1%, respectively. This is clearly a clinically significant finding, particularly when the mean delivery intensity was just ten hours.

In terms of policy decisions, the study of the effect of aspirin on heart attacks, which involved 22,000 doctors in a randomized double-blind study, was canceled because an r value of .035 (success rate of 3.5%) indicated that it would be unethical to not offer the treatment to the control group. Currently, Non-Interactive programs are used by the overwhelming majority of schools. Replacing the present programs would increase the effectiveness of school-based programs by 8.5% (r=.085). These clinically significant findings for the Interactive programs were observed for all adolescents, including varied minority populations, and were equal for tobacco, alcohol, marijuana, and illicit drugs (references removed, p. 23).

This confirms something I already knew from the previous research and my direct experience: engaging people (at whatever age) helps them internalize their own understandings for positive decision making. Kurt Lewin (1947a and 1947b) demonstrated 50 years ago the value of letting a small group hear new information and then discuss, decide and commit to action within the group setting.

Tobler and Stratton may have inadvertently uncovered a deeper epistemological worldview in the schools: non-interactive teaching may be the norm throughout the curriculum. Educators may think that students bring nothing to their intellectual task of learning, and that simply presenting information in a non-interactive format is what it takes for children to learn. In the metaphors used by the schools, more "material" can be "covered" that way. Hopefully, it will not take another fifty years for the theories of instruction to catch up with the theories of human emotional, intellectual, and moral development. As with a smoker who quit in the 1930s as soon as the first studies emerged linking smoking and lung disease, there is a lot to gain by inferring the truth from preliminary studies and acting accordingly.

Speaking about numbers. As part of my research for my current work (and as one of my internships) I came across an article by Winsome Parchment, Gerson Weiss, and Marion Passannante (1996), "Is the Lack of Health Insurance the Major Barrier to Early Prenatal Care at an Inner-City Hospital?" As there were references to advanced statistical techniques in the article, I decided to make an appointment with the biostatistics department for help.

As part of the process I filled out an appointment sheet and included the article. I got an appointment a few days before I would be making use of the article at a conference.

I had anticipated meeting with a graduate student or two; I actually met with a professor, Nancy Temkin, and two graduate students, one at the Ph.D. level and the other at the master's level. The team approach simulates for the students the dynamics they will face as professional statisticians consulting on projects. Having a professor present offered guidance and quality control. In asking for an appointment, I had explained that I understood most of the article until I came upon Table 6 on p. 103:

Table 6. Logistic Regression Analysis of Receipt of Adequate Prenatal Care (Likelihood Ratio Statistic with 7 degrees of freedom = 42.95 (p=.0001)).
Variable                                 Odds Ratio    95% Confidence Level
Spousal living situation                 4.57          (2.14, 9.75)
Planned pregnancy                        1.27          (0.51, 3.19)
Use of alcohol                           .28           (0.51, 3.19)
Use of drugs                             .17           (0.02, 1.52)
Health insurance at time of delivery     .55           (0.15, 1.95)
Initial reaction to pregnancy            1.34          (0.94, 1.91)
Being black (vs. Hispanic)               .59           (0.24, 1.48)

I am comfortable with tables showing p values and understand 95% confidence levels. In the discussion portion of the article, only the spousal living situation received attention as being significant. My first questions sought to understand what this chart meant to communicate with the odds ratio.

Temkin's team explained the chart this way: the odds ratio answered the question of how the odds of receiving adequate prenatal care changed when a given variable was present. The confidence level showed the range of odds ratios one could expect 95 percent of the time from resampling the same population. In other words, if the researchers resampled this population 20 times, about 19 of the odds ratios for spousal living situation would fall between 2.14 to 1 and 9.75 to 1. If the confidence level crossed 1, or even odds, as they rephrased it, the variable lacked significance. Therefore, from how Parchment et al. managed their data, only spousal living situation had significance.
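
The rule of thumb described here, that a range containing 1 signals a variable lacking significance, can be applied mechanically to Table 6. The sketch below is in Python and is only an illustration; the function name `interval_excludes_one` is hypothetical, and the figures are copied exactly as printed in the table.

```python
# Odds ratios and 95% bounds, copied as printed in Table 6.
table_6 = [
    ("Spousal living situation", 4.57, 2.14, 9.75),
    ("Planned pregnancy", 1.27, 0.51, 3.19),
    ("Use of alcohol", 0.28, 0.51, 3.19),
    ("Use of drugs", 0.17, 0.02, 1.52),
    ("Health insurance at time of delivery", 0.55, 0.15, 1.95),
    ("Initial reaction to pregnancy", 1.34, 0.94, 1.91),
    ("Being black (vs. Hispanic)", 0.59, 0.24, 1.48),
]


def interval_excludes_one(lower, upper):
    """Return True when the whole range lies on one side of 1 (even odds)."""
    return lower > 1.0 or upper < 1.0


# Keep only the variables whose range does not contain 1.
significant = [
    name
    for name, _odds_ratio, lower, upper in table_6
    if interval_excludes_one(lower, upper)
]

print(significant)
```

Run against the table as printed, the list contains only the spousal living situation, which matches the one variable the article's discussion treats as significant.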

I had expected Temkin's team to explain terms and presentations of data. They went even further. They pointed out that the results as presented could have a great deal of cross-variable contamination. Also, what Parchment et al. present as separate variables may, in fact, be causally linked. Temkin's team pointed out that health insurance may be closely related to spousal living situation, though the data presented doesn't make that clear. They also noticed that the use of drugs (alcohol, tobacco, illicit, and misused prescription drugs) cannot be as neatly separated as Parchment et al. have placed them in their table: one variable for alcohol use and another for drug use.

I caught the drift of their reasoning and contributed that many of these variables (hassles of being pregnant, drug use, lack of social support, fear of medical institutions, etc.) actually may be experienced as a package by a woman, often summed up in a phrase like, "My life is a mess."

They showed me how chi-square and Fisher exact tests worked and when each would be appropriate. I learned that Fisher's tests were created by Ronald Fisher, the father of modern statistics. I had come across this test and name before in my readings in statistical inference. Temkin's team, being statisticians, filled me in on other activities of Fisher. He was one of the main critics of the early studies which showed the relationship between smoking and lung disease. I learned that he smoked a pipe, and also that he died two years before Surgeon General Luther Terry released the famous report on smoking. One wonders how he would have responded to the report, and whether he himself would have stopped smoking then.

Our discussion also raised an issue I had noticed in the article: the researchers had not revealed their ethnic, class, or racial backgrounds. I know firsthand from experience teaching parenting classes that poor families must be guarded about how much they reveal and to whom. They may rather downplay a genuine concern than risk losing a child into the, for them, all-too-present clutches of Child Protective Services. I would often advise them to be careful how they shared a story so that I would not hear anything I was legally bound to report.

Similarly, I discussed with Temkin's team that such factors as fear of hospitals and care providers may have a higher impact than found in the Parchment report because the research team may have lacked the rapport and sympathy to get respondents to honestly talk about their thoughts about the medical establishment.

I found my first experience working with professional statisticians really helpful and enjoyable. I was surprised at the richness they could bring to data presented in a single table. In my PDE I will discuss my second visit with a team from the statistics department to discuss my dissertation project.

As I continue my work, I will seek out the help of statisticians because of the added insight and skill they bring to working with quantitative research data. I have learned that with help I can understand statistical inference and grasp the particular tools I may need for a project. My continued education in statistics will come one project at a time, one unfamiliar new term at a time, like the growth of my verbal vocabulary.


Excerpted Portions of Hawkins/Benard Discussion (w/ bibliography)