I’m writing this post with a little more haste than is my wont. I’ve received dozens of e-mails asking me to comment on the recent news, à la the New York Times, that meat-eating apparently causes premature death and disease. So this post is likely to contain more than my usual number of typos, egregious spelling mistakes, grammatical errors, etc. Bear with me. Rather than spend a week rewriting and editing, as I usually do, I’m going to do my best to get this up and out in a few hours.
Back in 2007 when I first published Good Calories, Bad Calories I also wrote a cover story in the New York Times Magazine on the problems with observational epidemiology. The article was called “Do We Really Know What Makes Us Healthy?” and I made the argument that even the better epidemiologists in the world consider this stuff closer to a pseudoscience than a real science. I used as a case study the researchers from the Harvard School of Public Health, led by Walter Willett, who runs the Nurses’ Health Study. In doing so, I wanted to point out one of the main reasons why nutritionists and public health authorities have gone off the rails in their advice about what constitutes a healthy diet. The article itself pointed out that every time in the past that these researchers had claimed that an association observed in their observational trials was a causal relationship, and that causal relationship had then been tested in experiment, the experiment had failed to confirm the causal interpretation — i.e., the folks from Harvard got it wrong. Not most times, but every time. No exception. Their batting average circa 2007, at least, was .000.
Now it’s these very same Harvard researchers — Walter Willett and his colleagues — who have authored this new article claiming that red meat and processed meat consumption is deadly; that eating it regularly raises our risk of dying prematurely and contracting a host of chronic diseases. Zoe Harcombe has done a wonderful job dissecting the paper at her site. I want to talk about the bigger picture (in a less concise way).
This is an issue about science itself and the quality of research done in nutrition. Those of you who have read Good Calories, Bad Calories (The Diet Delusion in the UK) know that in the epilogue I make a point to say that I never used the word scientist to describe the people doing nutrition and obesity research, except in very rare and specific cases. Simply put, I don’t believe these people do science as it needs to be done; it would not be recognized as science by scientists in any functioning discipline.
Science is ultimately about establishing cause and effect. It’s not about guessing. You come up with a hypothesis (force x causes observation y) and then you do your best to prove that it’s wrong. If you can’t, you tentatively accept the possibility that your hypothesis was right. Peter Medawar, the Nobel Laureate immunologist, described this proving-it’s-wrong step as “the critical or rectifying episode in scientific reasoning.” Here’s Karl Popper saying the same thing: “The method of science is the method of bold conjectures and ingenious and severe attempts to refute them.” The bold conjectures, the hypotheses, making the observations that lead to your conjectures… that’s the easy part. The critical or rectifying episode, which is to say, the ingenious and severe attempts to refute your conjectures, is the hard part. Anyone can make a bold conjecture. (Here’s one: space aliens cause heart disease.) Making the observations and crafting them into a hypothesis is easy. Testing them ingeniously and severely to see if they’re right is the rest of the job, say 99 percent of the job of doing science, of being a scientist.
The problem with observational studies like those run by Willett and his colleagues is that they do none of this. That’s why it’s so frustrating. The hard part of science is left out and they skip straight to the endpoint, insisting that their interpretation of the association is the correct one and we should all change our diets accordingly.
In these observational studies, the epidemiologists establish a cohort of subjects to follow (tens of thousands of nurses and physicians, in this case) and then ask them about what they eat. The fact that they use questionnaires that are notoriously fallible is almost irrelevant here because the rest of the science is so flawed. Then they follow the subjects for decades — 28 years in this case. Now they have a database of diseases, deaths and foods consumed, and they can draw associations between what these people were eating and the diseases and deaths.
The end result is an association. In the latest report, eating a lot of red meat and processed meat is associated with premature death and increased risk of chronic disease. That’s what they observed in the cohorts — the observation. The subjects who ate the most meat (the top quintile) had a 20 percent greater risk of dying over the course of the study than the subjects who ate the least meat (the bottom quintile). This association then generates a hypothesis, which is why these associations used to be known as “hypothesis-generating data” (before Willett and his colleagues and others like them decided they got tired of their hypotheses being shot down by experiments and they’d skip this step). Because of the association that we’ve observed, so this thinking goes, we now hypothesize that eating red meat and particularly processed meat is bad for our health and we will live longer and prosper more if we don’t do it. We hypothesize that the cause of the association we’ve observed is that red and processed meat is unhealthy stuff.
Terrific. We have our bold conjecture. What should we do next?
Well, because this is supposed to be a science, we ask the question whether we can imagine other less newsworthy explanations for the association we’ve observed. What else might cause it? An association by itself contains no causal information. There are an infinite number of associations that are not causally related for every association that is, so the fact of the association itself doesn’t tell us much.
Moreover, this meat-eating association with disease is a tiny association. Tiny. It’s not the 20-fold increased risk of lung cancer that pack-a-day smokers have compared to non-smokers. It’s a 0.2-fold increased risk, 1/100th the size. So with lung cancer we could buy as a society the observation that cigarettes cause lung cancer because it was and remains virtually impossible to imagine what other factor could explain an association so huge and dramatic. Experiments didn’t need to be done to test the hypothesis because, well, the signal was just so big that the epidemiologists of the time could safely believe it was real. And then experiments were, in effect, done anyway. People quit smoking and lung cancer rates came down, or at least I assume they did. (If not, we’re in trouble here.) When I first wrote about the pseudoscience of epidemiology in Science back in 1995, “Epidemiology Faces Its Limits”, I noted that very few epidemiologists would ever take seriously an association smaller than a 3- or 4-fold increase in risk. These Harvard people are discussing a 0.2-fold increased risk, and getting an extraordinary amount of media attention for it. (Horn-blowing alert: my Science article has since been cited by over 400 articles in the peer-reviewed medical literature, according to Thomson Reuters’ Web of Knowledge.)
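To make the size comparison concrete, here is a minimal Python sketch of the arithmetic. It uses only the figures quoted above (a 20-fold increase for pack-a-day smokers, a 20 percent greater risk for the top meat quintile, and the lower end of the 3- or 4-fold rule of thumb). The function and variable names are mine, for illustration only.

```python
def fold_increase(percent_greater_risk: float) -> float:
    """Convert 'X percent greater risk' into a fold increase over baseline."""
    return percent_greater_risk / 100.0


# Figures taken from the text; the names are illustrative.
smoking_fold_increase = 20.0              # pack-a-day smokers vs. non-smokers
meat_fold_increase = fold_increase(20.0)  # top vs. bottom meat quintile
rule_of_thumb_threshold = 3.0             # lower end of the "3- or 4-fold" bar

# How big is the meat association relative to the smoking one?
size_ratio = meat_fold_increase / smoking_fold_increase

# Would it clear the bar most epidemiologists said they would take seriously?
clears_threshold = meat_fold_increase >= rule_of_thumb_threshold

print(f"meat association: {meat_fold_increase:.1f}-fold increase")
print(f"size relative to the smoking association: {size_ratio:.2f}")
print(f"clears the 3-fold bar: {clears_threshold}")
```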
So how can we explain this tiny association, the 1/100th-the-size-of-the-lung-cancer-cigarette effect, between disease and eating a lot of red and processed meat compared to eating virtually none? Again, we have an observation, or an association: two or more things happening in concert. Let’s think of all the possible reasons that might explain why these two variables, meat-eating and disease, associate together in our cohorts of nurses and physicians. Here’s how the great German pathologist Rudolf Virchow phrased this in 1849: How, he said, can we “with certainty decide which of two coexistent phenomena is the cause and which the effect, whether one of them is the cause at all instead of both being effects of a third cause, or even whether both are effects of two entirely unrelated causes?” This is the hard part.
The answer ultimately is that we do experiments, which is what Virchow went on to discuss. But we’ll get back to this in a minute. Before we get around to doing the experiments, we must rack our brains to figure out if there are other causal explanations for this association besides the meat-eating one. Another way to think of this is that we’re looking for all the myriad possible ways our methodology and equipment might have fooled us. The first principle of good science, as Richard Feynman liked to say, is that you must not fool yourself and you’re the easiest person to fool. And so before we go public and commit ourselves to believing this association is meaningful and causal, let’s think of all the ways we might be fooled. Once we’ve thought up every possible, reasonable alternative hypothesis (space aliens are out on this account), we can then go about testing them to see which ones survive the tests: our preferred hypothesis (meat-eating causes disease, in this case) or one of the many others we’ve considered.
So let’s think of reasonable ways in which people who eat a lot of meat might be different from people who don’t, looking specifically for differences that might also explain some or all of the association we observed between meat-eating, disease and premature death. What else can explain this association, which might have nothing to do with whatever happens when we consume meat or processed meat?
Zoe Harcombe made this point beautifully using the Harvard data. The obvious clue is that as we move from the bottom quintile of meat-eaters (those who are effectively vegetarians) to the top quintile of meat-eaters we see an increase in virtually every accepted unhealthy behavior — smoking goes up, drinking goes up, sedentary behavior (or lack of physical activity) goes up — and we also see an increase in markers for unhealthy behaviors — BMI goes up, blood pressure, etc. So what could be happening here?
If you go back and read my New York Times Magazine article on this research, you’ll see that I discussed a whole host of effects, known technically as confounders (they confound the interpretation of the association), that could explain associations between two variables but have nothing to do biologically with the variables themselves. One of these confounders is called the compliance or adherer effect. Here’s what I said about it in the article:
The Bias of Compliance
A still more subtle component of healthy-user bias has to be confronted. This is the compliance or adherer effect. Quite simply, people who comply with their doctors’ orders when given a prescription are different and healthier than people who don’t. This difference may be ultimately unquantifiable. The compliance effect is another plausible explanation for many of the beneficial associations that epidemiologists commonly report, which means this alone is a reason to wonder if much of what we hear about what constitutes a healthful diet and lifestyle is misconceived.
The lesson comes from an ambitious clinical trial called the Coronary Drug Project that set out in the 1970s to test whether any of five different drugs might prevent heart attacks. The subjects were some 8,500 middle-aged men with established heart problems. Two-thirds of them were randomly assigned to take one of the five drugs and the other third a placebo. Because one of the drugs, clofibrate, lowered cholesterol levels, the researchers had high hopes that it would ward off heart disease. But when the results were tabulated after five years, clofibrate showed no beneficial effect. The researchers then considered the possibility that clofibrate appeared to fail only because the subjects failed to faithfully take their prescriptions.
As it turned out, those men who said they took more than 80 percent of the pills prescribed fared substantially better than those who didn’t. Only 15 percent of these faithful “adherers” died, compared with almost 25 percent of what the project researchers called “poor adherers.” This might have been taken as reason to believe that clofibrate actually did cut heart-disease deaths almost by half, but then the researchers looked at those men who faithfully took their placebos. And those men, too, seemed to benefit from adhering closely to their prescription: only 15 percent of them died compared with 28 percent who were less conscientious. “So faithfully taking the placebo cuts the death rate by a factor of two,” says David Freedman, a professor of statistics at the University of California, Berkeley [who passed away, regrettably, in 2008]. “How can this be? Well, people who take their placebo regularly are just different than the others. The rest is a little speculative. Maybe they take better care of themselves in general. But this compliance effect is quite a big effect.”
The moral of the story, says Freedman, is that whenever epidemiologists compare people who faithfully engage in some activity with those who don’t — whether taking prescription pills or vitamins or exercising regularly or eating what they consider a healthful diet — the researchers need to account for this compliance effect or they will most likely infer the wrong answer. They’ll conclude that this behavior, whatever it is, prevents disease and saves lives, when all they’re really doing is comparing two different types of people who are, in effect, incomparable.
This phenomenon is a particularly compelling explanation for why the Nurses’ Health Study and other cohort studies saw a benefit of H.R.T. [hormone replacement therapy, one subject of the article] in current users of the drugs, but not necessarily in past users. By distinguishing among women who never used H.R.T., those who used it but then stopped and current users (who were the only ones for which a consistent benefit appeared), these observational studies may have inadvertently focused their attention specifically on, as Jerry Avorn says, the “Girl Scouts in the group, the compliant ongoing users, who are probably doing a lot of other preventive things as well.”
It’s this compliance effect that makes these observational studies the equivalent of conventional wisdom-confirmation machines. Our public health authorities were doling out pretty much the same dietary advice in the 1970s and 1980s, when these observational studies were starting up, as they are now. The conventional health-conscious wisdom of the era had it that we should eat less fat and saturated fat, and so less red meat, which also was supposed to cause colon cancer, less processed meat (those damn nitrates) and more fruits and vegetables and whole grains, etc. And so the people who are studied in the cohorts could be divided into two groups: those who complied with this advice — the Girl Scouts, as Avorn put it — and those who didn’t.
Now when we’re looking at the subjects who avoided red meat and processed meat and comparing them to the subjects who ate them in quantity, we can think of it as effectively comparing the Girl Scouts to the non-Girl Scouts, those who complied with the conventional wisdom to those who didn’t. And the compliance effect tells us right there that we should see an association, namely that the Girl Scouts should appear to be healthier. Significantly healthier. Actually they should be even healthier than Willett et al. are now reporting, which suggests that there’s something else working against them (not eating enough red meat?). In other words, the people who avoided red meat and processed meats were the ones who fundamentally cared about their health and had the energy (and maybe the health) to act on it. And the people who ate a lot of red meat and processed meat in the 1980s and 1990s were the ones who didn’t.
Here’s another way to look at it: let’s say we wanted to identify markers of people who were too poor or too ignorant to behave in a health conscious manner in the 1980s and 1990s or just didn’t, if you’ll pardon the scatological terminology, give a sh*t. Well, we might look at people who continued to eat a lot of bacon and red meat after Time magazine ran this cover image in 1984 — “Cholesterol, and now the bad news”. I’m going to use myself as an example here, realizing it’s always dangerous and I’m probably an extreme case. But I lived in LA in the 1990s where health conscious behavior was and is the norm, and I’d bet that I didn’t have more than half a dozen servings of bacon or more than two steaks a year through the 1990s. It was all skinless chicken breasts and fish and way too much pasta and cereal (oatmeal or some other non-fat grain) and thousands upon thousands of egg whites without the yolks. Because that’s what I thought was healthy.
So when we compare people who ate a lot of meat and processed meat in this period to those who were effectively vegetarians, we’re comparing people who are inherently incomparable. We’re comparing health conscious compliers to non-compliers; people who cared about their health and had the income and energy to do something about it and people who didn’t. And the compliers will almost always appear to be healthier in these cohorts because of the compliance effect if nothing else. No amount of “correcting” for BMI and blood pressure, smoking status, etc. can correct for this compliance effect, which is the product of all these health conscious behaviors that can’t be measured, or just haven’t been measured. And we know this because the effect is present even in randomized controlled trials. When the Harvard people insist they can “correct” for this, or that it’s not a factor, they’re fooling themselves. And we know they’re fooling themselves because the experimental trials keep confirming that.
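The argument in this paragraph can be sketched as a toy simulation. This is a minimal illustration under assumptions of my own, not a model of the Harvard cohorts: every probability below is hypothetical, and meat is given no causal effect at all. A hidden "health conscious" trait drives both diet and mortality, and that alone produces an association of roughly the reported size.

```python
import random


def simulate_cohort(n, seed=0):
    """Toy observational cohort. All probabilities are hypothetical.

    Meat has NO causal effect on death here. A hidden trait
    (health consciousness) influences both diet and mortality.
    """
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        health_conscious = rng.random() < 0.5
        # Health-conscious people mostly avoid meat; the others mostly eat it.
        high_meat = rng.random() < (0.2 if health_conscious else 0.8)
        # Mortality depends only on the hidden trait, never on meat.
        died = rng.random() < (0.10 if health_conscious else 0.14)
        cohort.append((health_conscious, high_meat, died))
    return cohort


def death_rate(rows):
    return sum(died for _, _, died in rows) / len(rows)


def relative_risk(rows):
    """Death rate of high-meat eaters divided by that of low-meat eaters."""
    high = [r for r in rows if r[1]]
    low = [r for r in rows if not r[1]]
    return death_rate(high) / death_rate(low)


cohort = simulate_cohort(200_000)

# What the epidemiologist sees without measuring the hidden trait.
crude_rr = relative_risk(cohort)

# What would be seen only if the hidden trait could be measured exactly.
rr_among_conscious = relative_risk([r for r in cohort if r[0]])
rr_among_others = relative_risk([r for r in cohort if not r[0]])

print(f"crude relative risk: {crude_rr:.2f}")
print(f"within health-conscious group: {rr_among_conscious:.2f}")
print(f"within everyone else: {rr_among_others:.2f}")
```

In this sketch the crude comparison shows meat-eaters dying more often, while within each group the association disappears. The catch is that the within-group comparison requires measuring the hidden trait, which is exactly what the cohort studies cannot do.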
That was the message of my 2007 article. As one friend put it years ago to me (and I wish I could remember who so I could credit him or her properly), when these cohort studies compare mostly health advice compliers to non-compliers, they might as well be comparing Berkeley vegetarians who eat at Alice Waters’s famous Chez Panisse restaurant once a week after their yoga practice to redneck truck drivers from West Virginia whose idea of a night on the town is chicken-fried steak (and potatoes and beer and who knows what else) at the local truck stop. The researchers can imply, as Willett and his colleagues do, that the most likely reason these people have different levels of morbidity and mortality is the amount of meat they eat; but that’s only because that’s what Willett and his colleagues have to believe to justify the decades of work and tens, if not hundreds, of millions of dollars that have been spent on these studies. Not because it’s the most likely explanation. It’s far more likely that the difference is caused by all the behaviors that associate with meat-eating or effective vegetarianism, that is, whether you are, in effect, a Girl Scout or not.
This is why the best epidemiologists (the ones I quote in the NYT Magazine article) think this nutritional epidemiology business is a pseudoscience at best. Observational studies like the Nurses’ Health Study can come up with the right hypothesis of causality about as often as a stopped clock gives you the right time. It’s bound to happen on occasion, but there’s no way to tell when that is without doing experiments to test all your competing hypotheses. And what makes this all so frustrating is that the Harvard people don’t see the need to look for alternative explanations of the data (for all the possible confounders) and to test them rigorously, which means they don’t actually see the need to do real science.
As I said, it’s a sad state of affairs.
Now we’re back to doing experiments, i.e., how we ultimately settle this difference of opinion. This is science. Do the experiments. We have alternative causal explanations for the tiny association between meat-eating and morbidity and mortality. One is that it’s the meat itself. The other is that it’s the behaviors that associate with meat-eating. So do an experiment to see which is right. How do we do it? Well, you can do it with an N of 1. Switch your diet, see what happens. Or we can get more meaningful information by starting with your cohort of subjects and assigning them at random either to a diet rich in red meat and processed meat, or to a diet that’s not, i.e., a mostly vegetarian diet. By assigning subjects at random to one of these two interventions, we mostly get rid of the behavioral (and socio-economic and educational…) factors that might associate with choosing of your own free will whether to be a vegetarian (or a mostly-vegetarian) or a meat-eater.
So we do a randomized controlled trial. Take as many people as we can afford, randomize them into two groups (one that eats a lot of red meat and bacon, one that eats a lot of vegetables and whole grains and pulses, and very little red meat and bacon) and see what happens. These experiments have effectively been done. They’re the trials that compare Atkins-like diets to other more conventional weight loss diets: AHA Step 1 diets, Mediterranean diets, Zone diets, Ornish diets, etc. These conventional weight loss diets tend to restrict meat consumption to different extents because they restrict fat and/or saturated fat consumption and meat has a lot of fat and saturated fat in it. Ornish’s diet is the extreme example. And when these experiments have been done, the meat-rich, bacon-rich Atkins diet almost invariably comes out ahead, not just in weight loss but also in heart disease and diabetes risk factors. I discuss this in detail in chapter 18 of Why We Get Fat, “The Nature of a Healthy Diet.” The Stanford A TO Z Study is a good example of these experiments. Over the course of the experiment (two years in this case) the subjects randomized to the Atkins-like meat- and bacon-heavy diet were healthier. That’s what we want to know.
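Why random assignment gets rid of those behavioral factors can be shown with a small, self-contained Python sketch. The numbers are hypothetical and chosen only for illustration: a hidden "health conscious" trait affects mortality, diet is assigned by coin flip, and the parameter `true_meat_rr` (my name, not anything from the studies) sets whatever causal effect we choose to give meat.

```python
import random


def run_randomized_trial(n, true_meat_rr=1.0, seed=1):
    """Toy randomized trial. All probabilities are hypothetical.

    Diet is assigned by coin flip, so the hidden trait cannot
    influence who ends up in which arm.
    """
    rng = random.Random(seed)
    arms = {True: [], False: []}  # True = meat-rich arm
    for _ in range(n):
        health_conscious = rng.random() < 0.5
        meat_arm = rng.random() < 0.5  # random assignment
        base_risk = 0.10 if health_conscious else 0.14
        risk = base_risk * (true_meat_rr if meat_arm else 1.0)
        died = rng.random() < risk
        arms[meat_arm].append((health_conscious, died))
    return arms


def death_rate(rows):
    return sum(died for _, died in rows) / len(rows)


def share_health_conscious(rows):
    return sum(hc for hc, _ in rows) / len(rows)


# Case 1: meat has no effect. The trial should report a ratio near 1.
null_trial = run_randomized_trial(200_000, true_meat_rr=1.0)
null_rr = death_rate(null_trial[True]) / death_rate(null_trial[False])

# Randomization balances the hidden trait across the two arms.
balance_gap = abs(
    share_health_conscious(null_trial[True])
    - share_health_conscious(null_trial[False])
)

# Case 2: meat really raises risk by 20 percent. The trial should find it.
effect_trial = run_randomized_trial(200_000, true_meat_rr=1.2)
effect_rr = death_rate(effect_trial[True]) / death_rate(effect_trial[False])

print(f"no true effect: estimated relative risk {null_rr:.2f}")
print(f"hidden-trait imbalance between arms: {balance_gap:.3f}")
print(f"true 20% effect: estimated relative risk {effect_rr:.2f}")
```

The design choice worth noticing is that the experimenter never has to measure the hidden trait. The coin flip spreads it evenly across both arms, so whatever difference remains can be attributed to the diet.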
Now Willett and his colleagues at Harvard would challenge this by saying somewhere along the line, as we go from two years out to decades, this health benefit must turn into a health detriment. How else can they explain why their associations are the opposite of what the experimental trials conclude? And if they don’t explain this away somehow, they might have to acknowledge that they’ve been doing pseudoscience for their entire careers. And maybe they’re right, but I certainly wouldn’t bet my life on it.
Ultimately we’re left with a decision about what we’re going to believe: the observations, or the experiments designed to test those observations. Good scientists will always tell you to believe the experiments. That’s why they do them.
Egregious (and embarrassing) error correction: In an early version of the post, I suggested that if you read the chapter on nutritional epidemiology in the textbook Modern Epidemiology, you’d see that the best epidemiologists agree that this pursuit is pathological. A reader from my institution — a UC Berkeley grad student — pointed out that the chapter on nutritional epi in the textbook was actually written by Walter Willett and that, not surprisingly, it does not agree with this position. Here’s how Willett ends that chapter:
The last two decades have seen enormous progress in the development of nutritional epidemiology methods. Work by many investigators has provided clear support for the essential underpinnings of this field. Substantial between-person variation in consumption of most dietary factors in populations has been demonstrated, methods to measure diet applicable to epidemiologic studies have been developed, and their validity has been documented. Based on this evidence, many large prospective cohort studies have been established that are providing a wealth of data on many outcomes that will be reported during the next decade. In addition, methods to account for errors in measurement of dietary intake have been developed and are beginning to be applied in reporting findings from studies of diet and disease.
Nutritional epidemiology has contributed importantly to understanding the etiology of many diseases. Low intake of fruits and vegetables has been shown to be related to increased risk of cardiovascular disease. Also, a substantial amount of epidemiologic evidence has accumulated indicating that replacing saturated and trans fats with unsaturated fats can play an important role in the prevention of coronary heart disease and type 2 diabetes. Many diseases—as diverse as cataracts, neural-tube defects, and macular degeneration—that were not thought to be nutritionally related have been found to have important dietary determinants. Nonetheless, much more needs to be learned regarding other diet and disease relations, and the dimensions of time and ranges of dietary intakes need to be expanded further. Furthermore, new products are constantly being introduced into the food supply, which will require continued epidemiologic vigilance.
The development and evaluation of additional methods to measure dietary factors, particularly those using biochemical methods to assess long-term intake, can contribute substantially to improvements in the capacity to assess diet and disease relations. Also, the capacity to identify those persons at genetically increased risk of disease will allow the study of gene–nutrient interactions that are almost sure to exist. The challenges posed by the complexities of nutritional exposures are likely to spur methodologic developments. Such developments have already occurred with respect to measurement error. The insights gained will have benefits throughout the field of epidemiology.
Now the reason I made this mistake is because I was rushing (no excuse, despite the warning up front) and so working from memory about a chapter that the UCLA epidemiologist Sander Greenland, one of the editor/authors of Modern Epidemiology, sent me when I was writing the New York Times Magazine article in 2007. The chapter Greenland was discussing and that he had sent me at the time was one he had authored, chapter 19 — “Bias Analysis” — and it was discussing observational epidemiology in general.
Here’s Greenland on the problem with all these studies — nutritional epi included — and how they’re interpreted:
Conventional methods assume all errors are random and that any modeling assumptions (such as homogeneity) are correct. With these assumptions, all uncertainty about the impact of errors on estimates is subsumed within conventional standard deviations for the estimates (standard errors), such as those given in earlier chapters (which assume no measurement error), and any discrepancy between an observed association and the target effect may be attributed to chance alone. When the assumptions are incorrect, however, the logical foundation for conventional statistical methods is absent, and those methods may yield highly misleading inferences. Epidemiologists recognize the possibility of incorrect assumptions in conventional analyses when they talk of residual confounding (from nonrandom exposure assignment), selection bias (from nonrandom subject selection), and information bias (from imperfect measurement). These biases rarely receive quantitative analysis, a situation that is understandable given that the analysis requires specifying values (such as amount of selection bias) for which little or no data may be available. An unfortunate consequence of this lack of quantification is the switch in focus to those aspects of error that are more readily quantified, namely the random components.
Systematic errors can be and often are larger than random errors, and failure to appreciate their impact is potentially disastrous. The problem is magnified in large studies and pooling projects, for in those studies the large size reduces the amount of random error, and as a result the random error may be but a small component of total error. In such studies, a focus on “statistical significance” or even on confidence limits may amount to nothing more than a decision to focus on artifacts of systematic error as if they reflected a real causal effect.
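Greenland’s point about large studies can be illustrated with a short Python sketch. It is a simplified illustration under assumptions of my own: the true relative risk is 1.0 (no effect), a hypothetical systematic bias inflates the observed risk in the exposed group by 20 percent, and the interval is the standard large-sample confidence interval for a risk ratio computed on the log scale from expected counts. None of these numbers comes from the studies discussed here.

```python
import math


def rr_confidence_interval(risk_exposed, risk_unexposed, n_per_group, z=1.96):
    """Approximate 95% CI for a risk ratio, computed on the log scale."""
    log_rr = math.log(risk_exposed / risk_unexposed)
    standard_error = math.sqrt(
        (1 - risk_exposed) / (n_per_group * risk_exposed)
        + (1 - risk_unexposed) / (n_per_group * risk_unexposed)
    )
    return (
        math.exp(log_rr - z * standard_error),
        math.exp(log_rr + z * standard_error),
    )


# Hypothetical setup: no true effect, but a systematic bias of 1.2.
TRUE_RR = 1.0
BIAS_FACTOR = 1.2
baseline_risk = 0.10
observed_exposed_risk = baseline_risk * TRUE_RR * BIAS_FACTOR

intervals = {}
for n in (1_000, 10_000, 100_000):
    intervals[n] = rr_confidence_interval(observed_exposed_risk, baseline_risk, n)
    low, high = intervals[n]
    covers_truth = low <= TRUE_RR <= high
    print(f"n per group = {n:>7}: CI ({low:.2f}, {high:.2f}), "
          f"contains true RR: {covers_truth}")
```

As the study grows, the interval narrows around the biased value rather than the true one. In the smallest study the interval still contains the truth; in the largest it excludes it, with every appearance of statistical precision.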