Demystifying Research Papers (Part II)

In the first part of Demystifying Research Papers, I went over how research articles are usually constructed. I purposely didn’t talk much about the Results section because I wanted to do that in a separate essay. A well-written Results section not only shows the outcome of all the relevant experiments but also helps the readers to follow the rationale behind those experiments. Here, instead of directly elaborating on the Results section, I want to go over a hypothetical question to explain the rationale and structure of this section.
Confusion: Gerard Mignot
Suppose we want to investigate whether green tea has a beneficial effect on cancer treatment, and our hypothesis is simply this: green tea is good for cancer. Before we do anything, we need to look carefully into older publications (by searching PubMed, for example)[1] because we don’t want to spend time or money if that study has already been done. We also need to read up on publications related to green tea and cancer treatment because we need to discuss them in our introduction. For example, if no one has looked into the question, we could use our introduction to explain why we are interested in green tea (i.e. other documented benefits of drinking green tea) and state that we want to study the effect of green tea on cancer because no one has done so. Without that, the reader might wonder why we thought of green tea, and not turnip.

Let’s suppose we don’t find any publication in our search, so we are good to begin our experiments. Let’s start with a thought experiment. Suppose we ask 500 people who regularly drink green tea and another 500 who rarely drink green tea whether or not they have cancer. After tabulation, we find that only 100 tea-drinkers have cancer whereas 400 people from the other group have cancer. While it may seem that our study clearly proves the beneficial effect of green tea in cancer because of this strong correlation, in science, correlation is not good enough.
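
To make the size of that correlation concrete, here is a minimal sketch that tabulates the numbers from the thought experiment. The chi-square statistic is my own choice of illustration, not something the thought experiment prescribes, and the variable and function names are hypothetical.

```python
# Counts taken from the thought experiment above.
tea_cancer, tea_total = 100, 500
no_tea_cancer, no_tea_total = 400, 500

tea_rate = tea_cancer / tea_total            # 0.2
no_tea_rate = no_tea_cancer / no_tea_total   # 0.8
risk_ratio = tea_rate / no_tea_rate          # 0.25


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: tea drinkers, non-drinkers. Columns: cancer, no cancer.
chi2 = chi_square_2x2(
    tea_cancer, tea_total - tea_cancer,
    no_tea_cancer, no_tea_total - no_tea_cancer,
)

print(f"cancer rate, tea drinkers: {tea_rate:.0%}")
print(f"cancer rate, non-drinkers: {no_tea_rate:.0%}")
print(f"risk ratio: {risk_ratio:.2f}")
print(f"chi-square: {chi2:.1f}")  # 360.0
```

A statistic this large only tells us that the association is very unlikely to be a fluke of sampling. It says nothing about whether green tea is the cause.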

How do we know, for example, that the people who have cancer did not have genetic predispositions to cancer or unhealthy lifestyles? Perhaps the people who drink tea have healthier lifestyles in general, and a healthier lifestyle, not green tea, is the reason why cancer is less prevalent among them. In other words, our experiment was flawed because we failed to rule out other factors (often called confounding factors) that could explain our data. So let’s now think of some better experiments, ones that would take care of those confounding factors.

Perhaps one of the simplest experiments would be to look at the effect of green tea in a dish: we can grow cancer cells in two dishes and treat one of them with green tea extract to see what happens to the cancer cells. Here, the dish that didn’t get any extract would be the control dish and the other, the experimental dish. It’s important to have a control set because we need that perspective to reach a conclusion. If we just treated one dish with green tea extract and 70% of the cells died, what conclusion could we draw without a control? It’s possible that when we grow those cells in a dish, the death rate is similar even without green tea extract (i.e. green tea didn’t do anything). A control, hence, gives us a baseline from which to draw conclusions from our experiments.

Let’s suppose green tea extract killed 90% of the cancer cells compared to 10% in the control dish. To draw objective conclusions from an experimental setup, scientists tend to replicate their experiments several times (in other words, we would need more than just two dishes) and use statistics to analyze the data. Usually, a P value below .05 is considered statistically significant. What that means is that, if green tea extract actually had no effect, the probability of getting a result at least this extreme purely by chance would be less than 5%, or 5 in 100. If we are presenting our data from these dishes in bar graphs, we could use asterisks to denote the statistical significance in our graph.
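
One way to see where such a P value could come from is a permutation test, sketched below. The replicate numbers are invented for illustration (five dishes per group, percent of cells that died), and the permutation test is just one of several methods a real study might use. The function name is hypothetical.

```python
from itertools import combinations
from statistics import mean

# Hypothetical replicate data: percent of cells that died in each dish.
control = [8, 12, 10, 9, 11]
treated = [88, 92, 91, 89, 90]


def permutation_p_value(group_a, group_b):
    """One-sided exact permutation test on the difference in means.

    Returns the fraction of all possible relabelings of the pooled
    dishes whose mean difference (b minus a) is at least as large as
    the difference actually observed.
    """
    pooled = group_a + group_b
    observed = mean(group_b) - mean(group_a)
    indices = range(len(pooled))
    extreme = 0
    total = 0
    for chosen in combinations(indices, len(group_b)):
        chosen_set = set(chosen)
        b = [pooled[i] for i in chosen_set]
        a = [pooled[i] for i in indices if i not in chosen_set]
        if mean(b) - mean(a) >= observed:
            extreme += 1
        total += 1
    return extreme / total


p_value = permutation_p_value(control, treated)
significant = p_value < 0.05
print(f"P = {p_value:.4f}, significant: {significant}")
```

With these made-up numbers, only 1 of the 252 possible ways to split ten dishes into two groups of five produces a difference as large as the observed one, so P is about .004.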

The next logical step might be to see whether green tea extract reduces cancer in mice. We can determine what happens to mice with cancer when we give them green tea extract (as before, we’d have at least two groups). We need to set up some parameters to detect any potential benefit of green tea because we are dealing with a much more complex system than just cells in a dish. We could show, for example, the autopsy results with or without green tea treatment. We could also show that the experimental group had longer lifespans than the control group to show that green tea could extend cancer patients’ lives.

Speaking of patients, if we are convinced that green tea can reduce cancer in mice, we could study its effect on people next. We could carefully select a group of cancer patients to minimize confounding factors (that is, their genetic predispositions, lifestyles, and so on are very similar) and divide them into two groups: we give one group the green tea extract and the other, a sugar pill (the control). This is necessary to eliminate potential placebo effects, where patients’ belief in the medication can mask the true effectiveness of the drug. Because they believe the medication is good, patients can often show signs of improvement even though the drug they are receiving might not be doing anything. We then wait for a while and reexamine their cancer status to determine the benefits of green tea extract.

These studies are usually done in a double-blind fashion: not only do the patients not know which pills they are receiving, but the person giving these pills and/or “scoring” cancer status doesn’t know who got what either. This veil is only lifted when the data is ready to be analyzed, which guards against bias from both patients and researchers. Whenever possible, scientists try to incorporate this idea in other experiments as well.
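
The bookkeeping behind blinding can be sketched in a few lines. This is only an illustration under my own assumptions: the function name, the kit-code format, and the patient IDs are all hypothetical, and real trials use far more careful procedures.

```python
import random


def blinded_allocation(patient_ids, seed=None):
    """Randomly split patients into two equal groups and hide the result.

    Returns two dicts:
      kit_labels: what patients and staff see (an opaque kit code)
      key:        the real assignment, kept sealed until analysis
    """
    patient_ids = list(patient_ids)
    rng = random.Random(seed)
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2

    # The sealed key records who actually gets what.
    key = {pid: "green tea extract" for pid in shuffled[:half]}
    key.update({pid: "sugar pill" for pid in shuffled[half:]})

    # Unique random codes that carry no hint of the group.
    codes = rng.sample(range(1000, 10000), len(shuffled))
    code_for = dict(zip(shuffled, codes))

    # List the labels in the original patient order so that the ordering
    # does not leak the assignment either.
    kit_labels = {pid: f"KIT-{code_for[pid]}" for pid in patient_ids}
    return kit_labels, key


kit_labels, key = blinded_allocation(range(1, 21), seed=0)
print(kit_labels[1])  # all that staff and patient see for patient 1
```

Only `kit_labels` would be shared with the people handing out pills and scoring outcomes. The `key` stays locked away until the data is ready to be analyzed.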

Now, if I am presented with a paper like this, would I start drinking green tea? I personally would want more data before I do that. For example, if the dose of green tea extract we used in our experiments is very high, drinking a couple of cups of green tea may not be beneficial. A better study would have looked at the effect of green tea extract at different concentrations. If only a high dose of green tea has an anticancer benefit, I would like to know about the side effects of consuming that much tea. Is it going to ruin my liver if I drink too much tea? It would also be great to know which compound in green tea is responsible. Since we only used green tea extract here, I would be curious to know whether other tea extracts could do the same or whether it’s something specific to green tea.

And finally, we have problems with our hypothesis because we were not precise enough. Are we looking at the preventive or the curative effect of green tea? What type or types of cancer are we looking at and why? Surely we can’t look at all types of cancer because it would be very difficult to justify the huge cost of such a study when all we had when we came up with our hypothesis was an educated guess (i.e. green tea is beneficial for a lot of diseases, so it could benefit cancer patients as well). Scientists try to be very specific when they state their hypotheses to avoid these problems.

Now that we have gone over the general gist of research papers, I hope next time you see a headline like “Scientists found a cure for cancer”, you will be able to look at the original data and draw your own conclusions. It’s possible that you may even end up detecting a logical or experimental flaw that had escaped the original authors as well as the reviewers. Now that would be wonderful, wouldn’t it?

[1] PubMed is a free search engine with an extensive database of references and abstracts on life sciences and biomedical topics. I highly recommend this site over Google if you really want to search for a specific topic in biology.
