Authors
Dr. DeWayne Derryberry
Idaho State University
Objective
Use the results of a retrospective study to determine if there is a positive association between smoking and lung cancer, and estimate the risk of lung cancer for smokers relative to non-smokers.
Background
When dealing with categorical data and rare events, sample size is often a problem. For example, if we want to show an association between smoking and lung cancer, there are four groups to consider – smokers with and without cancer, and non-smokers with and without cancer. It can be shown (see the exercises) that the power of any such study is limited by the sample size in the smallest of these groups.
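One way to see why the smallest group matters is through the usual large-sample standard error of the log odds ratio for a 2x2 table, which is the square root of the sum of the reciprocals of the four cell counts. The sketch below is only an illustration under that assumption: the helper name `log_odds_ratio_se` and all counts are hypothetical and are not taken from the study.

```python
import math

def log_odds_ratio_se(a, b, c, d):
    """Large-sample standard error of the log odds ratio for a 2x2 table
    with cell counts a, b, c, d (all assumed to be positive)."""
    return math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

# Hypothetical counts, invented purely for illustration. Cell order:
# smokers with cancer, smokers without, non-smokers with, non-smokers without.
balanced = log_odds_ratio_se(50, 50, 50, 50)            # 200 subjects in total
one_small_cell = log_odds_ratio_se(5, 5000, 5000, 5000)  # 15,005 subjects in total

print(round(balanced, 3))
print(round(one_small_cell, 3))
```

Although the second table has far more subjects, its standard error is larger, because the term 1/5 from the smallest cell dominates the sum. No amount of extra data in the other three cells can push the standard error below the square root of 1/5.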
In any large population, cancer of any particular type is quite rare. In many epidemiological studies of rare events (as with many diseases), the only way to get a large sample size for the groups that have the disease is to wait until the disease has occurred to collect the data – targeting those with the disease. This kind of study, where we wait for the outcome and then collect the data, is called a retrospective study.
In a retrospective study we have those with the disease (the cases) and must use subject-matter expertise to select a comparable group without the disease (the controls). We then examine the differences between these groups with regard to some factor we consider to be potentially causal for the disease. If we believe the cases and controls are similar in other ways, we can make an argument for causality.
For example, let’s say we’re interested in studying lung cancer and smoking. We can find people who have lung cancer and others without lung cancer who are otherwise comparable, and compare their smoking activity. If the lung cancer patients are more often smokers, can we make a plausible argument that smoking causes lung cancer? And how strong is that argument?
From this sort of study we cannot directly estimate the risk of cancer for smokers! We have a group of people with cancer and we have estimated, from this sample, the proportion of people with cancer who smoke. And, hopefully, we have a comparable group of people without cancer. We have estimated, from this second sample, the proportion of people without cancer who smoke. All we can estimate directly is the probability of being a smoker, given that a person does or does not have lung cancer, which is not a very interesting calculation. We are really interested in what proportion of smokers and non-smokers get cancer.
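The point can be made concrete with a small sketch. The counts below are hypothetical, invented only for illustration, and the function names are not from the source. The proportion of smokers within each group is a legitimate estimate, but the share of sampled smokers who have cancer depends on how many controls the researcher chose to recruit, so it says nothing about risk.

```python
def smoker_share(smokers, non_smokers):
    """Proportion of a sampled group (cases or controls) who smoke."""
    return smokers / (smokers + non_smokers)

def naive_cancer_share(case_smokers, control_smokers):
    """Share of sampled smokers who are cases. This is NOT a risk estimate:
    it depends on the case-to-control ratio chosen by the researcher."""
    return case_smokers / (case_smokers + control_smokers)

# Hypothetical counts: 100 cases and 100 controls.
case_smokers, case_non_smokers = 80, 20
control_smokers, control_non_smokers = 50, 50

# These two quantities are what the study can estimate directly.
p_smoke_given_cancer = smoker_share(case_smokers, case_non_smokers)
p_smoke_given_no_cancer = smoker_share(control_smokers, control_non_smokers)

# Naive "risk" with 100 controls, then with 10 times as many controls
# who smoke at the same rate (500 smokers, 500 non-smokers).
naive_1 = naive_cancer_share(case_smokers, control_smokers)
naive_10 = naive_cancer_share(case_smokers, 10 * control_smokers)

print(p_smoke_given_cancer, p_smoke_given_no_cancer)
print(round(naive_1, 3), round(naive_10, 3))
```

Recruiting ten times as many controls leaves the smoking proportions unchanged but sharply lowers the naive figure, which shows that the figure reflects the study design rather than the risk of cancer among smokers.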
As an aside, those who have seen Bayes' theorem may know that these conditional probabilities can be flipped. However, in this case there is not enough information to do so. At the very least, to use Bayes' theorem we would need to know the overall rate of either smoking or lung cancer in the population. Not only do we not know this, but we also cannot be sure what population, if any, is represented by our cases and controls.
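For readers who want to see the missing piece explicitly, Bayes' theorem can be written out as follows. The notation is an assumption introduced here, not the author's: C denotes having lung cancer, \bar{C} not having it, and S being a smoker.

```latex
P(C \mid S)
  = \frac{P(S \mid C)\, P(C)}
         {P(S \mid C)\, P(C) + P(S \mid \bar{C})\, \bigl(1 - P(C)\bigr)}
```

The study supplies estimates of P(S | C) and P(S | \bar{C}), but the formula also requires the overall rate P(C). Knowing the overall smoking rate P(S) would serve instead, since the denominator equals P(S) and that equation can be solved for P(C) whenever the two smoking proportions differ.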
The Task
Our goal in this retrospective study is to determine if there is a positive association between smoking and lung cancer, and to estimate, through mathematical manipulation, the risk of lung cancer for smokers relative to non-smokers.
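The text does not name the manipulation at this point. One standard candidate, offered here as an assumption, is the odds ratio, which takes the same value whether we condition on disease status (as a retrospective study does) or on smoking status (as we would like to). Using C for lung cancer and S for smoker, with bars for their complements (notation introduced here for illustration):

```latex
\begin{aligned}
\mathrm{OR}
  &= \frac{P(S \mid C) \,/\, P(\bar{S} \mid C)}
          {P(S \mid \bar{C}) \,/\, P(\bar{S} \mid \bar{C})}
   = \frac{P(S, C)\, P(\bar{S}, \bar{C})}
          {P(\bar{S}, C)\, P(S, \bar{C})} \\[4pt]
  &= \frac{P(C \mid S) \,/\, P(\bar{C} \mid S)}
          {P(C \mid \bar{S}) \,/\, P(\bar{C} \mid \bar{S})}
   \approx \frac{P(C \mid S)}{P(C \mid \bar{S})}
\end{aligned}
```

The middle expression is symmetric in smoking and cancer, which is why the two conditional forms agree. The final approximation holds when the disease is rare, so that P(\bar{C} | S) and P(\bar{C} | \bar{S}) are both close to 1; the right-hand side is then the risk of lung cancer for smokers relative to non-smokers.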