CASE STUDY: JMP027
Titanic Passengers
by Marlene Smith, University of Colorado Denver Business School
Key Concepts: Logistic regression, log odds and logit, odds, odds ratios, prediction profiler
Authors
Dr. Marlene Smith
University of Colorado Denver
Objective
Use passenger data from the sinking of the RMS Titanic to explore questions of interest about survival. For example, did the survivors share any key characteristics? Were some passenger groups more likely to survive than others? Can we accurately predict survival?
Background
The sinking of the RMS Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1,502 of the 2,224 passengers and crew. This sensational tragedy shocked the international community and motivated the adoption of better maritime safety regulations.
One of the reasons that the shipwreck led to such loss of life was that there were not enough lifeboats for the passengers and crew. Although there was some element of luck involved in surviving the sinking, some groups of people were more likely to survive than others. (“Titanic: Machine Learning from Disaster.” From a Kaggle competition. Available at http://bit.ly/1f2crzi, data accessed 08/2014.)
The Task
We use this rich and storied example to explore some questions of interest about Titanic survival rates. For example, were there any key characteristics shared by survivors? Were some passenger groups more likely to survive than others? Can we accurately predict survival?
We will fit a logistic regression model using the available data to explore these questions.
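Before fitting a model, it may help to see how the key concepts of odds and log odds (the logit) relate to a plain survival rate. The Python sketch below is only an illustration and is not part of the JMP analysis. It uses the overall counts quoted in the Background (1,502 deaths among 2,224 passengers and crew), not values from the case-study data table, and the function and variable names are our own.

```python
import math

# Overall counts quoted in the Background (passengers and crew combined).
# These are NOT taken from the case-study data table.
total_aboard = 2224
deaths = 1502
survivors = total_aboard - deaths


def odds_from_probability(p):
    """Odds = probability of the event divided by probability of no event."""
    return p / (1.0 - p)


def logit(p):
    """Log odds: the natural log of the odds."""
    return math.log(odds_from_probability(p))


def inverse_logit(log_odds):
    """Convert log odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


p_survive = survivors / total_aboard
odds_survive = odds_from_probability(p_survive)
log_odds_survive = logit(p_survive)

print(f"Probability of survival: {p_survive:.3f}")
print(f"Odds of survival:        {odds_survive:.3f}")
print(f"Log odds of survival:    {log_odds_survive:.3f}")
print(f"Back to probability:     {inverse_logit(log_odds_survive):.3f}")
```

Logistic regression models the log odds as a linear function of the predictors, so a fitted model's predictions are converted back to probabilities with the inverse logit shown here.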