Consider This Dataset Which Includes Information About Pass

Question

Consider this dataset which includes information about passengers of the Titanic. Create a Jupyter notebook file that contains the following:

- Python code to clean the data, remove any missing values, then find the mean, median, mode, standard deviation, and variance of each numerical column in the dataset.
- As the dataset doesn’t contain the weight of adult passengers who were on the ship, and given the fact that the average weight of adults between ages 20 and 50 is 90kg (with a 50kg variance), write Python code to generate a number of weights equal to the number of records in the dataset using normal distribution that simulates the actual population.
- Find the probability of having someone of a weight less than 50kg.
- Find the probability of having someone of a weight between 100kg and 120kg.
- Find the probability of having someone of a weight that’s exactly 77.7kg.

Important Notes:

- A description of the data is available at Kaggle.com.
- Submit only one .ipynb file that includes all the code and be sure not to submit any other file format.
- Be sure to include a clear explanation before each step you perform in a markdown cell in the file.
- Be sure to include your name, the date, your class section, and the name of your program at the top of your file in the first cell of the file (markdown cell).
- Be sure to add a table of contents at the second cell in the file.

Dr. Jack HW Helper · Accepted Answer

### Introduction

This Jupyter notebook analyzes the Titanic passenger dataset. It cleans the data, computes descriptive statistics, and uses a normal distribution to simulate the weights of adult passengers on board. Each Python code cell is preceded by a markdown cell that explains the step.

### Step 1: Initial Setup

Before starting the analysis, we import the necessary libraries: pandas for data manipulation, numpy for numerical operations, scipy for statistical functions, and matplotlib for plotting.

    import pandas as pd
    import numpy as np
    from scipy import stats
    import matplotlib.pyplot as plt

### Step 2: Data Loading

Using pandas, we load the Titanic dataset from a CSV file. For illustration purposes, let's assume the file is named "titanic.csv".

    data = pd.read_csv('titanic.csv')
    data.head()  # Display the first five records of the dataset

### Step 3: Data Cleaning

Next, we clean the dataset by removing every row that contains a missing value, so that the statistics below are computed on complete records only.

    data_cleaned = data.dropna()  # Remove rows with any missing value

### Step 4: Statistical Analysis

We compute the mean, median, mode, standard deviation, and variance of each numerical column. `describe()` already reports the mean, the standard deviation, and the median (its `50%` row); the mode and the variance are added as extra rows.

    statistics_summary = data_cleaned.describe()  # count, mean, std, min, quartiles (50% = median), max
    numeric_data = data_cleaned[statistics_summary.columns]  # Numerical columns only
    statistics_summary.loc['mode'] = numeric_data.mode().iloc[0]  # First mode of each column
    statistics_summary.loc['variance'] = numeric_data.var()  # Sample variance (ddof=1)
    statistics_summary

### Step 5: Weight Generation

The question gives an average adult weight of 90 kg with a variance of 50 for ages 20 to 50. We generate synthetic weights from a normal distribution with these parameters. Because `np.random.normal` expects the standard deviation rather than the variance, we pass the square root of the variance.

    average_weight = 90  # kg
    weight_variance = 50  # variance as given in the question (strictly kg^2)
    num_records = len(data_cleaned)
    # Generate weights using a normal distribution
    weights = np.random.normal(average_weight, np.sqrt(weight_variance), num_records)

### Step 6: Probability Calculation
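The three probabilities asked for in the question can be sketched as follows. This is a minimal, self-contained illustration rather than the original answer's code: it assumes the weights follow a normal distribution with mean 90 and variance 50, and it uses a placeholder record count (`NUM_RECORDS`, a hypothetical value) where the notebook would use `len(data_cleaned)`. The function names `theoretical_probabilities` and `empirical_probabilities` are also made up for this sketch. It computes each probability twice: exactly, from the normal CDF via `scipy.stats.norm`, and approximately, as the fraction of simulated weights that fall in each range.

```python
import numpy as np
from scipy import stats

# Parameters taken from the question: mean 90 kg, variance 50.
MEAN_WEIGHT = 90.0
WEIGHT_VARIANCE = 50.0
STD_WEIGHT = np.sqrt(WEIGHT_VARIANCE)  # the normal distribution is parameterized by std

# Hypothetical placeholder; in the notebook this would be len(data_cleaned).
NUM_RECORDS = 200


def theoretical_probabilities(mean, std):
    """Probabilities from the normal CDF for the three cases in the question."""
    dist = stats.norm(loc=mean, scale=std)
    return {
        "below_50": dist.cdf(50),
        "between_100_and_120": dist.cdf(120) - dist.cdf(100),
        # For a continuous distribution, any single exact value has probability 0.
        "exactly_77_7": 0.0,
    }


def empirical_probabilities(weights):
    """Fractions of the simulated sample that fall into each case."""
    weights = np.asarray(weights)
    return {
        "below_50": float(np.mean(weights < 50)),
        "between_100_and_120": float(np.mean((weights >= 100) & (weights <= 120))),
        "exactly_77_7": float(np.mean(weights == 77.7)),
    }


# Seeded generator so the simulated sample is reproducible.
rng = np.random.default_rng(seed=42)
weights = rng.normal(MEAN_WEIGHT, STD_WEIGHT, NUM_RECORDS)

theoretical = theoretical_probabilities(MEAN_WEIGHT, STD_WEIGHT)
empirical = empirical_probabilities(weights)

for case in theoretical:
    print(f"{case}: theoretical={theoretical[case]:.6g}, empirical={empirical[case]:.6g}")
```

With a standard deviation of about 7.07 kg, 50 kg lies more than five standard deviations below the mean, so the first probability is effectively zero, and the second comes out at roughly 0.08. The empirical fractions will only approximate the theoretical values, and with a few hundred records the rare cases will usually show up as exactly 0.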
