Titanic Survival Prediction

Challenge:

We want you to use the Titanic passenger data (name, age, ticket price, etc.) to forecast who will survive and who will perish.

Overview:

In this article, we will examine the training and test data, which include passenger-specific information.

Using this information, we will predict which passengers survived.

Environment and Libraries:

We implemented the project in a Python 3 environment. The libraries used are:

 NumPy: For performing mathematical operations on arrays.

 Pandas: For reading and analyzing the data.

 Matplotlib: For visualizing data in the form of graphs.

 Scikit-learn (sklearn): For implementing machine learning techniques.

 Seaborn: For visualizing data.

 XGBoost: For the XGBClassifier model used to predict survival.

Data:

Three datasets are provided: train.csv, test.csv and gender_submission.csv. On Kaggle, the files are available at the following paths:

 /kaggle/input/titanic/train.csv

 /kaggle/input/titanic/test.csv

 /kaggle/input/titanic/gender_submission.csv

The first step is to load the data from the paths above, using the pandas read_csv function.

Loading the train.csv file

Loading the test.csv file
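The loading code itself is not reproduced in the text, so here is a minimal sketch of that step, assuming the Kaggle paths listed above. The helper name load_titanic is hypothetical; pd.read_csv is the standard pandas CSV reader.

```python
import os

import pandas as pd

# Directory of the competition files on Kaggle (from the paths listed above).
KAGGLE_DIR = "/kaggle/input/titanic"


def load_titanic(data_dir=KAGGLE_DIR):
    """Read train.csv and test.csv from data_dir and return (train_df, test_df)."""
    train_df = pd.read_csv(os.path.join(data_dir, "train.csv"))
    test_df = pd.read_csv(os.path.join(data_dir, "test.csv"))
    return train_df, test_df
```

On Kaggle, train_df, test_df = load_titanic() should give 891 rows for train_df and 418 for test_df, matching the row counts described for each file.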

train.csv – This dataset contains the passenger information, including whether each passenger survived, for a subset of passengers. There are 891 rows in total, one per passenger.

test.csv – This dataset contains the data for the other 418 passengers, whose survival we have to predict.

gender_submission.csv – This is a sample submission file that shows how to structure our predictions. It assumes that all female passengers survived and all male passengers died.
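The rule behind gender_submission.csv is simple enough to reproduce directly. The following is a minimal sketch, assuming test.csv has been loaded into a DataFrame with its PassengerId and Sex columns; the function name gender_baseline is hypothetical.

```python
import pandas as pd


def gender_baseline(test_df):
    """Build predictions with the gender_submission.csv rule."""
    return pd.DataFrame({
        "PassengerId": test_df["PassengerId"],
        # 1 = survived (every female passenger), 0 = died (every male passenger).
        "Survived": (test_df["Sex"] == "female").astype(int),
    })
```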

Exploration of Patterns:

First, we check to what degree the assumption made in gender_submission.csv holds, namely that all female passengers survive and all male passengers die.

In the code above, we calculate the survival rate of female passengers, which comes out to 74.2%.

The code above calculates the survival rate of male passengers in the training data.
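Both rates can be computed in one grouped calculation. The sketch below assumes train.csv is loaded as a DataFrame. Because Survived is 0 or 1, its mean within each Sex group is that group's survival rate. The second helper measures how often the gender_submission.csv rule matches the true labels in train.csv. Both function names are hypothetical.

```python
import pandas as pd


def survival_rate_by_sex(train_df):
    """Survival rate of each Sex group in the training data."""
    # Survived is 0 or 1, so the mean within a group is the fraction who survived.
    return train_df.groupby("Sex")["Survived"].mean()


def gender_rule_accuracy(train_df):
    """Share of training passengers for whom 'female survives, male dies' is correct."""
    predicted = (train_df["Sex"] == "female").astype(int)
    return (predicted == train_df["Survived"]).mean()
```

On the full train.csv, the "female" entry returned by survival_rate_by_sex should agree with the 74.2% reported above.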

Contribution:

First, we import the matplotlib and seaborn libraries, which we will use later. Before predicting survival, we need to preprocess the data. We first remove the unneeded columns, PassengerId, Name, Ticket and Cabin, and then fill the empty cells with the median value of each column.

Loading the data using pandas

Removing the unneeded columns from the data

Filling the null values in the columns with the median
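Here is a minimal sketch of these preprocessing steps, with one stated assumption: Embarked holds text, so it has no median, and this sketch fills its missing values with the most frequent port instead. The helper name drop_and_fill is hypothetical.

```python
import pandas as pd

# Columns the article removes before modelling.
DROP_COLS = ["PassengerId", "Name", "Ticket", "Cabin"]


def drop_and_fill(df):
    """Drop the unneeded columns and fill missing values."""
    out = df.drop(columns=[c for c in DROP_COLS if c in df.columns])
    # Numeric columns (e.g. Age): replace NaN with that column's median.
    numeric_cols = out.select_dtypes(include="number").columns
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())
    # Embarked is categorical, so use its most frequent value (assumption).
    if "Embarked" in out.columns:
        out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])
    return out
```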

As seen in the data above, the Embarked column has the categorical values ‘S’, ‘C’ and ‘Q’. We transform it into three new binary columns, Embarked_C, Embarked_Q and Embarked_S, each holding ‘1’ or ‘0’. This is one-hot encoding: a LabelEncoder would instead map the three ports to the integers 0, 1 and 2 within a single column. The ‘Sex’ column has also been transformed from categorical values to ‘1’s and ‘0’s.
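The two encodings can be sketched with pandas as follows. This is an assumed implementation, not the author's code. Sex is mapped with an explicit dictionary (female to 0, male to 1), which gives the same codes as LabelEncoder because it sorts the categories alphabetically. Embarked is one-hot encoded with pd.get_dummies, whose default naming produces exactly Embarked_C, Embarked_Q and Embarked_S. The helper name encode_categoricals is hypothetical.

```python
import pandas as pd


def encode_categoricals(df):
    """Map Sex to 1/0 and one-hot encode Embarked into Embarked_C/Q/S."""
    out = df.copy()
    # Sex has only two categories, so one binary column is enough.
    out["Sex"] = out["Sex"].map({"female": 0, "male": 1})
    # One 0/1 column per port; dtype=int gives 1/0 rather than True/False.
    # The new columns are appended after the remaining original columns.
    out = pd.get_dummies(out, columns=["Embarked"], dtype=int)
    return out
```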

We plot a heatmap to see how the different columns correlate with one another.
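The heatmap settings used in the article are not shown. The following sketch assumes a Pearson correlation matrix of the numeric and 0/1 columns, drawn with seaborn's heatmap. The helper names are hypothetical, and the plotting imports sit inside the plotting function so that the correlation step runs even without matplotlib or seaborn installed.

```python
import pandas as pd


def correlation_matrix(df):
    """Pearson correlations between the numeric and boolean columns."""
    return df.select_dtypes(include=["number", "bool"]).astype(float).corr()


def plot_correlation_heatmap(df):
    """Draw the correlation matrix as an annotated heatmap."""
    # Imported here so correlation_matrix works without the plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    corr = correlation_matrix(df)
    plt.figure(figsize=(8, 6))
    # Fixing vmin/vmax to [-1, 1] keeps the colour scale comparable across runs.
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
    plt.title("Correlation between columns")
    plt.tight_layout()
    plt.show()
```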

Next, we train a model to predict the survival of the passengers. First, we need to separate the Survived column from the training data, because it is the target the model learns to predict rather than an input feature.
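In pandas, this separation is a single drop. Here is a minimal sketch with a hypothetical helper name, assuming the target column is named Survived, as in train.csv:

```python
import pandas as pd


def split_features_target(train_df, target="Survived"):
    """Return (X, y): every column except the target, and the target itself."""
    X = train_df.drop(columns=[target])  # model inputs
    y = train_df[target]                 # labels the model learns to predict
    return X, y
```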

After this, we read and transform the data in the test.csv file, applying the same steps we performed on the train.csv data.
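Two details matter when repeating the steps on test.csv. First, missing values should be filled with statistics learned from train.csv. Second, the one-hot columns must line up with the training columns, since a port missing from one split would otherwise produce a different set of Embarked_ columns. The sketch below handles both. The function name preprocess and its parameters are hypothetical. It assumes the encodings described above (Sex to 1/0, Embarked one-hot) and fills Embarked with its most frequent value.

```python
import pandas as pd

# Columns removed in the article, plus the target (only train.csv has it).
DROP_COLS = ["PassengerId", "Name", "Ticket", "Cabin", "Survived"]


def preprocess(df, fill_values=None, feature_columns=None):
    """Apply the train.csv preprocessing steps to either split.

    fill_values: dict learned on train.csv; if None, learned from df itself.
    feature_columns: training feature columns; if given, the output is aligned
        to them (missing columns added as 0, extra columns dropped).
    """
    out = df.drop(columns=[c for c in DROP_COLS if c in df.columns])
    if fill_values is None:
        # Median for numeric columns, most frequent port for Embarked.
        fill_values = out.select_dtypes(include="number").median().to_dict()
        fill_values["Embarked"] = out["Embarked"].mode()[0]
    out = out.fillna(fill_values)
    out["Sex"] = out["Sex"].map({"female": 0, "male": 1})
    out = pd.get_dummies(out, columns=["Embarked"], dtype=int)
    if feature_columns is not None:
        out = out.reindex(columns=feature_columns, fill_value=0)
    return out, fill_values
```

Typical usage is train_X, fills = preprocess(train_df), followed by test_X, _ = preprocess(test_df, fills, list(train_X.columns)). Save test_df["PassengerId"] before this step, since the submission file needs it.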

After preprocessing the test data, we need to predict the survival of each passenger. This is done using XGBClassifier, the scikit-learn-compatible classifier from the XGBoost library. XGBoost builds an ensemble of gradient-boosted decision trees and is well suited to tabular, structured data such as this.

Finally, we predict survival for the test passengers and save the predictions to the file ‘submission_xgb.csv’.
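The submission file uses the same two-column layout as gender_submission.csv (PassengerId, Survived). Here is a sketch of writing it, assuming PassengerId was saved from test.csv before it was dropped during preprocessing. save_submission is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def save_submission(passenger_ids, predictions, path="submission_xgb.csv"):
    """Write predictions as a PassengerId,Survived CSV and return the DataFrame."""
    submission = pd.DataFrame({
        "PassengerId": passenger_ids,
        # .astype-style conversion so the file holds 0/1 integers, not floats.
        "Survived": np.asarray(predictions, dtype=int),
    })
    # index=False: write only the two columns, without the DataFrame row index.
    submission.to_csv(path, index=False)
    return submission
```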

Final Output:

The final output obtained, submission_xgb.csv, is shown above.
