Feature Engineering + H2O Gradient Boosting (GBM) in R Scores 0.936

With fewer than 3 days to go, this script is meant to give beginners some feisty ideas, a machine learning workflow, and motivation for the ongoing machine learning challenge.

Here's a quick workflow of what I've done below:

  1. Load data and explore
  2. Data Pre-processing
  3. Dropped Features
  4. One Hot Encoding
  5. Feature Engineering
  6. Model Training
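As a rough illustration of step 4, here is a minimal sketch of one hot encoding in base R. It is not taken from the script itself: the toy data frame and the column names `grade` and `loan_amnt` are assumptions made up for the example.

```r
# Hypothetical toy data; the column names are illustrative assumptions,
# not the challenge's actual schema.
df <- data.frame(
  grade     = factor(c("A", "B", "C", "A")),
  loan_amnt = c(5000, 12000, 8000, 20000)
)

# model.matrix() with "- 1" drops the intercept, so every level of the
# factor gets its own 0/1 indicator column (gradeA, gradeB, gradeC).
grade_dummies <- model.matrix(~ grade - 1, data = df)

# Replace the original categorical column with its indicator columns.
encoded <- cbind(df["loan_amnt"], grade_dummies)
print(encoded)
```

Each row ends up with exactly one of the `grade*` columns set to 1. With many categorical columns, the same idea can be applied column by column.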

Good Luck!

Note: For more feature engineering ideas, spend time exploring the data by the loan_status variable. For categorical vs. categorical data, create dodged bar plots. For categorical vs. continuous data, create density plots and use fill=as.factor(loan_status).
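The two plot types from the note can be sketched with ggplot2 as follows. This is only an illustration under stated assumptions: the data frame `train` is simulated here, and the columns `grade` and `loan_amnt` are hypothetical stand-ins for whichever categorical and continuous variables you want to compare against loan_status.

```r
library(ggplot2)

# Simulated stand-in for the training data (assumption: the real data
# has a binary loan_status column; the other columns are hypothetical).
set.seed(42)
n <- 200
train <- data.frame(
  loan_status = rbinom(n, 1, 0.3),
  grade       = sample(c("A", "B", "C"), n, replace = TRUE),
  loan_amnt   = round(runif(n, 1000, 35000))
)

# Categorical vs. categorical: dodged bar plot of counts per grade,
# with one bar per loan_status value placed side by side.
p_bar <- ggplot(train, aes(x = grade, fill = as.factor(loan_status))) +
  geom_bar(position = "dodge") +
  labs(fill = "loan_status")

# Categorical vs. continuous: overlaid density curves of loan_amnt,
# one per loan_status value, made semi-transparent so both are visible.
p_density <- ggplot(train, aes(x = loan_amnt, fill = as.factor(loan_status))) +
  geom_density(alpha = 0.5) +
  labs(fill = "loan_status")

print(p_bar)
print(p_density)
```

If the bars or densities for the two loan_status groups differ noticeably for a variable, that variable is a candidate for further feature engineering.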

To help the community, feel free to contribute the equivalent Python / C++ script in the comments below.

Update: You can get the Python script for this solution from Jin Cong Ho's comment below.


Script (R)


Resources - Handy Algorithms for this Challenge

About the Author

Making an effort to help people understand Machine Learning. I believe your educational background shouldn't stop you from pursuing ML & Data Science. Earned a Master's in F/M; a self-taught data science professional. Previously worked at Analytics Vidhya. Now solving ML & Growth challenges at HackerEarth!