
Customer Churn Prediction with Random Forest

  • Writer: Aroyewun Airat
  • Dec 16, 2024
  • 2 min read

Overview:

In this project, I used a Random Forest model to analyze customer churn, with a focus on identifying the factors that most influence churn rates. I was particularly interested in whether price sensitivity was the leading cause of churn. The analysis showed that it was not: net margin and customer tenure turned out to be stronger predictors, a finding that can inform real-world retention decisions.


Key Steps in the Project:

  1. Business Understanding & Problem Framing: The goal was to predict whether a customer would churn based on historical client data. By understanding the business context, I was able to frame the problem and hypothesize that price sensitivity could be the primary factor behind churn.

  2. Exploratory Data Analysis: I conducted a detailed analysis of the dataset, addressing missing values, incorrect data types, and outliers. I also grouped the features into subsets based on their similarities to understand their distribution and relationships.
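The cleaning checks described in this step can be sketched with pandas. This is a minimal illustration on a made-up toy table, not the project's code: the column names (`signup_date`, `consumption`, `churn`), the values, and the 1.5 × IQR outlier rule are all assumptions chosen for the example.

```python
import pandas as pd

# Toy stand-in for the client data (hypothetical columns and values)
df = pd.DataFrame({
    "signup_date": ["2015-01-10", "2016-03-05", "2014-07-21", "2017-11-02", "2013-05-30"],
    "consumption": ["120.5", "98.0", None, "15000.0", "110.2"],
    "churn": [0, 1, 0, 0, 1],
})

# 1. Missing values: count nulls per column
missing_counts = df.isna().sum()

# 2. Incorrect data types: dates and numbers stored as strings
df["signup_date"] = pd.to_datetime(df["signup_date"])
df["consumption"] = pd.to_numeric(df["consumption"])

# 3. Outliers: flag rows outside 1.5 * IQR (one common rule, assumed here)
q1, q3 = df["consumption"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["consumption"] < q1 - 1.5 * iqr) | (df["consumption"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]

print(missing_counts)
print(df.dtypes)
print(outliers)
```

Whether flagged rows should be dropped, capped, or kept depends on the data; the sketch only identifies them.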

  3. Feature Engineering:

    • Data Transformation: Categorical features were one-hot encoded with the pandas get_dummies function, and continuous variables were log-transformed to reduce the skew in their distributions.

    • Multicollinearity Check: I identified and removed highly correlated features to cut redundant inputs, which helps limit overfitting and keeps feature importances easier to interpret.
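The two feature engineering steps above might look roughly like the sketch below. It is an illustration under stated assumptions rather than the project's implementation: the helper `engineer_features`, the 0.9 correlation threshold, the use of `log1p` (which assumes non-negative values), and the toy columns are all hypothetical.

```python
import numpy as np
import pandas as pd

def engineer_features(df, categorical_cols, skewed_cols, corr_threshold=0.9):
    """One-hot encode, log-transform, and drop highly correlated columns.

    Hypothetical helper; the threshold of 0.9 is an assumed default.
    """
    out = df.copy()
    # log1p = log(1 + x), so zeros are handled; assumes non-negative values
    out[skewed_cols] = np.log1p(out[skewed_cols])
    # One-hot encode the categorical columns
    out = pd.get_dummies(out, columns=categorical_cols, dtype=int)
    # Absolute pairwise correlations; keep only the upper triangle
    # so that each pair of columns is inspected once
    corr = out.select_dtypes(include="number").corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    # Drop the later column of every pair above the threshold
    to_drop = [col for col in upper.columns if (upper[col] > corr_threshold).any()]
    return out.drop(columns=to_drop), to_drop

# Toy data: tenure_years is an exact rescaling of tenure_months
demo = pd.DataFrame({
    "channel": ["web", "phone", "web", "email", "phone", "web"],
    "consumption": [0.0, 10.0, 100.0, 1000.0, 50.0, 5.0],
    "tenure_months": [30, 48, 12, 60, 24, 7],
})
demo["tenure_years"] = demo["tenure_months"] / 12

features, dropped = engineer_features(demo, ["channel"], ["consumption"])
print(dropped)
print(features.columns.tolist())
```

Dropping the later column of each correlated pair is an arbitrary but simple tie-break; in practice one might instead keep whichever feature is easier to explain to the business.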

  4. Modeling & Evaluation:

    • I built a Random Forest Classifier and compared its performance across three versions of the dataset: the original data, undersampled data, and oversampled data.

    • The model's performance was evaluated using metrics such as accuracy, precision, and the confusion matrix.
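The comparison above can be sketched with scikit-learn as follows. Everything specific here is an assumption: the data is synthetic (`make_classification` with about 10% positives), the resampling is plain random under- and oversampling via `sklearn.utils.resample`, and the helper `rebalance` is hypothetical. The original post does not say which resampling method was used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic stand-in for the churn data: class 1 ("churn") is the minority
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

def rebalance(X, y, mode, seed=0):
    """Equalise class counts by random resampling (assumes class 1 is the minority)."""
    if mode == "original":
        return X, y
    minority, majority = X[y == 1], X[y == 0]
    if mode == "undersampled":
        # Shrink the majority class down to the minority size
        majority = resample(majority, replace=False,
                            n_samples=len(minority), random_state=seed)
    elif mode == "oversampled":
        # Grow the minority class by sampling with replacement
        minority = resample(minority, replace=True,
                            n_samples=len(majority), random_state=seed)
    else:
        raise ValueError(f"unknown mode: {mode}")
    X_bal = np.vstack([majority, minority])
    y_bal = np.concatenate([np.zeros(len(majority), dtype=int),
                            np.ones(len(minority), dtype=int)])
    return X_bal, y_bal

results = {}
for mode in ("original", "undersampled", "oversampled"):
    X_bal, y_bal = rebalance(X_train, y_train, mode)
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_bal, y_bal)
    pred = model.predict(X_test)
    results[mode] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "confusion_matrix": confusion_matrix(y_test, pred),
    }
    print(mode, results[mode]["accuracy"], results[mode]["precision"])
    print(results[mode]["confusion_matrix"])
```

Only the training split is resampled, so the test set keeps the original class balance and the three models are scored on the same held-out rows.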

  5. Insights & Findings:

    • Surprisingly, price sensitivity was not the leading factor for churn. Instead, factors like net margin and customer tenure played a significant role.

    • I also found that time-related features such as the number of months a customer has been active and the number of months since they updated their contract are important indicators of churn.
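One common way to rank churn drivers with a Random Forest is the fitted model's `feature_importances_` attribute. The sketch below shows the mechanics only. The data is random synthetic data and the column names are hypothetical labels borrowed from the discussion above, so the ranking it prints says nothing about real customers and does not reproduce the project's findings.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature names; the values below are random, not real client data
feature_names = ["net_margin", "months_active",
                 "months_since_contract_update", "price_sensitivity"]
X, y = make_classification(n_samples=1000, n_features=4, n_informative=2,
                           n_redundant=0, random_state=0)
X = pd.DataFrame(X, columns=feature_names)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances: one value per feature, summing to 1
importances = pd.Series(model.feature_importances_, index=feature_names)
importances = importances.sort_values(ascending=False)
print(importances)
```

Impurity-based importances can favor high-cardinality features, so it may be worth cross-checking a ranking like this with `sklearn.inspection.permutation_importance` on held-out data.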


Conclusion: Price sensitivity, while important in many business contexts, did not appear to be the dominant factor in customer churn. Profitability metrics like net margin, combined with customer engagement and contract status, were much stronger predictors. This project reinforced the value of a data-driven approach to solving business problems.


If you'd like to learn more about how I approached this problem and the techniques used, feel free to explore the GitHub repository where I have shared the full project, including code and results.



 
 
 
