Impact Investing Case Study: School Drop Out

Enter dummy values below to test this model out.



About this algo:
- I created this model using a Gradient Boost Classifier on a dataset of 76,500 student observations to predict dropout rates. Accuracy score: 63.4%
- I also tested other models: Logistic Regression Classifier (accuracy: <60%), Decision Tree Classifier (accuracy: <60%), K-Nearest Neighbours (accuracy: <60%), and Support Vector Machines (accuracy: <60%). The Gradient Boost Classifier scored highest, which is why it powers this model.
- This relatively complex model is useful when linear relationships are not clearly visible and single decision trees struggle to give clean, direct answers. A learning rate controls how strongly each new tree corrects the previous ones, which is how the accuracy is gradually improved.
- The Gradient Boost algorithm fits a new model (typically a decision tree) at each step and corrects the errors of the previous one by focusing on the data points with the highest errors. By combining these decision trees, it builds a strong predictive model through optimization of a loss function. (A loss function measures how far a model's predictions are from the actual values.) A minimal sketch of this idea follows the list below.
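
To make the boosting idea concrete, here is a minimal sketch (not the actual code behind this model) that trains a gradient-boosted classifier with scikit-learn on toy data. The `learning_rate` scales how much each new tree corrects the ensemble so far; all data and parameter values here are illustrative assumptions.

```python
# Minimal, illustrative sketch of gradient boosting with scikit-learn.
# The toy data below stands in for the real student observations.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in dataset (placeholder for the 76,500-row student data).
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Each boosting stage fits a shallow tree to the errors of the ensemble so far;
# learning_rate shrinks each tree's contribution so corrections stay small.
model = GradientBoostingClassifier(
    n_estimators=200,   # number of boosting stages (trees)
    learning_rate=0.1,  # step size for each correction
    max_depth=3,        # shallow trees keep each weak learner simple
    random_state=42,
)
model.fit(X, y)
```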

Use cases of the underlying principle:
- Predict customer lifetime value (CLV)
- Predict player performance or game outcomes, like in the movie Moneyball
- Predict healthcare stats and analytics, like your Fitbit or Apple Watch recommending workouts and meal plans
- Predict which ads users are likely to click in digital marketing
- Medical diagnosis: predict diseases (e.g., heart risk, cancer, the occurrence of the next pandemic? :o)
- Predict student performance and suggest interventions (something e-learning companies like Coursera, Skillshare, and Udemy are doing)
- Regression tasks like house price prediction: predict real estate prices
- Energy demand forecasting: predict electricity usage (like what Ecobee does for home energy management)
- Suggest products/content by classifying user preferences
- Classify land cover or predict air pollution levels, etc.

Dataset: What does the underlying data look like?:
- The dataset was an Excel sheet of 76,500 rows with the same parameters you see above in the form.
- Data training: 80% of that data was used to train the model and 20% was used to test it, which produced the accuracy score you see above. A rough sketch of this workflow is shown below.
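
As a rough illustration of the 80/20 split and accuracy evaluation described above, here is a sketch with placeholder names (the file name `students.xlsx` and the target column `dropped_out` are assumptions, not the original script):

```python
# Sketch of the 80/20 train/test workflow described above (illustrative only).
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_excel("students.xlsx")       # hypothetical file name
X = df.drop(columns=["dropped_out"])      # hypothetical target column
y = df["dropped_out"]

# 80% of the rows train the model, 20% are held out to measure accuracy.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = GradientBoostingClassifier(random_state=42)
model.fit(X_train, y_train)

print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```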