About this algo:
- I built this model with a Support Vector Machine Classifier (SVC) on a dataset of 2.3k customer superstore purchases. It reached an accuracy of 86.83%.
- For comparison, I also tested other models: Logistic Regression Classifier (accuracy: 88.16%),
Decision Tree Classifier (accuracy: 86.38%), Random Forest (accuracy: 88.8%),
K Nearest Neighbour (accuracy: 83.92%), Support Vector Machines (accuracy: 86.83%),
Gradient Boost Classifier (accuracy: 87.94%)
- Recursive Feature Elimination (RFE): To improve the accuracy of the model, I used Recursive Feature Elimination (RFE) to identify the most important features/predictors. RFE fits the model, drops the weakest predictor, and refits, repeating until only the best predictors remain.
- In a model like SVC it is sometimes difficult to tell which predictors had more influence than others, but the above combination of 11 predictors gave the highest accuracy score.
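The RFE step described above can be sketched roughly as follows. This is a minimal illustration, not the original training code: the data is a synthetic stand-in generated with `make_classification` (the real superstore columns are not shown here), and the linear kernel is an assumption, chosen because scikit-learn's `RFE` needs an estimator that exposes `coef_` or `feature_importances_`, which an SVC only does with `kernel="linear"`.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the superstore data (assumption: 20 candidate predictors).
X, y = make_classification(
    n_samples=2300, n_features=20, n_informative=8, random_state=42
)

# 80/20 split, as described in the dataset section.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scale using statistics from the training set only.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# RFE ranks features by the magnitude of the linear SVC coefficients and
# removes one feature per round (step=1) until 11 remain.
selector = RFE(estimator=SVC(kernel="linear"), n_features_to_select=11, step=1)
selector.fit(X_train_s, y_train)

selected = [i for i, keep in enumerate(selector.support_) if keep]
print("Selected feature indices:", selected)

# Refit an SVC on the selected features and score it on the held-out 20%.
model = SVC(kernel="linear").fit(selector.transform(X_train_s), y_train)
accuracy = model.score(selector.transform(X_test_s), y_test)
print(f"Test accuracy with 11 features: {accuracy:.4f}")
```

To search for the best number of predictors rather than fixing it at 11, scikit-learn's `RFECV` runs the same elimination with cross-validation.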
Use cases of the underlying principle:
- The Support Vector Machine Classifier, just like other classification algorithms (Logistic Regression, Decision Trees, Random Forest, XGBoost) and clustering methods such as KMeans, could be used for:
  - Drug Effectiveness: Analyze which treatments work best for specific groups (pharma companies would be heavily reliant on this!)
  - Credit Card Usage: Analyze spending patterns and use this data to crack partnership deals, e.g. AMEX and AirMiles
  - Predictive Maintenance: Forecast machinery failure before breakdowns occur (something that Honda mastered)
  - Insurance: Predict claims likelihood and calculate premiums accordingly
  - Crop Yield Prediction: Analyze environmental factors and forecast yields. Even impact investing firms and farmer lending companies can benefit from this
  - Gaming: Predict player/viewer churn and optimize in-game offers (e-gaming AND IPL/EPL/NFL/NBA), predict the likelihood of in-app purchases and optimize offers, predict top-performing players in competitive games
  - Game Strategy Recommendations: Analyze past matches to suggest optimal strategies
  - Energy Management: Solar Energy Prediction. Estimate energy output from solar farms based on weather conditions
  - Education: Dropout Prediction. Identify students at risk of dropping out based on academic performance and socio-demographic factors
Dataset: What does the underlying data look like?
- The dataset was an Excel sheet of values for the same parameters you see above in the form.
- Data Training: 80% of the data was used to train the model and 20% was used to test it, which produced the accuracy score you see above.
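As a rough sketch of the 80/20 split and the model comparison mentioned earlier, the snippet below trains each classifier on the same training set and scores it on the same held-out test set. The data is synthetic (`make_classification` with 11 features is an assumption standing in for the Excel sheet) and all hyperparameters are library defaults, so the printed scores will not match the accuracies reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 2.3k-row dataset (assumption: 11 predictors, binary target).
X, y = make_classification(
    n_samples=2300, n_features=11, n_informative=6, random_state=0
)

# 80% of the rows train the models, 20% are held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# The same families of models listed above, with default settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "K Nearest Neighbour": KNeighborsClassifier(),
    "Support Vector Machine": SVC(),
    "Gradient Boost": GradientBoostingClassifier(random_state=0),
}

# Accuracy = share of test rows whose predicted class matches the true class.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.2%}")
```

A single 80/20 split gives one estimate per model. Differences of a point or two between models may fall within the noise of the split, which cross-validation (for example `cross_val_score`) would help quantify.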