Diabetes Prediction (Algo: Random Forest Classifier)
Feel free to enter dummy values/educated guesses to test this model out.
Pregnancies
(Range: 0-10)
Glucose (mmol/L)
(Normal: < 5.9 mmol/L)
Blood Pressure, Systolic (mmHg)
(Normal: < 130 mmHg)
Age
(Range: 0-100)
Skin Thickness (mm)
(Normal: < 20 mm)
Insulin (pmol/L)
(Normal: < 90 pmol/L)
BMI (kg/m²)
(Normal: < 24.9 kg/m²)
Diabetes Pedigree Function
(Normal: < 1)
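Each form field above becomes one input feature for the model. Below is a minimal sketch of validating the inputs against the ranges shown on the form and flagging values at or above the listed "normal" bound. It assumes the model takes the eight values as a flat list in the same order as the form. The field names, the `FORM_FIELDS` table and `build_feature_vector` are hypothetical, not the app's actual code.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical field table: (name, allowed (min, max) or None, upper "normal" bound or None).
# The numeric bounds are the ones printed on the form; the layout is an assumption.
FORM_FIELDS: List[Tuple[str, Optional[Tuple[float, float]], Optional[float]]] = [
    ("pregnancies", (0, 10), None),
    ("glucose_mmol_l", None, 5.9),
    ("systolic_bp_mmhg", None, 130),
    ("age", (0, 100), None),
    ("skin_thickness_mm", None, 20),
    ("insulin_pmol_l", None, 90),
    ("bmi", None, 24.9),
    ("diabetes_pedigree", None, 1.0),
]


def build_feature_vector(form: Dict[str, float]) -> Tuple[List[float], List[str]]:
    """Return the inputs in a fixed order, plus the names of fields at or above their normal bound."""
    features: List[float] = []
    above_normal: List[str] = []
    for name, allowed, normal_max in FORM_FIELDS:
        value = float(form[name])  # raises KeyError if a field is missing
        if allowed is not None and not allowed[0] <= value <= allowed[1]:
            raise ValueError(f"{name}={value} is outside the allowed range {allowed}")
        if normal_max is not None and value >= normal_max:  # form says "Normal: < X"
            above_normal.append(name)
        features.append(value)
    return features, above_normal


# Dummy values, in the spirit of "enter educated guesses to test this model out".
sample = {
    "pregnancies": 2, "glucose_mmol_l": 6.4, "systolic_bp_mmhg": 118, "age": 45,
    "skin_thickness_mm": 22, "insulin_pmol_l": 70, "bmi": 27.3, "diabetes_pedigree": 0.5,
}
features, flags = build_feature_vector(sample)
print(features)  # [2.0, 6.4, 118.0, 45.0, 22.0, 70.0, 27.3, 0.5]
print(flags)     # ['glucose_mmol_l', 'skin_thickness_mm', 'bmi']
```

Keeping the order in one table matters because a trained model has no field names. It only sees column positions, so the list must match the column order used during training.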
About this algo:
- I built this model with a Random Forest Classifier trained on a dataset of 2,500 patient observations to predict diabetes outcomes. Accuracy on the test set: 84.41%.
- For comparison, I also tested Logistic Regression (accuracy: 81.81%), Decision Tree (75.97%), K-Nearest Neighbours (77.92%), Support Vector Machine (75.32%), and Gradient Boosting (74.67%) classifiers.
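- The Random Forest algorithm builds many decision trees. Each tree is trained on a random sample of the data drawn with replacement (bootstrapping) and considers only a random subset of the features when splitting, and each tree makes its own prediction. The final prediction is the majority vote of these trees.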
- FYI: A Decision Tree splits the dataset into branches based on feature values such as age, glucose level, and BMI. Each split is a yes/no question (e.g., is glucose above a certain threshold?), and the tree keeps splitting until it reaches a leaf, which holds the prediction.
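Here is a minimal pure-Python sketch of the two ideas above, assuming binary 0/1 labels. Each "tree" is a one-level decision tree (a stump). It picks the `x[feature] <= t` split with the lowest weighted Gini impurity, which is one common split criterion. The source doesn't say which criterion was used. Each stump is trained on a bootstrap sample and given one randomly chosen feature, and the forest predicts by majority vote. This is a simplification. Library implementations such as scikit-learn's `RandomForestClassifier` grow deeper trees, sample a random subset of features at every split, and combine trees by averaging predicted class probabilities rather than taking a hard vote. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int, int]  # (feature index, threshold, left label, right label)


def gini(labels: Sequence[int]) -> float:
    """Gini impurity: 1 - sum of squared class proportions (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def majority(labels: Sequence[int]) -> int:
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X: Sequence[Sequence[float]], y: Sequence[int], feature: int) -> Stump:
    """One-level decision tree: choose the threshold on `feature` with the lowest weighted Gini."""
    best = None
    for t in sorted({row[feature] for row in X}):
        left = [label for row, label in zip(X, y) if row[feature] <= t]
        right = [label for row, label in zip(X, y) if row[feature] > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if best is None or score < best[0]:
            right_label = majority(right) if right else majority(left)
            best = (score, (feature, t, majority(left), right_label))
    return best[1]


def predict_stump(stump: Stump, row: Sequence[float]) -> int:
    feature, t, left_label, right_label = stump
    return left_label if row[feature] <= t else right_label


def fit_forest(X: Sequence[Sequence[float]], y: Sequence[int],
               n_trees: int = 25, seed: int = 0) -> List[Stump]:
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: n rows drawn with replacement
        feature = rng.randrange(d)                   # random feature for this stump
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def predict_forest(forest: List[Stump], row: Sequence[float]) -> int:
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]  # majority vote across the stumps


# Toy data (not from the diabetes dataset): two features, classes separated around 3 vs 7.
X = [[1, 1], [2, 2], [3, 3], [7, 7], [8, 8], [9, 9]]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X, y)
print(predict_forest(forest, [0.5, 0.5]), predict_forest(forest, [9.5, 9.5]))
```

An odd `n_trees` avoids tied votes for two classes.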
Use cases of the underlying principle:
- The Random Forest Classifier, like other classification algorithms (Logistic Regression, Decision Trees, XGBoost, Support Vector Machines) and related unsupervised methods such as K-Means clustering, could be used to:
  - Predict customer lifetime value (CLV)
  - Predict player performance or game outcomes, as in the movie Moneyball
  - Analyze health stats, like on your Fitbit or Apple Watch, to recommend workouts and meal plans
  - Predict which ads users are likely to click in digital marketing
  - Medical diagnosis: predict diseases (e.g., heart risk, cancer, the occurrence of the next pandemic? :o )
  - Predict student performance and suggest interventions (something e-learning companies like Coursera, Skillshare, and Udemy are doing)
  - House price prediction: predict real estate prices (a regression task, for which Random Forest has a regressor variant)
  - Energy demand forecasting: predict electricity usage (like what Ecobee does for home energy management)
  - Suggest products/content by classifying user preferences
  - Classify land cover or predict air pollution levels, etc.
Dataset: What does the underlying data look like?
- The dataset was an Excel spreadsheet containing the same parameters you see in the form above.
- Data training: 80% of the data was used to train the model and the remaining 20% to test it, which produced the accuracy score shown above.
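Here is a minimal sketch of the 80/20 hold-out split and the accuracy metric, in pure Python. The function names `holdout_split` and `accuracy` and the fixed seed are illustrative assumptions. The source doesn't say which tool performed the split. In scikit-learn, the usual equivalent is `sklearn.model_selection.train_test_split(X, y, test_size=0.2)`.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(rows: Sequence[T], test_fraction: float = 0.2,
                  seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the rows, then hold out test_fraction of them as the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of test rows whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


train, test = holdout_split(list(range(100)))
print(len(train), len(test))                       # 80 20
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))        # 0.75
```

The model only sees the training rows. Accuracy is then the share of correct predictions on the 20% it never saw, so the score reflects performance on new patients rather than memorized ones.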