About this algo:
- I created this algo using a Count Vectorizer to convert words in articles into a numeric matrix of word frequencies.
- I then used a Naive Bayes classifier to assign a probability score to your entered text v/s each category.
- The category with the highest probability wins, and that's the result you see! (A minimal code sketch follows below.)
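For reference, here is roughly what that pipeline looks like in scikit-learn. This is a minimal sketch: the sample texts, labels, and variable names are placeholders for illustration, not the app's actual training data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data -- placeholder examples, not the app's real dataset
texts = [
    "Stocks rally as markets react to interest rate cut",
    "Local team wins championship after dramatic final",
    "New smartphone launch features faster chip and better camera",
]
labels = ["business", "sports", "technology"]

# CountVectorizer turns each article into a vector of word counts;
# MultinomialNB then scores each category's probability for that vector.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

# The category with the highest probability is the prediction
query = ["The quarterly earnings report beat analyst expectations"]
print(dict(zip(model.classes_, model.predict_proba(query)[0])))
print(model.predict(query))
```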
Use cases of the underlying principle:
- Twitter (X) tweets sentiment analysis: Positive, Negative, Neutral!
- Election Recommender: Voter sentiment towards a party, based on sentiment analysis of multiple news articles in a region.
- HR Candidate Recommender system: Matching company job descriptions v/s candidate resumes - good fit v/s not a good fit.
- Intent Classification: In chatbot systems, classifying user queries into specific intents (e.g., "book a flight", "check weather").
- Document Categorization: Classifying legal or financial documents into categories like contracts, invoices, reports, etc.
- Product Categorization: Categorizing product descriptions in e-commerce platforms (e.g., "electronics", "clothing", "home goods").
- Author Identification: Determining the author of a document based on writing style or specific word choices.
Dataset: What does the underlying data look like?
- This dataset contains article titles, article mini-descriptions, and labels that classify each article into a category. This is primarily used as training data.
- Data Training: 80% of that data was used to train the model and 20% was used to test it, which gives the accuracy score you see above. (A sketch of this split is shown below.)
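For context, this is how such an 80/20 split and accuracy score are typically computed. The file name `articles.csv` and its column names (`title`, `description`, `category`) are assumptions for illustration, not the app's actual schema.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Assumed file and column names for illustration
df = pd.read_csv("articles.csv")
X = df["title"] + " " + df["description"]   # combine title and mini-description
y = df["category"]

# 80% of the data for training, 20% held out for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(X_train, y_train)

# Accuracy on the held-out 20% is the score reported above
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```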