About this algo:
I built this algo using an NLP tokenizer and a Term Frequency/Inverse Document Frequency (TF-IDF) model, which converts song lyrics into a matrix of weighted word frequencies. I then use a cosine similarity score to measure how similar a song is to every other song.
The songs with the highest cosine similarity win! (And that's the result you see.)
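To make the idea concrete, here is a minimal sketch of TF-IDF weighting and cosine similarity using only the Python standard library. It is not the code behind this algo: the whitespace tokenizer, the smoothed IDF formula, the function names and the three toy lyrics are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real NLP tokenizer)."""
    return text.lower().split()


def tfidf_vectors(documents):
    """Return one {word: tf-idf weight} dict per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Term frequency times a smoothed inverse document frequency
            # (one common variant, assumed here).
            word: (count / len(tokens)) * (math.log(n_docs / doc_freq[word]) + 1.0)
            for word, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {word: weight} vectors."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy lyrics, not taken from the real dataset.
lyrics = [
    "love me tender love me true",
    "all you need is love love love",
    "highway to the danger zone",
]
vectors = tfidf_vectors(lyrics)
sim_love = cosine_similarity(vectors[0], vectors[1])   # the two songs share "love"
sim_other = cosine_similarity(vectors[0], vectors[2])  # no words in common
```

The first two toy songs share a word, so they score higher than the pair with no overlap, which scores zero. Ranking every other song by this score and keeping the top few is the whole recommendation step.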
Use cases of the underlying principle:
- Grant matching: Government agencies and NGOs can use this principle to compare their agency's offerings with the grants that startups can apply to.
- HR Recommender System: Matching the best candidates to an ideal job profile using the text descriptions of both.
- Create your own Search Engine: The same principle is used to match search queries to results.
Dataset: What does the underlying data look like?
- The dataset consisted of 57,000 songs, along with their lyrics, artists and genres.
- The lyrics text was used to compute how often each word appears in a song relative to all the other songs. These frequencies were converted into a matrix, and a cosine similarity score was computed between songs to give you the final result.
- Data Training: The whole dataset was used to give you an answer; there was no split between training and testing data.
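The points above can be sketched with scikit-learn, assuming that library (the text here does not say which one was actually used). The four-song catalogue, the `recommend` helper and the `stop_words="english"` setting are hypothetical choices for illustration. Note that the vectorizer is fitted on every song at once, with nothing held out.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy catalogue standing in for the real 57,000-song dataset.
songs = [
    ("Song A", "dancing all night under the neon lights"),
    ("Song B", "we keep dancing under neon city lights"),
    ("Song C", "lonely road and a cold winter rain"),
    ("Song D", "rain keeps falling on this lonely road"),
]
titles = [title for title, _ in songs]
lyrics = [text for _, text in songs]

# Fit on the whole corpus: there is no train/test split.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(lyrics)  # rows = songs, columns = words

# Pairwise cosine similarity between every pair of songs.
similarity = cosine_similarity(tfidf_matrix)


def recommend(title, top_n=1):
    """Return the top_n most similar song titles, excluding the query song."""
    idx = titles.index(title)
    scores = similarity[idx].copy()
    scores[idx] = -1.0  # make sure a song never recommends itself
    best = np.argsort(scores)[::-1][:top_n]
    return [titles[i] for i in best]
```

Calling `recommend("Song A")` ranks the other three songs by their similarity to Song A and returns the closest one. On a catalogue of 57,000 songs the full pairwise matrix gets large, so computing similarities for one query row at a time may be the more practical design.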