Reddit Hashtags

https://projects.glasgow.social/rglasgow/ - Lets you enter a Reddit username and returns a list of interests based on keywords in that user's post and comment history (if available)
https://starflyer.armchairscientist.co.uk/data/reddit/scan.php - Returns more keywords and tries to assign each one a weight

It's far from perfect, but it does a few things to try to remove irrelevant words:

  • Tokenisation - separates the words by whitespace and removes duplicates
  • Normalisation - ignores tokens that are too short (fewer than five characters) or too long (more than 12 characters - those are usually URLs etc)
  • Stop word removal - removes common stop words, based on this list of stop words.
  • Lemmatisation - reduces (non-noun) words to their root forms
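
The steps above can be sketched roughly as follows. This is only an illustration, not the code behind either tool: the function names, the tiny stop word list and the lowercasing step are all assumptions made for the example. Lemmatisation is left as an optional callable, since it needs an external library (NLTK's WordNetLemmatizer would be one option).

```python
from typing import Callable, FrozenSet, List, Optional

# Hypothetical, deliberately tiny stop word list; the real list is not reproduced here.
STOP_WORDS: FrozenSet[str] = frozenset(
    {"about", "their", "there", "which", "would", "could", "should"}
)

MIN_LEN = 5   # tokens with fewer than five characters are ignored
MAX_LEN = 12  # tokens with more than 12 characters are ignored (usually URLs etc)


def tokenise(text: str) -> List[str]:
    """Split on whitespace and drop duplicates, keeping first-seen order.

    Lowercasing is an assumption so that "Glasgow" and "glasgow" count once.
    """
    return list(dict.fromkeys(text.lower().split()))


def normalise(tokens: List[str]) -> List[str]:
    """Keep only tokens whose length is between MIN_LEN and MAX_LEN inclusive."""
    return [t for t in tokens if MIN_LEN <= len(t) <= MAX_LEN]


def remove_stop_words(
    tokens: List[str], stop_words: FrozenSet[str] = STOP_WORDS
) -> List[str]:
    """Drop any token that appears in the stop word list."""
    return [t for t in tokens if t not in stop_words]


def extract_keywords(
    text: str, lemmatise: Optional[Callable[[str], str]] = None
) -> List[str]:
    """Run tokenisation, normalisation and stop word removal in that order.

    If a lemmatise callable is supplied, it is applied last and the
    results are de-duplicated again.
    """
    tokens = remove_stop_words(normalise(tokenise(text)))
    if lemmatise is not None:
        tokens = list(dict.fromkeys(lemmatise(t) for t in tokens))
    return tokens


if __name__ == "__main__":
    sample = (
        "I love cycling around Glasgow and cycling about their parks "
        "https://example.com/some/long/path"
    )
    print(extract_keywords(sample))
```

Note that the length filter already removes most short stop words ("the", "and", "with"), so the stop word list mainly matters for longer ones such as "about" or "their".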
reddit_hashtags.txt · Last modified: 2021/07/28 17:07 by admin