Reddit Hashtags

The tool at https://projects.glasgow.social/rglasgow/ lets you enter a Reddit username and returns a list of interests based on keywords in that user's post and comment history (if available).

It applies a few steps to try to remove irrelevant words:

  • Tokenisation - separates the text into words on whitespace and removes duplicates
  • Normalisation - ignores words that are too short (fewer than five characters) or too long (more than 12 characters; those are usually URLs etc.)
  • Stop word removal - removes common stop words, based on a list of stop words
  • Lemmatisation - reduces (non-noun) words to their root forms
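
The steps above can be sketched roughly as follows. This is only an illustration, not the tool's actual code: the function names, the tiny `STOP_WORDS` set, the lowercasing and the punctuation stripping are all assumptions made for the example. Lemmatisation is left as a pluggable `lemmatise` function (identity by default), since the page does not say which lemmatiser is used.

```python
# Hypothetical, deliberately tiny stop word list; the real tool uses its own list.
STOP_WORDS = {"about", "after", "could", "should", "their", "there",
              "these", "those", "which", "would"}

MIN_LEN = 5   # words with fewer than five characters are ignored
MAX_LEN = 12  # words with more than 12 characters are ignored (usually URLs)


def tokenise(text):
    """Split on whitespace and drop duplicates, keeping first-seen order.

    Lowercasing and stripping surrounding punctuation are assumptions
    added for this sketch.
    """
    seen = {}
    for raw in text.split():
        word = raw.strip(".,;:!?\"'()[]").lower()
        if word:
            seen.setdefault(word, None)
    return list(seen)


def normalise(words):
    """Keep only words whose length is within the allowed range."""
    return [w for w in words if MIN_LEN <= len(w) <= MAX_LEN]


def remove_stop_words(words, stop_words=STOP_WORDS):
    """Drop any word that appears in the stop word list."""
    return [w for w in words if w not in stop_words]


def extract_keywords(text, lemmatise=lambda w: w):
    """Run the steps in order; `lemmatise` is a placeholder hook."""
    words = remove_stop_words(normalise(tokenise(text)))
    result = {}
    for w in words:
        # Lemmatising can map several words to one root, so de-duplicate again.
        result.setdefault(lemmatise(w), None)
    return list(result)
```

For example, `extract_keywords("I would love cycling in Glasgow, cycling about Glasgow")` returns `['cycling', 'glasgow']`: the short words are dropped by length, "would" and "about" are dropped as stop words, and the repeated words are collapsed.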
reddit_hashtags.1627482628.txt.gz · Last modified: 2021/07/28 15:30 by admin