This is an old revision of the document!


Reddit Hashtags

The tool at https://projects.glasgow.social/rglasgow/ takes a Reddit username and returns a list of interests based on keywords in that user's post and comment history (if available). A second page returns more keywords and tries to assign each one a weight: https://starflyer.armchairscientist.co.uk/data/reddit/scan.php

It's far from perfect, but it takes a few steps to try to remove irrelevant words:

  • Tokenisation - separates the words by whitespace and removes duplicates
  • Normalisation - ignores tokens that are too short (fewer than five characters) or too long (more than 12 characters - those are usually URLs etc)
  • Stop word removal - removes common stop words, based on this list of stop words.
  • Lemmatisation - reduces (non-noun) words to their root forms
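
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the function name `extract_keywords`, the tiny `STOP_WORDS` set, the `LEMMAS` lookup table and the lowercasing step are all assumptions made for the example. The real tool uses its own stop word list and lemmatiser, and the toy lookup here makes no attempt to tell nouns from other words.

```python
# Illustrative stand-ins only: the real stop word list and lemmatiser
# used by the tool are not shown on this page.
STOP_WORDS = {"about", "their", "there", "which", "would", "could", "because"}
LEMMAS = {"running": "run", "walked": "walk", "playing": "play"}  # toy lookup table

MIN_LEN = 5   # tokens with fewer characters are ignored
MAX_LEN = 12  # tokens with more characters are ignored (usually URLs etc)


def extract_keywords(text):
    """Return candidate keywords from a block of post/comment text."""
    # Tokenisation: split on whitespace and drop duplicates (first-seen order kept).
    # Lowercasing is an assumption so that stop word matching is case-insensitive.
    tokens = list(dict.fromkeys(text.lower().split()))

    # Normalisation: ignore tokens that are too short or too long.
    tokens = [t for t in tokens if MIN_LEN <= len(t) <= MAX_LEN]

    # Stop word removal.
    tokens = [t for t in tokens if t not in STOP_WORDS]

    # Lemmatisation: a toy dictionary lookup standing in for a real lemmatiser.
    lemmas = [LEMMAS.get(t, t) for t in tokens]

    # Different tokens can share a root, so remove duplicates once more.
    return list(dict.fromkeys(lemmas))


if __name__ == "__main__":
    sample = "I was running about Glasgow because running helps https://example.com/very/long/path"
    print(extract_keywords(sample))
```

Note that the length filter runs before lemmatisation in this sketch, so a root shorter than five characters (such as "run") can still appear in the result. Whether the real tool behaves the same way is not stated here.
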
reddit_hashtags.1627488422.txt.gz · Last modified: 2021/07/28 17:07 by admin