====== Reddit Hashtags ======

|https://projects.glasgow.social/rglasgow/|Lets you enter a Reddit username and returns a list of interests based on keywords in that user's post and comment history (if available)|
|https://starflyer.armchairscientist.co.uk/data/reddit/scan.php|Returns more keywords and tries to assign each one a weight|

It's far from perfect, but it does a few things to try to remove irrelevant words:

  * Tokenisation - separates the words by whitespace and removes duplicates
  * Normalisation - ignores tokens that are too short (fewer than five characters) or too long (more than 12 characters - those are usually URLs etc.)
  * Stop word removal - removes common [[wp>Stop_word|stop words]], based on this [[https://www.ranks.nl/stopwords|list of stop words]]
  * [[wp>Lemmatisation|Lemmatisation]] - reduces (non-noun) words to their root forms
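
The filtering steps above can be sketched roughly as follows. This is only an illustration in Python, not the code behind either tool: the function name ''extract_keywords'', the tiny ''STOP_WORDS'' sample and the ''LEMMAS'' lookup table are all made up for the example, and lowercasing and punctuation stripping are assumptions the description doesn't mention. A real version would load the full stop word list and use a proper lemmatiser that skips nouns.

```python
import string

# Small illustrative sample only; the linked stop word list is much longer.
STOP_WORDS = {"about", "after", "because", "could", "their",
              "there", "these", "which", "would"}

# Hypothetical lookup table standing in for a real lemmatiser.
LEMMAS = {"cycling": "cycle", "walked": "walk", "walking": "walk"}

MIN_LEN = 5   # tokens shorter than this are ignored
MAX_LEN = 12  # tokens longer than this are ignored (usually URLs etc.)


def extract_keywords(text):
    """Return candidate keywords from a block of post/comment text."""
    # Tokenisation: split on whitespace, then drop duplicates (order kept).
    # Lowercasing and stripping surrounding punctuation are assumptions.
    tokens = [t.strip(string.punctuation).lower() for t in text.split()]
    tokens = list(dict.fromkeys(tokens))

    # Normalisation: keep only tokens of 5 to 12 characters.
    tokens = [t for t in tokens if MIN_LEN <= len(t) <= MAX_LEN]

    # Stop word removal.
    tokens = [t for t in tokens if t not in STOP_WORDS]

    # Lemmatisation: map known inflected forms to their root.
    tokens = [LEMMAS.get(t, t) for t in tokens]

    # Lemmatisation can create new duplicates, so deduplicate once more.
    return list(dict.fromkeys(tokens))
```

For example, ''extract_keywords("I walked there because cycling in Glasgow is walking distance")'' gives ''['walk', 'cycle', 'glasgow', 'distance']'': the short words and the stop words are dropped, and "walked" and "walking" collapse into a single entry.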