reddit_hashtags

Differences

This shows you the differences between two versions of the page.

reddit_hashtags [2021/07/28 15:24]
admin created
reddit_hashtags [2021/07/28 17:07] (current)
admin
Line 1:
 ====== Reddit Hashtags ======
-This link https://projects.glasgow.social/rglasgow/ will let you enter a reddit username and it'll return a list of interests based on keywords in their post and comment history (if available).
+|https://projects.glasgow.social/rglasgow/|Lets you enter a reddit username and it'll return a list of interests based on keywords in their post and comment history (if available)
+|https://starflyer.armchairscientist.co.uk/data/reddit/scan.php|Returns more keywords and tries to assign them a weight|
  
-It does a few things to try and remove irrelevant words:
+It's far from perfect, but it does a few things to try and remove irrelevant words:
   * Tokenisation - separates the words by whitespace and removes duplicates
+  * Normalisation - ignores tokens that are too short (fewer than five characters) or too long (more than 12 characters; those are usually URLs etc.)
   * Stop word removal - removes common [[wp>Stop_word|stop words]], based on this [[https://www.ranks.nl/stopwords|list of stop words]].
   * [[wp>Lemmatisation|Lemmatisation]] - reduces (non-noun) words to their root forms
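
The filtering steps listed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code behind either tool: the function name ''extract_keywords'', the tiny ''STOP_WORDS'' sample, and the lower-casing of the input are all hypothetical. Lemmatisation is left as a pluggable callable that defaults to a no-op, since the page does not say which lemmatiser is used.

```python
from typing import Callable, FrozenSet, List

# Hypothetical, deliberately tiny sample; the page links to a much longer list.
STOP_WORDS: FrozenSet[str] = frozenset(
    {"about", "would", "there", "their", "because", "which"}
)


def extract_keywords(
    text: str,
    stop_words: FrozenSet[str] = STOP_WORDS,
    lemmatise: Callable[[str], str] = lambda token: token,  # placeholder, no-op
    min_len: int = 5,
    max_len: int = 12,
) -> List[str]:
    """Return candidate keywords from free text, in first-seen order."""
    # Tokenisation: split on whitespace and drop duplicates.
    # Lower-casing is an assumption so that "Glasgow" and "glasgow" match.
    tokens = list(dict.fromkeys(text.lower().split()))

    # Normalisation: ignore tokens shorter than 5 or longer than 12 characters.
    tokens = [t for t in tokens if min_len <= len(t) <= max_len]

    # Stop word removal.
    tokens = [t for t in tokens if t not in stop_words]

    # Lemmatisation: apply the supplied callable, then de-duplicate again
    # because different tokens may reduce to the same root.
    return list(dict.fromkeys(lemmatise(t) for t in tokens))


if __name__ == "__main__":
    sample = "I would love cycling about Glasgow cycling https://example.com/a/very/long/path"
    print(extract_keywords(sample))
```

Because the split is on whitespace only, punctuation stays attached to tokens; a real implementation would probably strip it before the length check.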
reddit_hashtags.1627482241.txt.gz · Last modified: 2021/07/28 15:24 by admin