This shows you the differences between two versions of the page.
reddit_hashtags [2021/07/28 15:30] admin |
reddit_hashtags [2025/03/19 00:48] (current) admin |
||
---|---|---|---|
Line 1: | Line 1: | ||
====== Reddit Hashtags ====== | ====== Reddit Hashtags ====== | ||
- | This link https://projects.glasgow.social/rglasgow/ will let you enter a reddit username and it'll return a list of interests based on keywords in their post and comment history (if available). | + | **Note: These projects have been defunct since around Dec 2023, when Reddit clamped down on its public API for retrieving comments etc. (it became massively rate-limited), so the project was abandoned (including the [[Matrix firehose]] room).** |
- | It does a few things to try and remove irrelevant words: | + | |https://projects.glasgow.social/rglasgow/|Lets you enter a reddit username and it'll return a list of interests based on keywords in their post and comment history (if available)| |
+ | |https://starflyer.armchairscientist.co.uk/data/reddit/scan.php|Returns more keywords and tries to assign them a weight| | ||
+ | |||
+ | It's far from perfect, but it does a few things to try and remove irrelevant words: | ||
* Tokenisation - separates the words by whitespace and removes duplicates | * Tokenisation - separates the words by whitespace and removes duplicates | ||
- | * Normalisation - ignores words that are too short (fewer than five characters) or too long (more than 12 characters - those are usually URLs etc) | + | * Normalisation - ignores tokens that are too short (fewer than five characters) or too long (more than 12 characters - those are usually URLs etc) |
* Stop word removal - Remove common [[wp>Stop_word|stop words]], based on this [[https://www.ranks.nl/stopwords|list of stop words]]. | * Stop word removal - Remove common [[wp>Stop_word|stop words]], based on this [[https://www.ranks.nl/stopwords|list of stop words]]. | ||
* [[wp>Lemmatisation|Lemmatisation]] - reduces (non-noun) words to their root origins | * [[wp>Lemmatisation|Lemmatisation]] - reduces (non-noun) words to their root origins |
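
The first three steps in the list above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `extract_keywords`, the abbreviated `STOP_WORDS` set, the lowercasing and the punctuation stripping are all assumptions. Lemmatisation is left out because it needs a dictionary-backed library.

```python
# Hypothetical, abbreviated stop word list; the list linked above is much longer.
STOP_WORDS = {
    "about", "after", "again", "around", "because", "before", "could",
    "should", "their", "there", "these", "those", "where", "which", "would",
}

MIN_LEN = 5   # tokens with fewer characters are ignored
MAX_LEN = 12  # tokens with more characters are ignored (usually URLs etc)


def extract_keywords(text):
    """Return candidate keywords from text, in order of first appearance."""
    # Tokenisation: split on whitespace. Lowercasing and stripping surrounding
    # punctuation are assumptions added so that "Glasgow." matches "glasgow".
    tokens = [t.strip(".,;:!?()[]\"'").lower() for t in text.split()]

    # Remove duplicates while keeping the original order.
    unique = list(dict.fromkeys(tokens))

    # Normalisation: keep only tokens of 5 to 12 characters.
    sized = [t for t in unique if MIN_LEN <= len(t) <= MAX_LEN]

    # Stop word removal.
    return [t for t in sized if t not in STOP_WORDS]


if __name__ == "__main__":
    sample = ("I love cycling around Glasgow because cycling is great "
              "https://example.com/some/long/path")
    print(extract_keywords(sample))
```

The length filter runs before stop word removal here only because it is cheaper; the order of those two steps does not change the result.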