====== Mastodon SEO Spam ======

{{ :pasted:20230209-031500.png?200|An example of the type of accounts this script finds}}

I occasionally notice spam accounts being created on my Mastodon instance. If they haven't posted to the timeline then they generally aren't spotted or reported, and the only way I can see them is to manually review new accounts as they sign up. On those rare days when there is a massive spike in signups (we had 11k sign up to https://glasgow.social over a few days in November) it's just not feasible to review them all manually.

I made this script to let me review the accounts after the fact (this also helps catch those spammers who create an account, wait a few days, then modify it).

I'm still working out how best to identify a spammer. At the moment, I'm just looking at the custom fields (called 'attachment' in Mastodon) and counting the URLs there. If there are four URLs then it's often spam.

First, I get a list of all the local users by connecting to my postgres database:

  copy (select username,suspended_at from accounts where domain is null) to '/tmp/users.csv' with delimiter ',';

Then, I run this code to generate a score for each account: '0' for no URLs, up to '4' for four URLs found.

  php scan_for_spammers.php > output.csv

Then I can sort the output, look only at the highest-scoring accounts and suspend them.

  sort -r output.csv | head

Which outputs something like:

  4 xoilac33
  4 work
  4 waterproofepoxy
  4 w88malayu20
  4 w88indi18
  4 vandanamanturgekar
  4 urvam
  4 urbanloveulcer
  4 underfillepoxy
  4 traigavietnet

I can then search for these in the moderation interface and review them.

The php code to generate the scores (remember to create a cache directory with ''mkdir cache''):

          $values) {
              if(strpos($values['value'], "http") !== false) $score++;
          }
          $percent_complete = number_format(($progress/$total)*100,1);
          $moderation_link = "mod link\n";
          echo $score."\t$username\t$moderation_link\n";
          // this outputs a progress indicator to stderr
          // reporting content size of the json file in case I run into any rate limit issues
          fwrite(STDERR, "Downloaded ".number_format($content_size,0)." bytes... ($percent_complete%)\n");
      } else {
          // do nothing, user already suspended
      }
      $progress++;
  }
  ?>

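The URL-counting step in the script above can be tried on its own. This is only a minimal sketch: the function name ''count_url_fields'' and the sample fields are made up for illustration, and it assumes each profile field is an array with a 'value' key, as in the script.

```php
<?php
// Hypothetical helper (not part of the original script): counts how many
// profile fields contain something that looks like a URL.
function count_url_fields(array $fields): int
{
    $score = 0;
    foreach ($fields as $field) {
        // Same test as the script: a field value containing "http" counts once.
        if (isset($field['value']) && strpos($field['value'], 'http') !== false) {
            $score++;
        }
    }
    return $score;
}

// Made-up sample data: a list of name/value pairs, one per custom field.
$sample_fields = [
    ['name' => 'Website',  'value' => '<a href="https://example.com">example.com</a>'],
    ['name' => 'Pronouns', 'value' => 'they/them'],
    ['name' => 'Blog',     'value' => 'http://example.org/blog'],
];

echo count_url_fields($sample_fields), "\n"; // prints 2
```

With the check in a function like this, it is easy to run it against a few hand-written profiles before pointing it at every account on the instance.
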
I added a moderation link to the CSV output so that I can open the results in a browser, for example:

  php scan_for_spammers.php | grep -E "^4" > output.html

To answer a question I was asked on Mastodon: you could add a list of spam keywords or suspicious URLs at the top of the file, for example:

  $spam_keywords = array('spam_term', 'spamwebsite.com');

Then add a loop just after the ''foreach($attachment..'' to search the profile text for a URL or keyword. For example, adding this increases the score by one for every keyword that matches:

  foreach($spam_keywords as $keyword) {
      // preg_quote() stops characters like the "." in a domain name being treated as regex syntax
      if(preg_match("/".preg_quote($keyword, "/")."/i", $json['summary'])) $score++;
  }

Back to the [[Mastodon]] page.