I occasionally notice spam accounts being created on my Mastodon instance. If they haven't posted to the timeline, they generally aren't spotted or reported, and the only way I can see them is to manually review new accounts as they sign up. On those rare days when there is a massive spike in signups (we had 11k sign up to https://glasgow.social over a few days in November), it's just not feasible to review them all manually. I made this script to let me review the accounts after the fact (this also helps catch those spammers who create an account, wait a few days, then modify it).
I'm still working out how best to identify a spammer. At the moment, I'm just looking at the custom profile fields (called 'attachment' in the account's JSON) and counting the URLs there. If there are four URLs then it's often spam.
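To make the heuristic concrete, here is a minimal sketch of the scoring step on its own. The function name count_url_fields and the sample profile are made up for illustration; the sketch assumes the account JSON lists the custom fields under 'attachment', each with a 'value' string.

```php
<?php
// Hypothetical helper: count how many custom profile fields contain a URL.
function count_url_fields(array $account): int {
    $score = 0;
    foreach ($account['attachment'] ?? [] as $field) {
        // Field values are HTML strings; "http" anywhere in one counts as a URL
        if (strpos($field['value'] ?? '', 'http') !== false) {
            $score++;
        }
    }
    return $score;
}

// Made-up sample data: one field with a link, one without
$sample = json_decode('{"preferredUsername":"example_user","attachment":[{"type":"PropertyValue","name":"Site","value":"<a href=\"https://example.com\">example.com</a>"},{"type":"PropertyValue","name":"Pronouns","value":"they/them"}]}', true);

echo count_url_fields($sample), "\n"; // prints 1
```

An account with no custom fields at all scores 0, since a missing 'attachment' key is treated as an empty list.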
First, I get a list of all the local users by connecting to my postgres database:
COPY (SELECT username, suspended_at FROM accounts WHERE domain IS NULL) TO '/tmp/users.csv' WITH (FORMAT csv);
Then, with /tmp/users.csv copied next to the script as users.csv, I run this code to generate a score for each account: '0' for no URLs, up to '4' for four URLs found.
php scan_for_spammers.php > output.csv
Then I can sort the output, look at only the highest-scoring accounts and suspend the ones that are spam.
sort -r output.csv | head
Which outputs something like:
4	xoilac33
4	work
4	waterproofepoxy
4	w88malayu20
4	w88indi18
4	vandanamanturgekar
4	urvam
4	urbanloveulcer
4	underfillepoxy
4	traigavietnet
I can then search for these in the moderation interface and review them.
The PHP code to generate the scores (remember to create a cache directory with mkdir cache):
<?php
// Scores each unsuspended local account by the number of custom fields that contain a URL
$filename = "users.csv";
$cachedir = "cache";
$mastodon_host = "https://glasgow.social";

// read the exported user list, skipping blank lines (such as the trailing newline)
$lines = file($filename, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$total = count($lines);
$progress = 0;

foreach($lines as $line) {
    $progress++;
    $fields = explode(",", $line);
    $username = $fields[0];
    $suspended_at = $fields[1] ?? "";

    // postgres writes NULL as \N in its default text format and as an empty string in CSV format
    if($suspended_at !== "" && $suspended_at !== "\\N") {
        continue; // do nothing, user already suspended
    }

    $cache_file = $cachedir."/".$username;
    $content_size = 0;
    if(file_exists($cache_file)) {
        $json_content = file_get_contents($cache_file);
    } else {
        $json_content = @file_get_contents($mastodon_host."/users/".$username.".json");
        if($json_content === false) {
            fwrite(STDERR, "Could not fetch $username, skipping\n");
            continue;
        }
        $content_size = strlen($json_content);
        file_put_contents($cache_file, $json_content);
    }

    $json = json_decode($json_content, true);
    // custom profile fields; missing or unparseable JSON is treated as no fields
    $attachment = $json['attachment'] ?? [];
    $score = 0;
    foreach($attachment as $values) {
        if(strpos($values['value'] ?? "", "http") !== false) $score++;
    }

    $percent_complete = number_format(($progress/$total)*100,1);
    $moderation_link = "<a href='$mastodon_host/admin/accounts?origin=local&username=".$username."'>mod link</a><br />";
    echo $score."\t$username\t$moderation_link\n";

    // this outputs a progress indicator to stderr
    // reporting content size of the json file in case I run into any rate limit issues
    fwrite(STDERR, "Downloaded ".number_format($content_size,0)." bytes... ($percent_complete%)\n");
}
?>
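If rate limits do become a problem, one option is to pause after each uncached request. The following is only a sketch under assumptions, not part of the script above: the helper name fetch_with_cache, the injectable $fetch callable and the half-second default delay are all hypothetical choices.

```php
<?php
// Hypothetical helper: return the cached copy if there is one; otherwise
// fetch the URL, cache the result and pause briefly before returning.
function fetch_with_cache(string $url, string $cache_file, callable $fetch, int $delay_us = 500000): ?string {
    if (file_exists($cache_file)) {
        return file_get_contents($cache_file);
    }
    $content = $fetch($url);
    if ($content === false || $content === null) {
        return null; // failed download: nothing is cached, so a later run retries it
    }
    file_put_contents($cache_file, $content);
    usleep($delay_us); // assumed delay; tune it to what the server tolerates
    return $content;
}

// A fetcher that uses the network (returns false on failure)
$fetch = function (string $url) {
    return @file_get_contents($url);
};

// Example call, commented out so the sketch does not hit the network when run:
// $json_content = fetch_with_cache("https://glasgow.social/users/someuser.json", "cache/someuser", $fetch);
```

Passing the fetcher in as a callable keeps the caching logic separate from the network call, so it can be tried out with a fake fetcher first.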
I added a moderation link to each line of the output so I can save the results as HTML and open that file in a browser, for example:
php scan_for_spammers.php | grep -E "^4" > output.html
To answer a question on Mastodon: you could add a list of spam keywords or suspicious URLs at the top of the file, for example:
$spam_keywords = array('spam_term', 'spamwebsite.com');
Then add a loop just after the foreach($attachment.. loop to search the profile text for a URL or keyword. For example, adding this would increase the score by one for each keyword that matches:
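foreach($spam_keywords as $keyword) {
    // stripos matches literally and case-insensitively, so '.' and '/' in URLs need no escaping
    if(stripos($json['summary'] ?? "", $keyword) !== false) $score++;
}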
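As a self-contained sketch of that idea, the function below scores a profile's bio against a keyword list. The name keyword_score and the sample data are hypothetical; it assumes the bio is the HTML string stored under 'summary'.

```php
<?php
// Hypothetical helper: add one point per keyword found in the profile bio.
// Matching is case-insensitive and literal, and runs on the raw HTML so that
// link targets inside href attributes are searched too.
function keyword_score(array $account, array $keywords): int {
    $summary = $account['summary'] ?? '';
    $score = 0;
    foreach ($keywords as $keyword) {
        // Skip empty keywords, which would otherwise match every bio
        if ($keyword !== '' && stripos($summary, $keyword) !== false) {
            $score++;
        }
    }
    return $score;
}

// Made-up sample data
$spam_keywords = array('spam_term', 'spamwebsite.com');
$sample = array('summary' => '<p>Best deals at <a href="https://SpamWebsite.com/buy">our shop</a></p>');

echo keyword_score($sample, $spam_keywords), "\n"; // prints 1
```

The result can simply be added to the URL-based score, so a profile with both suspicious fields and a suspicious bio sorts higher.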