====== Mastodon SEO Spam ======
{{ :pasted:20230209-031500.png?200|An example of the type of accounts this script finds}}
I occasionally notice spam accounts being created on my Mastodon instance. If they haven't posted to the timeline, they generally aren't spotted or reported, and the only way I can see them is by manually reviewing new accounts as they sign up. On the rare days when there is a massive spike in signups (11k people signed up to https://glasgow.social over a few days in November), manual review just isn't feasible. I made this script so I can review the accounts after the fact. It also helps catch spammers who create an account, wait a few days, and then edit their profile.
I'm still working out how best to identify a spammer. At the moment, I'm just looking at the custom profile fields (called ''attachment'' in the account's ActivityPub JSON) and counting how many of them contain a URL. If there are four URLs, it's often spam.
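To make the heuristic concrete, here is a minimal sketch that scores one decoded profile. It assumes the profile is the ActivityPub actor JSON that Mastodon serves for an account, where each custom field is an ''attachment'' entry of type ''PropertyValue'' whose ''value'' holds the field's HTML content. The function name ''url_field_score'' and the sample profile are made up for illustration.

```php
<?php
// Hypothetical helper: count the custom profile fields ("attachment" entries)
// whose value contains a URL. Assumes $actor is a decoded ActivityPub actor.
function url_field_score(array $actor): int
{
    $score = 0;
    foreach ($actor['attachment'] ?? [] as $field) {
        // Field values usually hold the link as HTML, so look for "http" anywhere
        // (case-insensitively)
        if (stripos($field['value'] ?? '', 'http') !== false) {
            $score++;
        }
    }
    return $score;
}

// Made-up example profile: two linked fields and one plain-text field.
$actor = [
    'attachment' => [
        ['type' => 'PropertyValue', 'name' => 'Shop', 'value' => '<a href="https://example.com">example.com</a>'],
        ['type' => 'PropertyValue', 'name' => 'Blog', 'value' => '<a href="https://example.org">example.org</a>'],
        ['type' => 'PropertyValue', 'name' => 'Pronouns', 'value' => 'they/them'],
    ],
];
echo url_field_score($actor), "\n"; // prints 2
```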
First, I get a list of all the local users by connecting to my postgres database:
<code sql>
copy (select username,suspended_at from accounts where domain is null) to '/tmp/users.csv' with delimiter ',';
</code>
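In its default text format, ''COPY'' writes one ''username,suspended_at'' line per account and writes SQL NULL as ''\N'', so an account that has never been suspended ends in '',\N''. A minimal sketch of reading that file back in PHP and keeping only the unsuspended accounts (the function name ''load_unsuspended_users'' is hypothetical):

```php
<?php
// Hypothetical helper: read the file written by COPY (text format, comma
// delimiter) and return the usernames of accounts that are not suspended.
function load_unsuspended_users(string $path): array
{
    $users = [];
    foreach (file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        // Split on the first comma only: username, then suspended_at
        [$username, $suspended_at] = array_pad(explode(',', $line, 2), 2, '');
        // COPY's text format writes SQL NULL as \N, meaning "never suspended"
        if ($suspended_at === '\N') {
            $users[] = $username;
        }
    }
    return $users;
}

// Example with a temporary file standing in for /tmp/users.csv
$path = tempnam(sys_get_temp_dir(), 'users');
file_put_contents($path, "alice,\\N\nspammer1,2023-02-01 10:00:00.123\nbob,\\N\n");
echo implode(', ', load_unsuspended_users($path)), "\n"; // prints "alice, bob"
unlink($path);
```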
Then I run the PHP script below to give each account a score: ''0'' for no URLs, ''4'' for four URLs found.
<code>
php scan_for_spammers.php > output.csv
</code>
Then I sort the output so the highest scores come first, which makes the likely spam accounts easy to pick out and suspend:
<code>
sort -rn output.csv | head
</code>
This prints something like:
<code>
4 xoilac33
4 work
4 waterproofepoxy
4 w88malayu20
4 w88indi18
4 vandanamanturgekar
4 urvam
4 urbanloveulcer
4 underfillepoxy
4 traigavietnet
</code>
I can then search for these in the moderation interface and review them.
The PHP code that generates the scores (remember to create a cache directory first with ''mkdir cache''):
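<code php>
<?php
// Assumed: profiles are fetched as the public ActivityPub JSON at /users/USERNAME.json
$instance = "https://glasgow.social"; // assumed: the base URL of your instance
$users = file('/tmp/users.csv', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$total = count($users);
$progress = 0;
foreach ($users as $line) {
    [$username, $suspended_at] = array_pad(explode(',', $line, 2), 2, '');
    if ($suspended_at === '\N') { // COPY writes NULL as \N: never suspended
        $cache_file = "cache/$username.json";
        if (!file_exists($cache_file) && ($fetched = @file_get_contents("$instance/users/$username.json")) !== false)
            file_put_contents($cache_file, $fetched); // cache only successful downloads
        $content = file_exists($cache_file) ? file_get_contents($cache_file) : '';
        $content_size = strlen($content);
        $json = json_decode($content, true) ?: [];
        $attachment = $json['attachment'] ?? [];
        $score = 0;
        foreach ($attachment as $key => $values) {
            if (strpos($values['value'] ?? '', "http") !== false)
                $score++;
        }
        $percent_complete = number_format(($progress / $total) * 100, 1);
        // assumed link target: the admin account list filtered by username
        $moderation_link = "<a href=\"$instance/admin/accounts?username=$username\">mod link</a><br>";
        echo $score."\t$username\t$moderation_link\n";
        // this outputs a progress indicator to stderr
        // reporting content size of the json file in case I run into any rate limit issues
        fwrite(STDERR, "Downloaded ".number_format($content_size, 0)." bytes... ($percent_complete%)\n");
    } else {
        // do nothing, user already suspended
    }
    $progress++;
}
</code>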
I added a moderation link to the CSV output so that I can save the results as an HTML file and open it in a browser, for example:
<code>
php scan_for_spammers.php | grep -E "^4" > output.html
</code>
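''grep -E "^4"'' keeps only rows whose score starts with the digit 4, so once extra checks such as the keyword scoring described below push some scores higher, a minimum-score filter may be more useful. A minimal sketch, assuming the tab-separated ''score'', ''username'', ''link'' rows the script prints; ''rows_at_least'' is a hypothetical name:

```php
<?php
// Hypothetical helper: keep scanner output rows ("score\tusername\tlink")
// whose numeric score is at least $min.
function rows_at_least(array $lines, int $min): array
{
    $kept = [];
    foreach ($lines as $line) {
        $score = strstr($line, "\t", true); // text before the first tab
        if ($score !== false && preg_match('/^\d+$/', $score) === 1 && (int) $score >= $min) {
            $kept[] = $line;
        }
    }
    return $kept;
}

// Made-up rows in the same shape as the script's output
$rows = [
    "4\tspammer1\t<a href=\"#\">mod link</a><br>",
    "0\talice\t<a href=\"#\">mod link</a><br>",
    "5\tspammer2\t<a href=\"#\">mod link</a><br>",
];
echo count(rows_at_least($rows, 4)), "\n"; // prints 2
```

With output already saved, something like ''file_put_contents('output.html', implode("\n", rows_at_least(file('output.csv', FILE_IGNORE_NEW_LINES), 4)));'' would build the HTML file without rerunning the scan.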
To answer a question asked on Mastodon: you could add a list of spam keywords or suspicious URLs at the top of the file, for example:
<code php>
$spam_keywords = array('spam_term', 'spamwebsite.com');
</code>
Then add a loop just after the ''foreach($attachment..'' loop to search the profile text for each URL or keyword. For example, adding this increases the score by one for every keyword that matches:
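<code php>
foreach($spam_keywords as $keyword) {
    // preg_quote() escapes characters such as "." so each keyword matches literally
    if(preg_match('/'.preg_quote($keyword, '/').'/i', $json['summary'] ?? ''))
        $score++;
}
</code>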
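A self-contained sketch of the same idea that also checks the custom field values, not just the ''summary'' (bio). It matches keywords as plain text with ''stripos'' rather than as regular expressions, so entries such as ''spamwebsite.com'' need no escaping. The function name ''keyword_score'' and the sample profile are hypothetical, and the profile is again assumed to be a decoded ActivityPub actor.

```php
<?php
// Hypothetical helper: add one point per keyword found in the profile's
// summary (bio) and one per keyword found in each custom field value.
function keyword_score(array $actor, array $spam_keywords): int
{
    // Gather the profile text to search: the bio plus every field value
    $texts = [$actor['summary'] ?? ''];
    foreach ($actor['attachment'] ?? [] as $field) {
        $texts[] = $field['value'] ?? '';
    }
    $score = 0;
    foreach ($texts as $text) {
        foreach ($spam_keywords as $keyword) {
            // Case-insensitive literal match, like preg_match with /i but without regex syntax
            if ($keyword !== '' && stripos($text, $keyword) !== false) {
                $score++;
            }
        }
    }
    return $score;
}

$spam_keywords = ['spam_term', 'spamwebsite.com'];
$actor = [
    'summary' => '<p>Best SPAM_TERM deals!</p>',
    'attachment' => [
        ['type' => 'PropertyValue', 'name' => 'Site', 'value' => '<a href="https://spamwebsite.com">spamwebsite.com</a>'],
    ],
];
echo keyword_score($actor, $spam_keywords), "\n"; // prints 2
```

Because the match is literal, ''spamwebsite.com'' no longer matches text like ''spamwebsiteXcom'', which the unescaped regular expression would have accepted.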
Back to the [[Mastodon]] page.