Mastodon SEO Spam

I occasionally notice spam accounts being created on my Mastodon instance. If they haven't posted to the timeline, nobody reports them, and the only way I can spot them is to manually review new accounts as they sign up. On the rare days when there is a massive spike in signups (11k accounts signed up to https://glasgow.social over a few days in November), manual review just isn't feasible. I made this script so I can review accounts after the fact. It also helps catch spammers who create an account, wait a few days, and then modify the profile.

I'm still working out how best to identify them. At the moment, I look at the profile's custom fields (listed under 'attachment' in the account's JSON) and count how many of them contain a URL. If there are four URLs then it's often spam.
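As a rough illustration of that count, here is a minimal sketch run against a made-up profile. The helper name count_url_fields and the sample data are hypothetical, and it assumes each custom field appears as an entry with a 'value' string inside the 'attachment' array of the account's JSON.

```php
<?php
// Hypothetical helper (not part of Mastodon): counts the custom fields
// whose value contains a URL.
function count_url_fields(array $actor): int {
   $score = 0;
   foreach(($actor['attachment'] ?? []) as $field) {
      if(strpos($field['value'] ?? "", "http") !== false) {
         $score++;
      }
   }
   return $score;
}

// Made-up sample in the shape of an account's JSON (not a real account)
$sample = json_decode('{
   "preferredUsername": "example",
   "attachment": [
      {"type": "PropertyValue", "name": "Website", "value": "<a href=\"https://example.com\">example.com</a>"},
      {"type": "PropertyValue", "name": "Pronouns", "value": "they/them"},
      {"type": "PropertyValue", "name": "Shop", "value": "<a href=\"http://example.org\">example.org</a>"}
   ]
}', true);

echo count_url_fields($sample)."\n"; // prints 2
```

Matching on "http" is deliberately crude: it also counts legitimate links, which is why the score is only a prompt for manual review and not an automatic verdict.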

First, I get a list of all the local users by connecting to my Postgres database with psql and exporting them as CSV:

\copy (SELECT username, suspended_at FROM accounts WHERE domain IS NULL) TO 'users.csv' WITH (FORMAT csv)

Then, I run this code to generate a score: '0' for no URLs, '4' for four URLs found.

php scan_for_spammers.php > output.csv

Then I can sort the output so the highest scores come first, pick out the accounts that look like spam, and suspend them.

sort -r output.csv | head

Which outputs something like:

4       xoilac33
4       work
4       waterproofepoxy
4       w88malayu20
4       w88indi18
4       vandanamanturgekar
4       urvam
4       urbanloveulcer
4       underfillepoxy
4       traigavietnet

I can then search for these in the moderation interface and review them.
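When the list is long, it can help to pull out only the usernames at or above a chosen score before searching for them. The following is only a sketch: the helper usernames_at_or_above, the threshold of 4, and the two low-scoring sample names are my own illustrations, and it assumes the tab-separated score and username format that the script prints.

```php
<?php
// Hypothetical helper: returns the usernames whose score meets the threshold.
function usernames_at_or_above(array $lines, int $threshold): array {
   $matches = [];
   foreach($lines as $line) {
      $parts = explode("\t", trim($line));
      // skip anything that isn't a "score<TAB>username" line
      if(count($parts) === 2 && (int)$parts[0] >= $threshold) {
         $matches[] = $parts[1];
      }
   }
   return $matches;
}

// Sample lines in the same format as output.csv
$lines = ["4\txoilac33", "0\tsomeone", "4\twork", "2\tanother"];
echo implode("\n", usernames_at_or_above($lines, 4))."\n";
```

To run it on the real results, the sample array could be replaced with file("output.csv", FILE_IGNORE_NEW_LINES).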

The PHP code to generate the scores:

scan_for_spammers.php
<?php
$filename = "users.csv";
$cachedir = "cache";
$mastodon_host = "https://glasgow.social/users";

// create the cache directory on first run
if(!is_dir($cachedir)) {
   mkdir($cachedir, 0755, true);
}

// read one user per line, dropping newlines and blank lines
$lines = file($filename, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$total = count($lines);
$progress = 0;

foreach($lines as $line) {
   $progress++;
   $fields = explode(",", $line);

   $username = $fields[0];
   $suspended_at = $fields[1] ?? "";

   if(empty($suspended_at)) {
      $cache_file = $cachedir."/".$username;
      $content_size = 0;
      if(file_exists($cache_file)) {
         $json_content = file_get_contents($cache_file);
      } else {
         $json_content = @file_get_contents($mastodon_host."/".$username.".json");
         if($json_content === false) {
            // don't cache failed downloads, so they are retried on the next run
            fwrite(STDERR, "Could not download $username, skipping\n");
            continue;
         }
         $content_size = strlen($json_content);
         file_put_contents($cache_file, $json_content);
      }
      $json = json_decode($json_content, true);
      // the custom fields are listed under 'attachment'; treat a missing key as no fields
      $attachment = $json['attachment'] ?? [];
      $score = 0;
      foreach($attachment as $values) {
         // score one point for each field whose value contains a URL
         if(strpos($values['value'] ?? "", "http") !== false)
            $score++;
      }
      $percent_complete = number_format(($progress/$total)*100,1);
      echo $score."\t$username\n";
      // this outputs a progress indicator to stderr
      // reporting content size of the json file in case I run into any rate limit issues
      fwrite(STDERR, "Downloaded ".number_format($content_size,0)." bytes... ($percent_complete%)\n");
   } else {
      // do nothing, user already suspended
   }
}

?>
mastodon_spam_scanner.1675911993.txt.gz · Last modified: 2023/02/09 03:06 by admin