====== Mastodon SEO Spam ======

{{ :pasted:20230209-031500.png?200|An example of the type of accounts this script finds}}

I occasionally notice spam accounts being created on my Mastodon instance. If they haven't posted to the timeline they generally aren't spotted or reported, and the only way I can catch them is to manually review new accounts as they sign up. On the rare days when there is a massive spike in signups (11k accounts signed up to https://glasgow.social over a few days in November), manual review just isn't feasible.

I made this script so I can review accounts after the fact. This also helps catch spammers who create an account, wait a few days, then modify it.

I'm still working out how best to identify a spammer. At the moment I just look at the custom profile fields (called 'attachment' in Mastodon) and count the URLs there. If there are four URLs, it's often spam.

First, I get a list of all the local users by connecting to my PostgreSQL database:


copy (select username,suspended_at from accounts where domain is null) to '/tmp/users.csv' with delimiter ',';
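In COPY's default text format, PostgreSQL writes a NULL ''suspended_at'' as ''\N'', so accounts that are not suspended are easy to pick out when the file is read back. A minimal sketch of that step, assuming the two-column layout produced by the query above; the function name ''parse_local_users'' is made up for illustration and is not taken from the script:

```php
<?php
// Hypothetical helper: turn the exported CSV text into a list of
// usernames that are not suspended. A NULL suspended_at appears as \N.
function parse_local_users(string $csv): array
{
    $active = [];
    foreach (preg_split('/\R/', trim($csv)) as $line) {
        if ($line === '') {
            continue; // skip blank lines
        }
        // Two columns: username,suspended_at
        [$username, $suspended_at] = array_pad(explode(',', $line, 2), 2, '\N');
        if ($suspended_at === '\N' || $suspended_at === '') {
            $active[] = $username;
        }
    }
    return $active;
}

// Example with inline data instead of /tmp/users.csv:
$sample = "alice,\\N\nbob,2023-02-09 03:15:00.123456\ncarol,\\N\n";
print_r(parse_local_users($sample));
```

With the real export, the input would be something like ''file_get_contents('/tmp/users.csv')''.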

Then I run this script to generate a score for each account, from '0' for no URLs up to '4' for four URLs found:


php scan_for_spammers.php > output.csv

Then I can sort the output, look only at the highest-scoring accounts and suspend the spammers.


sort -r output.csv | head

Which outputs something like:


4       xoilac33
4       work
4       waterproofepoxy
4       w88malayu20
4       w88indi18
4       vandanamanturgekar
4       urvam
4       urbanloveulcer
4       underfillepoxy
4       traigavietnet

I can then search for these in the moderation interface and review them.

The PHP code to generate the scores (remember to create a cache directory with ''mkdir cache''):


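<?php
// ... (the earlier part of the script is not shown; the listing resumes inside the per-user loop) ...
      foreach($attachment as $key => $values) {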
         if(strpos($values['value'], "http") !== false)
            $score++;
      }
      $percent_complete = number_format(($progress/$total)*100,1);
      $moderation_link = "mod link\n";
      echo $score."\t$username\t$moderation_link\n";
      // this outputs a progress indicator to stderr
      // reporting content size of the json file in case I run into any rate limit issues
      fwrite(STDERR, "Downloaded ".number_format($content_size,0)." bytes... ($percent_complete%)\n");
   } else {
      // do nothing, user already suspended
   }
   $progress++;
}

?>
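The listing above begins part-way through the per-user loop. As a rough sketch only, and not the original code, the part that fetches a user's profile and counts the URL fields could look something like this. It assumes the instance serves the ActivityPub actor document at ''https://<domain>/users/<username>'' when asked for ''application/activity+json'', that it does not require signed requests, and that responses are cached as ''cache/<username>.json''. The function names are invented for illustration.

```php
<?php
// Sketch only: names, URL layout and cache layout are assumptions.

// Fetch a local user's ActivityPub actor document, caching the raw JSON
// on disk so repeated runs don't request it from the server again.
function fetch_actor_json(string $domain, string $username, string $cache_dir = 'cache'): ?string
{
    $cache_file = $cache_dir . '/' . $username . '.json';
    if (is_file($cache_file)) {
        $cached = file_get_contents($cache_file);
        return $cached === false ? null : $cached;
    }
    $context = stream_context_create([
        'http' => [
            'header'  => "Accept: application/activity+json\r\n",
            'timeout' => 10,
        ],
    ]);
    $body = @file_get_contents("https://$domain/users/" . rawurlencode($username), false, $context);
    if ($body === false) {
        return null; // network error or an HTTP error response
    }
    file_put_contents($cache_file, $body);
    return $body;
}

// Count the profile fields ('attachment' entries) whose value contains a URL.
function count_url_fields(array $json): int
{
    $score = 0;
    foreach ($json['attachment'] ?? [] as $values) {
        if (is_array($values) && strpos((string)($values['value'] ?? ''), 'http') !== false) {
            $score++;
        }
    }
    return $score;
}

// Example with a hand-written actor document rather than a live request:
$example = json_decode('{"attachment":[{"type":"PropertyValue","name":"Site","value":"<a href=\"https://example.com\">example.com</a>"},{"type":"PropertyValue","name":"Pronouns","value":"they/them"}]}', true);
echo count_url_fields($example), "\n";
```

The length of the returned string (''strlen($body)'') would give a content size like the one reported in the progress message.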

I added a moderation link to the output so I can open the file in a browser, for example:


php scan_for_spammers.php | grep -E "^4" > output.html

To answer a question on Mastodon: you could add a list of spam keywords or suspicious URLs at the top of the file, for example:


$spam_keywords = array('spam_term', 'spamwebsite.com');

Then add a loop just after the ''foreach($attachment..'' loop to search the profile text for each URL or keyword. For example, adding this increases the score by one for every keyword that matches:


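      foreach($spam_keywords as $keyword) {
         // preg_quote() escapes regex characters such as the '.' in 'spamwebsite.com'
         if(preg_match("/".preg_quote($keyword, "/")."/i", (string)($json['summary'] ?? '')))
            $score++;
      }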
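A variation on the loop above, as a sketch: since the keywords are literal strings rather than patterns, plain case-insensitive substring matching with ''stripos()'' avoids regular expressions altogether. The function name ''keyword_score'' is hypothetical. It searches the raw profile HTML, so a URL that only appears inside a link's ''href'' still counts.

```php
<?php
// Hypothetical helper: count how many keywords occur in the profile text.
// stripos() does a case-insensitive literal match, so dots and slashes in
// entries such as 'spamwebsite.com' need no escaping.
function keyword_score(?string $summary, array $spam_keywords): int
{
    $score = 0;
    $text = (string)$summary; // a missing summary is treated as empty
    foreach ($spam_keywords as $keyword) {
        if ($keyword !== '' && stripos($text, $keyword) !== false) {
            $score++;
        }
    }
    return $score;
}

$spam_keywords = array('spam_term', 'spamwebsite.com');
echo keyword_score('<p>Visit SpamWebsite.com today</p>', $spam_keywords), "\n";
```

Inside the script this would be used as ''$score += keyword_score($json['summary'] ?? null, $spam_keywords);''.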

Back to the [[Mastodon]] page.