PHP5 Basic Spam Class

Hey guys,

This class has a handy method that helps you decide whether a string is spam. Here is what it checks:

  • Length - Checks the string to make sure it isn't too short or too long.
  • Standard Text - Makes sure the string is standard text, which means only letters, numbers, spaces, dashes, underscores, periods, question marks and exclamation marks.
  • Links - Makes sure the string doesn't contain too many links (HTML anchors, [url] tags or plain URLs).
  • Optional basic grammar check - It can check that the first letter of the string is capitalized and that there aren't too many repeated characters. This stops stupid messages like 'hiiiii' or 'you smellll!!!!!'.
  • Bad words - Makes sure the string doesn't contain any bad words that you don't want!
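
To get a feel for the "Standard Text" rule, here is a minimal sketch that runs the same pattern the class uses on a few strings. The function name is_standard_text is made up for this illustration and is not part of the class.

```php
<?php
// Hypothetical standalone helper, not part of the Spam class.
// Same pattern as the class: letters (\pL), numbers (\pN),
// separators such as spaces (\pZ), plus - _ . ! and ?
function is_standard_text($str)
{
    return preg_match('/^[-\pL\pN\pZ_.!?]++$/uD', $str) === 1;
}

$samples = array(
    'Hi everybody how are you!?', // only allowed characters
    'Hello, friend',              // a comma is not allowed
    'http://games.com',           // ':' and '/' are not allowed
    "Line one\nLine two",         // a newline is a control character, not a \pZ separator
);

foreach ($samples as $sample)
{
    echo var_export($sample, true), ' => ', var_export(is_standard_text($sample), true), "\n";
}
```

Note that markers such as '<a href', '[url' or 'http://' all contain a character outside this list, so a string holding one of them is already rejected at this step.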

Spam Class

PHP:
<?php
/**
 * Basic spam class for PHP5. Aids in determining whether a
 * string is spam using several factors.
 *
 * @package        Spam
 * @author         David Parr <[email protected]>
 * @copyright      Copyright (c) David Parr, 2008
 */

class Spam
{
    // Configuration
    protected $config;
    
    // Bad words that we don't want in the string
    protected $bad_words;
    
    /**
     * Constructor. Sets configuration.
     *
     * @param array $config    Settings: min_length, max_length, max_links, grammar_check
     * @param array $bad_words Words that should never appear in the string
     * @return void
     */
    public function __construct($config, $bad_words)
    {
        $this->config = $config;
        $this->bad_words = $bad_words;
    }
    
    /**
     * Performs several tests on a string to help
     * determine whether or not it is spam.
     *
     * @param string $str String we are checking
     * @return bool True if the string passes every check, false if it looks like spam
     */
    public function check($str)
    {
        // Check the length of the string isn't too short or too long.
        // strlen() counts bytes, so a multibyte character counts more than once.
        $length = strlen($str);
        if($length < $this->config['min_length'] OR $length > $this->config['max_length'])
        {
            return false;
        }
        
        // Check the string is standard text (only letters, numbers, separators, dashes, underscores, periods, question marks and exclamation marks).
        if( ! preg_match('/^[-\pL\pN\pZ_.!?]++$/uD', $str))
        {
            return false;
        }
        
        // Count the number of links found in the string (HTML anchors, [url] tags and plain http/https URLs).
        preg_match_all('#(<a href|\[url|https?://)#i', $str, $matches, PREG_PATTERN_ORDER);
        if(count($matches[0]) > $this->config['max_links'])
        {
            return false;
        }
        
        // I always like to cleanup after myself :D
        unset($matches);
        
        // Grammar check?
        if($this->config['grammar_check'])
        {
            // First letter should always be capitalized, so reject a lowercase a-z first character.
            if($str[0] >= 'a' AND $str[0] <= 'z')
            {
                return false;
            }
            
            // There shouldn't be more than 2 of the same character in a row in any word. Very few words have more.
            // This is useful for stopping idiots entering things like "hiiiiii" or "you smeeeeelll!!!!" :o
            $words = explode(' ', $str);
            $found = false;
            foreach($words as $word)
            {
                // The length of the word should be greater than 1 if it's not an "a", an "I" or a number.
                // This prevents things like U and R
                if(strlen($word) < 2 AND ! in_array(strtolower($word), array('a', 'i')) AND ! is_numeric($word))
                {
                    $found = true;
                    break;
                }
                
                // (.) captures one character and \1{2,} requires at least two more copies right after it.
                if(preg_match('/(.)\1{2,}/u', $word))
                {
                    $found = true;
                    break;
                }
            }
            
            unset($words);
            
            if($found)
            {
                return false;
            }
        }
        
        // Check for any bad words.
        foreach($this->bad_words as $bad_word)
        {
            if(stripos($str, $bad_word) !== FALSE)
            {
                return false;
            }
        }
        
        // If we got here then everything is fine ;)
        return true;
    }
}
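
If you want to try the repeated-character rule on its own, here is a minimal sketch. It assumes "repetitive" means the same character more than twice in a row, and the function name has_repeated_run and its $max parameter are made up for this illustration.

```php
<?php
// Hypothetical helper, not part of the Spam class.
// Returns true when any character appears more than $max times in a row.
function has_repeated_run($word, $max = 2)
{
    // (.) captures one character; \1{N,} requires at least N more copies directly after it.
    $pattern = '/(.)\1{' . (int) $max . ',}/u';
    return preg_match($pattern, $word) === 1;
}

foreach (array('hiiiii', 'smellll!!!!!', 'hello', 'Mississippi') as $word)
{
    echo $word, ' => ', var_export(has_repeated_run($word), true), "\n";
}
```

Looking at runs rather than totals means a normal word like 'Mississippi' is left alone, even though it contains four of the same letter overall.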

Example

PHP:
<?php

require_once('classes/Spam.php');

$config = array(
    'min_length'    => 10,
    'max_length'    => 255,
    'max_links'     => 0,
    'grammar_check' => true
);

// Replace these with much worse words lol
$bad_words = array('bad', 'terrible', 'awful');

$spam = new Spam($config, $bad_words);

$str = 'Hi everybody how are you!?'; // Will pass the check

$result = $spam->check($str);

if($result)
{
    echo 'String contains no spam';
}
else
{
    echo 'String does contain spam';
}

// TESTS

$str = 'Hello'; // Is too short so would fail
$str = 'http://games.com'; // Fails because it is a link (and : and / aren't standard text)

// If we have grammar check on the following will fail
$str = 'hi everyone how are you?'; // First letter isn't capitalized.
$str = 'hiiiii'; // Repetitive, and also too short and not capitalized
$str = 'Hey u r idiot'; // Fails because of u and r
?>

If you don't want the grammar check, simply set 'grammar_check' to false in the config.
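
For instance, a config like this one (a sketch that reuses the values from the example above) skips the capital-letter, short-word and repeated-character checks but keeps the length, standard text, link and bad word checks:

```php
<?php
// Same settings as the example, with only the grammar check switched off.
$config = array(
    'min_length'    => 10,
    'max_length'    => 255,
    'max_links'     => 0,
    'grammar_check' => false
);
```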

Enjoy! :wave:
 