Automating PHP Forms – Spam Filtering and Data Cleansing


Another great way to handle spam is to validate your data, which also helps protect against SUS (silly user syndrome). Yes, I just made that up. The problem is the amount of effort it takes to code filters like that for every form on your website. We can make this a lot faster, though, by automating the checks: one reusable function that tests each field against a regular expression for its type.

Create a form in HTML as you normally would, and make sure you can call the following function:

```php
<?php
// USAGE: $results .= formcheck($value, "Field label", maxLength, "TYPE", "REQUIRE" or "NA");
//        (repeat for each field). TYPE is one of AN, A, N, ALL, DATE, PASS, WEBSITE, EMAIL.
// EXAMPLE: $results .= formcheck($_POST['name'], "Name", 30, "AN", "REQUIRE");
//        Then check $results for errors: if it is "", process the form.
function formcheck($data, $field, $length, $type, $require)
{
    $error = "";                                      // start empty so .= never hits an undefined variable
    $label = "<li class=\"errors\"><b>" . $field . "</b>";
    $data = trim((string) $data);

    // Blank input is an error only for REQUIRE fields; optional (NA) fields skip the other checks.
    if ($data === "") {
        return $require == "REQUIRE" ? $label . " cannot be left blank</li>" : "";
    }

    if (strlen($data) > $length) {
        $error .= $label . " is too long. Please use <b>" . $length . "</b> characters or less.</li>";
    }

    switch ($type) {
        case "AN": if (!preg_match('/^[a-z0-9 ]+$/i', $data)) $error .= $label . " can only contain letters, numbers and spaces</li>"; break;
        case "A": if (!preg_match('/^[a-z ]+$/i', $data)) $error .= $label . " can only contain letters and spaces</li>"; break; // spaces now allowed, matching the message
        case "N": if (!preg_match('/^[0-9]+$/', $data)) $error .= $label . " can only contain numbers</li>"; break;
        // ALL is a character whitelist; for general free text, htmlspecialchars() on output may suit better.
        case "ALL": if (!preg_match('/^[a-z0-9:\-_?&;#."\/,\'\\\\ ]+$/i', $data)) $error .= $label . " contains invalid characters</li>"; break;
        case "DATE": if (!preg_match('/^[0-9]{2}-[0-9]{2}-[0-9]{4}$/', $data)) $error .= $label . " must be in the format MM-DD-YYYY</li>"; break;
        case "PASS": if (!preg_match('/^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z]).{6,}$/', $data)) $error .= $label . " must be at least 6 characters with at least one lower case letter, one upper case letter & one digit.</li>"; break; // break was missing, so PASS fell through into WEBSITE
        case "WEBSITE": if (!preg_match('/^[a-z0-9.\/\-:]+$/i', $data)) $error .= $label . " entered is invalid.</li>"; break;
        // eregi() was removed in PHP 7, so this uses preg_match() with /i; {2,} also admits TLDs longer than 4 letters.
        case "EMAIL": if (!preg_match('/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,})$/i', $data)) $error .= "<li class=\"errors\">The <b>" . $field . "</b> address entered is invalid.</li>"; break;
    }
    return $error;
}
```
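
The DATE case only checks the shape of the input, so something like 02-30-2010 still passes. One possible tightening is to hand the captured numbers to PHP's checkdate(). For e-mail, you could let the built-in filter_var() with FILTER_VALIDATE_EMAIL do the work instead of a hand-written pattern. The helper names is_valid_mdy() and is_valid_email() below are hypothetical and are not part of formcheck(). Treat this as one option rather than a drop-in replacement.

```php
<?php
// Hypothetical helpers (not part of formcheck) showing stricter DATE and EMAIL checks.

// Returns true only for a real calendar date written as MM-DD-YYYY.
function is_valid_mdy(string $value): bool
{
    // Same shape check as the DATE case, with capture groups added.
    if (!preg_match('/^([0-9]{2})-([0-9]{2})-([0-9]{4})$/', $value, $m)) {
        return false;
    }
    // checkdate(month, day, year) rejects impossible dates such as 02-30-2010 or 13-01-2010.
    return checkdate((int) $m[1], (int) $m[2], (int) $m[3]);
}

// Uses PHP's built-in e-mail filter instead of a hand-written regular expression.
function is_valid_email(string $value): bool
{
    return filter_var($value, FILTER_VALIDATE_EMAIL) !== false;
}

var_dump(is_valid_mdy('02-28-2010'));        // bool(true)
var_dump(is_valid_mdy('02-30-2010'));        // bool(false)
var_dump(is_valid_email('rob@example.com')); // bool(true)
```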

Call it using something like this. You can try filling in some dummy data for $postdata and testing it out: formcheck($postdata, "Name", 30, "AN", "REQUIRE")

  • $postdata – This should take the form of $_POST['name']. Always grab your variables from $_POST so you avoid variable tampering or resetting in between. You can also do an extract($_POST) if you’re that gung-ho about using the key as your variable (in this case that would be $name), but be aware that extract() overwrites any existing variable with the same name.
  • “Name” – A human-readable label for the field being checked. It is only used in the error message if the user’s input fails.
  • 30 – The maximum length. This can be any number; with 30, data passes the length check if it is between 1 and 30 characters long (strlen() counts bytes, so an accented character can count as more than one).
  • “AN” – The validation type, i.e. which case of the switch statement to use. You can look in the function for details of what the regular expression does in each case. AN is the AlphaNumeric checker: in other words, only letters, numbers, and spaces (which you can change manually if you wish).
  • “REQUIRE” – The field cannot be left blank. If you set this to “NA” instead, the field passes with no data entered; anything that is entered is still checked against the other parameters.
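
To make the parameter list above concrete, here is a rough sketch of the same idea driven by a lookup table instead of a switch. Adding a type then means adding one array entry. The names FIELD_TYPES and check_field() are made up for illustration, and only three of the types are included. The sketch also uses a boolean in place of "REQUIRE"/"NA". Unlike formcheck(), it returns the first problem it finds rather than collecting them all.

```php
<?php
// Hypothetical table-driven variant of formcheck(): each type code maps to a
// regex and a message, so adding a new type means adding one array entry.
const FIELD_TYPES = [
    'AN' => ['/^[a-z0-9 ]+$/i', 'can only contain letters, numbers and spaces'],
    'A'  => ['/^[a-z ]+$/i', 'can only contain letters and spaces'],
    'N'  => ['/^[0-9]+$/', 'can only contain numbers'],
];

// Returns an error message, or '' when the value passes every check.
function check_field(string $value, string $label, int $maxLength, string $type, bool $required): string
{
    if (!array_key_exists($type, FIELD_TYPES)) {
        throw new InvalidArgumentException("Unknown field type: $type");
    }
    // Blank input fails only when the field is required ("REQUIRE" vs "NA").
    if (trim($value) === '') {
        return $required ? "$label cannot be left blank" : '';
    }
    if (strlen($value) > $maxLength) {
        return "$label is too long. Please use $maxLength characters or less.";
    }
    [$pattern, $message] = FIELD_TYPES[$type];
    return preg_match($pattern, $value) === 1 ? '' : "$label $message";
}

var_dump(check_field('Rob Malon', 'Name', 30, 'A', true)); // string(0) ""
echo check_field('R2-D2', 'Name', 30, 'AN', true), "\n";   // Name can only contain letters, numbers and spaces
```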

You may want to do some additional research on regular expressions if you want to add or edit functionality here, but this should cover the groundwork so you can focus on your unique data validation scenarios.

I tend to make my forms a single page with “checks”. Following that structure, I can stop and report different information inside IF statements: whether submit was pressed, whether the data is good, and what to do with that data. It also lets me retain the information from the previous submission, so the user can see it again and possibly change it. Of course, it’s a good idea to run the formcheck function on the data before outputting it back into the input value, to make sure the user isn’t injecting naughty markup or script data. This should be done any time data a user submits is displayed back to them.
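
As a minimal sketch of echoing a value back into a single-page form, the hypothetical sticky_input() helper below escapes both the field name and the previous value with htmlspecialchars() and ENT_QUOTES. A submitted quote or angle bracket then can't break out of the value attribute. The helper takes the POST array as a parameter so it can be tried without a web server.

```php
<?php
// Hypothetical helper: rebuilds an <input> with the user's previous value,
// escaped so quotes and angle brackets can't break out of the attribute.
function sticky_input(string $name, array $post): string
{
    $value = isset($post[$name]) ? (string) $post[$name] : '';
    return sprintf(
        '<input type="text" name="%s" value="%s">',
        htmlspecialchars($name, ENT_QUOTES, 'UTF-8'),
        htmlspecialchars($value, ENT_QUOTES, 'UTF-8')
    );
}

// On a live page you would pass $_POST; a literal array is used here for testing.
echo sticky_input('name', ['name' => '"><script>alert(1)</script>']), "\n";
// <input type="text" name="name" value="&quot;&gt;&lt;script&gt;alert(1)&lt;/script&gt;">
```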

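Either use htmlspecialchars($data) when echoing data into the page (for database queries, escape separately with mysqli_real_escape_string() or use a prepared statement, since the old mysql_real_escape_string() was removed in PHP 7.0), or something like this for the input value: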

if (isset($email) && formcheck($_POST['email'], "E-Mail", 50, "EMAIL", "REQUIRE") == "") { echo $email; }

This is the same formcheck() call you’d use to validate the submission in the example below; we’re just reusing it here. The only downside is that the user’s input is thrown away completely if it contains illegal characters. Keeping it isn’t too hard to add, and I might do that at a later time pending any requests.
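
On the database side, mysql_real_escape_string() belongs to the old mysql extension, which was removed in PHP 7.0. A common replacement is a prepared statement, where the values travel separately from the SQL. The sketch below assumes the pdo_sqlite extension so it can run against an in-memory database, and the signups table is hypothetical. Against MySQL you would change the DSN and credentials, while the prepare()/execute() calls stay the same.

```php
<?php
// Assumes pdo_sqlite is available; the in-memory database needs no server.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE signups (name TEXT, email TEXT)'); // hypothetical table

// Values are bound, not concatenated, so quotes in the input can't alter the SQL.
$insert = $db->prepare('INSERT INTO signups (name, email) VALUES (:name, :email)');
$insert->execute([':name' => "Robert'); DROP TABLE signups;--", ':email' => 'rob@example.com']);

$count = (int) $db->query('SELECT COUNT(*) FROM signups')->fetchColumn();
echo $count, "\n"; // 1
```

The quote and the DROP TABLE text are stored as an ordinary name and never become part of the SQL.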

```php
<?php
// Here's my quick usage example (formcheck() from above must already be defined)
$errorcatcher = "";
if (isset($_POST['submit'])) {
    $errorcatcher .= formcheck($_POST['name'] ?? "", "Name", 30, "A", "REQUIRE");
    $errorcatcher .= formcheck($_POST['email'] ?? "", "Email", 50, "EMAIL", "REQUIRE");
    $errorcatcher .= formcheck($_POST['website'] ?? "", "Website", 100, "WEBSITE", "NA");
    if ($errorcatcher == "") { // if empty then no errors
        // SUCCESS: process the data
    } else { // at least one error was found. Let's print them.
        echo "<ul>" . $errorcatcher . "</ul>"; // the errors are <li> items
    }
}
if (!isset($_POST['submit']) || $errorcatcher != "") {
    // Display the form
}
```
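
One variation on the example above keeps errors in an array keyed by field name instead of one concatenated HTML string. It is sketched here with a hypothetical validate_signup() function. Checking for success becomes empty($errors), and each message can be printed next to its own input. The rules mirror the Name/Email/Website calls above, with filter_var() standing in for the e-mail regex.

```php
<?php
// Hypothetical sketch: collect errors per field instead of one HTML string.
function validate_signup(array $post): array
{
    $errors = [];
    $name = trim($post['name'] ?? '');
    $email = trim($post['email'] ?? '');
    $website = trim($post['website'] ?? '');

    // Name: required, up to 30 letters and spaces (like "A", 30, "REQUIRE").
    if ($name === '') {
        $errors['name'] = 'Name cannot be left blank';
    } elseif (strlen($name) > 30 || !preg_match('/^[a-z ]+$/i', $name)) {
        $errors['name'] = 'Name must be 30 letters and spaces or less';
    }
    // Email: required, up to 50 characters, checked with filter_var().
    if ($email === '' || strlen($email) > 50 || filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        $errors['email'] = 'Please enter a valid E-Mail address (50 characters max)';
    }
    // Website: optional ("NA"), so it is only checked when something was entered.
    if ($website !== '' && (strlen($website) > 100 || !preg_match('/^[a-z0-9.\/\-:]+$/i', $website))) {
        $errors['website'] = 'The Website entered is invalid';
    }
    return $errors;
}

// A literal array stands in for $_POST so the sketch runs from the command line.
$errors = validate_signup(['name' => 'Rob Malon', 'email' => 'not-an-email']);
if (empty($errors)) {
    echo "SUCCESS\n";
} else {
    foreach ($errors as $message) {
        echo '<li class="errors">', htmlspecialchars($message), "</li>\n";
    }
}
```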

A good practice is to display neat and tidy forms. Use CSS, tables, and label tags. Check out the forms on a content page at mmorpgexposed.com; not to toot my own horn, but they are presented in an understandable, organized, and professional fashion. Note that I provide extra text that tells users what kind of input I’m looking for, e.g. “3-15 Alphanumeric”. Follow through and validate on exactly that. There’s no easier way to increase your bounce rate than sign-up forms that are confusing to use (or ones you have to fill out multiple times because you don’t understand why your input isn’t working).
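
One way to make sure the hint text and the actual check never drift apart is to build both from the same values. In this sketch, the $rule array, render_field() and passes_rule() are all hypothetical. The hint "3-15 Alphanumeric" and the regex /^[a-z0-9]{3,15}$/i are both generated from min and max, and max also becomes the input's maxlength attribute.

```php
<?php
// Hypothetical: one rule drives both the hint text in the label and the check.
$rule = ['field' => 'username', 'label' => 'Username', 'min' => 3, 'max' => 15];

// Renders a <label> with a hint such as "3-15 Alphanumeric", followed by the input.
function render_field(array $rule): string
{
    $hint = sprintf('%d-%d Alphanumeric', $rule['min'], $rule['max']);
    return sprintf(
        '<label for="%1$s">%2$s <small>(%3$s)</small></label> <input type="text" id="%1$s" name="%1$s" maxlength="%4$d">',
        $rule['field'], $rule['label'], $hint, $rule['max']
    );
}

// Validates with a pattern built from the same min/max as the hint.
function passes_rule(string $value, array $rule): bool
{
    $pattern = sprintf('/^[a-z0-9]{%d,%d}$/i', $rule['min'], $rule['max']);
    return preg_match($pattern, $value) === 1;
}

echo render_field($rule), "\n";
```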

If you’re feeling ambitious, run a query that logs each user’s IP to a table and increments a count column, so you can see how many attempts it takes people to fill out your form now. Leave it for a week, then repeat the test with these changes and see if there’s a major difference in how often it throws errors. My guess is that the larger the difference in errors, the bigger the difference in sign-ups you’ll see.
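
A rough sketch of that logging idea follows. It assumes pdo_sqlite with SQLite 3.24 or later, because it uses the ON CONFLICT ... DO UPDATE upsert, so it can run in memory. On MySQL the equivalent is INSERT ... ON DUPLICATE KEY UPDATE. The form_attempts table and log_attempt() function are hypothetical. The IPs are documentation addresses standing in for $_SERVER['REMOTE_ADDR'].

```php
<?php
// Assumes pdo_sqlite with SQLite 3.24+ (needed for ON CONFLICT ... DO UPDATE).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE form_attempts (ip TEXT PRIMARY KEY, attempts INTEGER NOT NULL)'); // hypothetical table

// Adds a row for a new IP, or bumps the counter for an IP already seen.
function log_attempt(PDO $db, string $ip): void
{
    $stmt = $db->prepare(
        'INSERT INTO form_attempts (ip, attempts) VALUES (:ip, 1)
         ON CONFLICT(ip) DO UPDATE SET attempts = attempts + 1'
    );
    $stmt->execute([':ip' => $ip]);
}

// On a live page each failed submission would call log_attempt($db, $_SERVER['REMOTE_ADDR']).
foreach (['203.0.113.5', '203.0.113.5', '198.51.100.7'] as $ip) {
    log_attempt($db, $ip);
}
$avg = $db->query('SELECT AVG(attempts) FROM form_attempts')->fetchColumn();
echo "Average attempts per visitor: $avg\n";
```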

Custom Theme by Rob Malon | Content & Design © 2010 - RobMalon.Com - Chicago, Illinois