Dec 22, 2008

Preg_Match Vs Stristr Versus SQL INSERT Efficiency

stristr() vs preg_match(): which one is faster? In short, neither! strpos() (or its case-insensitive sibling stripos()) is the better option, taking roughly 40% less time than preg_match() in my test. You can call it about 293 times in the time one simple MySQL INSERT takes, so it pays to filter up front the data that you would otherwise delete or ignore later.

Data Example

Running each 1,000,000 times:

  • stripos(): 3.4126560688019 (UPDATE: added strpos to this test which is better than strstr/stristr)
  • preg_match(): 5.895919084549
  • stristr(): 4.4556682109833

stripos() (the case-insensitive form of strpos(), used here to stay consistent with the other two case-insensitive tests) took roughly 40% less time than preg_match() and about 23% less than stristr().

You can test this out yourself with your own string and iteration count using the short script below.

  <?php
  // stripos: arguments are (haystack, needle)
  $start = microtime(TRUE);
  for($i=1000000; $i > 1; $i--) {
      stripos('PHP is the .com web scripting language of choice.', 'com');
  }
  $end = microtime(TRUE);
  echo 'stripos(): ' . ($end - $start) . '<br>';

  // preg_match: arguments are (pattern, subject)
  $start = microtime(TRUE);
  for($i=1000000; $i > 1; $i--) {
      preg_match('/com/i', 'PHP is the .com web scripting language of choice.');
  }
  $end = microtime(TRUE);
  echo 'preg_match(): ' . ($end - $start) . '<br>';

  // stristr: arguments are (haystack, needle)
  $start = microtime(TRUE);
  for($i=1000000; $i > 1; $i--) {
      stristr('PHP is the .com web scripting language of choice.', 'com');
  }
  $end = microtime(TRUE);
  echo 'stristr(): ' . ($end - $start) . '<br>';
  ?>

Why This Matters In Your PHP Scripts

I stumbled into this question for the first time when I was setting up an array of preg_match() calls to check the current visitor's user agent against multiple variations. The idea was to run an INSERT query only when necessary. I didn't want to log bots, as they make up 50% or more of web traffic on most sites. Google Analytics filters these out.

For my own logging purposes I needed a filter that was more efficient than simply logging every hit that came to the site. A typical website might get hit 1000+ times a day by search engine spiders and various other bots, each one causing an INSERT to fire off if you're doing your own logging. By running a preg_match() on the user agent I was able to reduce server load significantly because I was running fewer INSERTs.
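The filter described above can be sketched roughly as follows. This is a minimal illustration, not the exact code from my site: the function name looksLikeBot() and the needle list ('bot', 'crawler', 'spider') are assumptions chosen for the example, and the INSERT itself is left out because it depends on your schema.

```php
<?php
// Hypothetical helper: returns true if the user agent contains any of the
// given substrings. The default needle list is an assumption for illustration.
function looksLikeBot(string $userAgent, array $needles = ['bot', 'crawler', 'spider']): bool
{
    foreach ($needles as $needle) {
        // stripos() returns the match offset or false. Offset 0 is a valid
        // match, so compare strictly against false.
        if (stripos($userAgent, $needle) !== false) {
            return true;
        }
    }
    return false;
}

// The user agent is client-supplied and may be missing entirely.
$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';

if (!looksLikeBot($userAgent)) {
    // Run the logging INSERT here (omitted: depends on your own table).
}
```

A short needle such as 'bot' will also match unrelated words that happen to contain it, so treat the list as a starting point and tune it against your own logs.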

MySQL Inserts Versus preg_match() & stristr()

When I ran an INSERT against my localhost XAMPP dev setup it took an average of 0.001 seconds. The timings in the data example above are for roughly one million calls each, which gives a better per-call average. Doing the math:

  • Find how much a single stripos() call costs on average: 3.4126560688019/1000000 = 0.00000341265 seconds
  • Compare that to our more expensive MySQL query: 0.001/0.00000341265 = 293.027 stripos() calls
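The same arithmetic as a small script, in case you want to plug in your own measurements. The variable names are made up for this sketch; the three input values are the ones quoted in this post.

```php
<?php
// Inputs taken from the measurements in this post.
$striposTotal = 3.4126560688019; // seconds for the whole stripos() loop
$iterations   = 1000000;         // nominal loop count
$insertTime   = 0.001;           // average seconds per INSERT on the dev box

// Cost of one stripos() call, and how many calls fit into one INSERT.
$perCall   = $striposTotal / $iterations;
$breakEven = $insertTime / $perCall;

printf("One stripos() call: %.11f seconds\n", $perCall);
printf("stripos() calls per INSERT: %.1f\n", $breakEven);
```

Swap in your own loop time and INSERT time to get the break-even count for your server.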

This number may be a bit different in someone else’s environment as there are several other factors at play here:

  • CPU speed
  • INSERT vs SELECT vs DELETE
  • The number of joins and rows/columns in your query
  • The version of PHP you're using and the way you're running it: php-fpm, php-fastcgi, HHVM, etc.
  • Available memory (hopefully you're not out of memory and swapping)

You might want to use the script above and just test it on your server if you need an exact number. Otherwise, your results should be pretty close to mine.
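If you would rather time arbitrary pieces of code, including your own INSERT, a small helper along these lines may be handy. This is a sketch under assumptions: timeCalls() is a name invented here, and wrapping each call in a closure adds a little overhead to every measurement, so the absolute numbers will be slightly higher than with the bare loops.

```php
<?php
// Hypothetical helper: run $fn $iterations times and return elapsed seconds.
function timeCalls(callable $fn, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

$haystack   = 'PHP is the .com web scripting language of choice.';
$iterations = 100000; // raise this for a steadier average

$results = [
    'stripos()'    => timeCalls(function () use ($haystack) { stripos($haystack, 'com'); }, $iterations),
    'preg_match()' => timeCalls(function () use ($haystack) { preg_match('/com/i', $haystack); }, $iterations),
    'stristr()'    => timeCalls(function () use ($haystack) { stristr($haystack, 'com'); }, $iterations),
];

foreach ($results as $name => $seconds) {
    printf("%s: %.4f seconds\n", $name, $seconds);
}
```

To compare against a query, pass a closure that runs your INSERT with a much smaller iteration count.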

In Plain English

Generally speaking, if you are going to discard or ignore a piece of data later anyway, it is cheaper to check the string with stripos() first, even up to about 293 checks, than to run the INSERT. This makes a lot of assumptions, mainly that the strings you're checking for are very plain and unique. There is still a time and place for preg_match(): when the situation calls for regular expressions and wildcards.
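To make the distinction concrete, here is a rough sketch of one check that a plain substring search handles and one that needs a pattern. The user agent string and the version-number pattern are examples chosen for illustration, not something taken from my logs.

```php
<?php
// Example user agent, used only for illustration.
$userAgent = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';

// Plain, fixed needle: stripos() is enough.
$isBot = stripos($userAgent, 'bot') !== false;

// A needle with a variable part (here a version number) needs a pattern,
// which is where preg_match() earns its extra cost.
$hasVersion = preg_match('/googlebot\/(\d+\.\d+)/i', $userAgent, $matches) === 1;
$version    = $hasVersion ? $matches[1] : null;

var_dump($isBot, $version);
```

If every check you need looks like the first one, stick with stripos(); reach for preg_match() only when you need something like the second.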


8 Responses to Preg_Match Vs Stristr Versus SQL INSERT Efficiency

  1. I don’t know why I was sure that stristr() was faster… Anyways thanks for the info man, gotta go make some changes now.

  2. Look over my script and give it a try. I only created this because I tried googling for answers and didn't come up with anything factual.

  3. Kevin

    Check your code again – you need to reset the split_time after you echo the preg_match score, not before – the reality is stristr is about 20% faster.

    preg_match time: 2.1106829643 seconds.
    stristr time: 1.6658010483 seconds.
    -21% improvement.

    for($i=1000000; $i > 1; $i--) {
    $test = preg_match("/com/i", "PHP is the web scripting lan.comguage of choice.");
    }

    $splitTime = timeStamp();
    $splitTimeecho = round($splitTime - $startTime,10);
    echo "preg_match time: ".$splitTimeecho." seconds.";

    // Do some more things here
    $splitTime = timeStamp();

    for($i=1000000; $i > 1; $i--) {
    $test2 = stristr("PHP is the web scripting lan.comguage of choice.", "com");
    }

    $endTime = timeStamp();
    $totalTime = round($endTime - $splitTime,10);
    echo "stristr time: ".$totalTime." seconds.";
    echo round(($totalTime / $splitTimeecho) * 100) - 100 . "% improvement.";
    ?>

  4. Thanks, I looked over it again…

    I am using the splitTime timestamp which was created after the preg_match and endTime which was created after the stristr and using those for the calculation.

    I realize the totalTime variable is a bit misleading in the way I named it. It's actually the time just for stristr.

    Correct me if I’m wrong or you see something else. Or if you find documentation on this test being done elsewhere. Not sure where you came up with 20%, but would be good to correct this post if my results are off somehow.

  5. I’d be curious what strings you’re searching against to determine whether it’s a bot? Just “bot” and “google” type stuff?

  6. Josh, do a print_r($_SERVER) on any PHP page and you'll find a lot of browser info, including the agent string. Scripts can put whatever they want in that field and make themselves look like Firefox, IE or whatever. Most of them have the word "bot" or "crawler" in the agent string, which makes them somewhat easy to weed out. You can also get the visitor's IP, among other things, from the $_SERVER array.

  7. Daniel

    Guys, preg_match() does cache, so just change
    $test = preg_match("/com/i", "PHP is the web scripting lan.comguage of choice.");
    to
    $test = preg_match("/com/i", $i."PHP is the web scripting lan.comguage of choice.".$i);
    $test2 = stristr($i."PHP is the web scripting lan.comguage of choice.".$i, "com");

    It all depends on how different the original strings we match against are.

  8. Brett

    I believe your loops are actually only running 999,999 times instead of 1000000. You have $i > 1 so the loop never runs through the final iteration. Changing this to $i >= 1 or $i > 0 would give you the final iteration.


Custom Theme by Rob Malon | Content & Design © 2010 - RobMalon.Com - Chicago, Illinois