stristr() vs preg_match(): which one is faster? In short, preg_match() won in my tests, running about 30% faster on average. You can run roughly 120 preg_match() checks in the time it takes to do a single MySQL INSERT, so filtering up front is cheaper than inserting data that you would otherwise delete later.
Data Example
Running each 1,000,000 times:
preg_match time: 2.01364398 seconds.
stristr time: 2.7015089989 seconds.
34% improvement when using preg_match.
You can test this yourself with your own string and iteration count using the script below.
<?php
// Return the current Unix time in seconds, with microsecond precision.
function timeStamp() {
    list($usec, $sec) = explode(" ", microtime());
    return ((float)$usec + (float)$sec);
}

$startTime = timeStamp();
for($i=1000000; $i > 0; $i--) {
    $test = preg_match("/com/i", "PHP is the web scripting lan.comguage of choice.");
}
$splitTime = timeStamp();
// Time taken by the preg_match loop only.
$splitTimeecho = round($splitTime - $startTime,10);
echo "preg_match time: ".$splitTimeecho." seconds.";

// Do some more things here
for($i=1000000; $i > 0; $i--) {
    $test2 = stristr("PHP is the web scripting lan.comguage of choice.", 'com');
}
$endTime = timeStamp();
// Despite its name, $totalTime is the time taken by the stristr loop only.
$totalTime = round($endTime - $splitTime,10);
echo "<br>stristr time: ".$totalTime." seconds.<br>";
echo (round(($totalTime / $splitTimeecho) * 100) - 100) . "% improvement.";
?>
Download this code: 1222timertest.txt
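If you want each function timed completely on its own, here is a minimal sketch of one way to do it. The helper name timeCalls() is my own invention for illustration, and hrtime() assumes PHP 7.3 or newer. Each measurement gets its own start and end, so nothing else (such as an echo) falls inside a timed span. Wrapping the calls in closures adds the same overhead to both sides, so the relative gap will look smaller than with bare loops.

```php
<?php
// Hypothetical helper: run $fn $iterations times and return the elapsed seconds.
function timeCalls(callable $fn, int $iterations): float
{
    $start = hrtime(true); // monotonic clock in nanoseconds (PHP 7.3+)
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e9;
}

$haystack   = "PHP is the web scripting lan.comguage of choice.";
$iterations = 100000; // fewer than the 1,000,000 used above, to keep the run short

$results = [
    'preg_match' => timeCalls(function () use ($haystack) {
        return preg_match('/com/i', $haystack);
    }, $iterations),
    'stristr' => timeCalls(function () use ($haystack) {
        return stristr($haystack, 'com');
    }, $iterations),
];

foreach ($results as $name => $seconds) {
    printf("%s time: %.6f seconds\n", $name, $seconds);
}
```

Run it several times and swap the order of the two entries; a single run on a busy machine can easily point the wrong way.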
Why This Matters In Your PHP scripts
I stumbled into this question for the first time when I was setting up an array of preg_match() calls to check the user agent of the current visitor. The idea was to only run an insert query when necessary. I didn’t want to log bots, as they make up 50% or more of web traffic on most sites. Google Analytics filters these out.
For my own logging purposes I needed a filter that was more efficient than simply logging every hit that came to the site. A typical website might get hit 1000+ times a day by search engine spiders and various other bots, each one causing an INSERT to fire off if you’re doing your own logging. By running a preg_match on the user agent I was able to reduce server load significantly because I was running fewer INSERTs (I’ll explain why you want to do your own logging and how to set it up in future posts).
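To make the idea concrete, here is a rough sketch of that kind of filter. The function names looksLikeBot() and logHit() and the keyword list are assumptions made up for this example, not my actual logging code, and the closure stands in for the real INSERT.

```php
<?php
// Hypothetical helper: true when the user agent looks like a bot.
// The keyword list is illustrative only, not a complete bot list.
function looksLikeBot(string $userAgent): bool
{
    return preg_match('/bot|crawler|spider/i', $userAgent) === 1;
}

// Hypothetical logging gate: call the supplied insert only for non-bots.
function logHit(string $userAgent, callable $insert): bool
{
    if (looksLikeBot($userAgent)) {
        return false; // skipped, no INSERT issued
    }
    $insert($userAgent);
    return true;
}

$logged = [];
// Stand-in for the database INSERT: it just records the agent in an array.
$insert = function (string $ua) use (&$logged) {
    $logged[] = $ua;
};

logHit('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)', $insert);
logHit('Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120.0', $insert);
```

Only the second hit reaches the stand-in insert. Combining the keywords into one pattern keeps it to a single preg_match() call per request.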
MySQL Inserts Versus preg_match() & stristr()
When I ran an INSERT on my local XAMPP setup it took an average of 0.001 seconds. The data example above ran each function one million times to get a better average. The following numbers of executions took an average of 0.001 seconds or less:
- preg_match() - 120 executions.
- stristr() - 84 executions.
In smaller quantities, results were mixed when using the script above. More often than not preg_match came out ahead, but not always with such small numbers. In theory the script should report exactly 0% when run with the ratio above, and putting stristr at 120 should show the 30% improvement again. Based on an average of running each one million times, preg_match is the clear winner.
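If you want to work out a ratio like this for your own setup, the arithmetic can be sketched as below. The helper name callsWithinBudget() is hypothetical. Note that it works from per-call averages, which leave out the fixed overhead of a short timed run, so when fed the one-million-run totals from the data example it should come out higher than counts measured directly with short loops, such as the 120 and 84 above.

```php
<?php
// Hypothetical helper: how many calls fit inside a time budget, given a
// measured total time for a known number of iterations.
function callsWithinBudget(float $totalSeconds, int $iterations, float $budgetSeconds): int
{
    $perCall = $totalSeconds / $iterations; // average seconds per call
    return (int) floor($budgetSeconds / $perCall);
}

$insertSeconds = 0.001; // measured INSERT time from the text above

// Totals taken from the data example (1,000,000 runs each).
echo "preg_match: ", callsWithinBudget(2.01364398, 1000000, $insertSeconds), "\n";
echo "stristr: ", callsWithinBudget(2.7015089989, 1000000, $insertSeconds), "\n";
```

Treat whatever numbers you get as specific to your machine and your strings.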
In Plain English
If a string is something you would later delete from the database, it is cheaper to check it up front with preg_match, up to about 120 times, than to insert it. In other words, you can check for 120 different strings that would disqualify that string from appearing in the DB.









































I don’t know why I was sure that stristr() was faster… Anyways thanks for the info man, gotta go make some changes now.
Look over my script and give it a try. I only created this because I tried googling for answers and didn’t come up with anything factual.
Check your code again - you need to reset the split_time after you echo the preg_match score, not before - the reality is stristr is about 20% faster.
preg_match time: 2.1106829643 seconds.
stristr time: 1.6658010483 seconds.
-21% improvement.
for($i=1000000; $i > 1; $i--) {
    $test = preg_match("/com/i", "PHP is the web scripting lan.comguage of choice.");
}
$splitTime = timeStamp();
$splitTimeecho = round($splitTime - $startTime,10);
echo "preg_match time: ".$splitTimeecho." seconds.";
// Do some more things here
$splitTime = timeStamp();
for($i=1000000; $i > 1; $i--) {
    $test2 = stristr("PHP is the web scripting lan.comguage of choice.", "com");
}
$endTime = timeStamp();
$totalTime = round($endTime - $splitTime,10);
echo "stristr time: ".$totalTime." seconds.";
echo (round(($totalTime / $splitTimeecho) * 100) - 100) . "% improvement.";
?>
Thanks, I looked over it again…
I am using the splitTime timestamp, which was created after the preg_match loop, and endTime, which was created after the stristr loop, and using those for the calculation.
I realize the totalTime variable is a bit misleading in the way I named it. It’s actually the time just for stristr.
Correct me if I’m wrong or if you see something else, or if you find documentation of this test being done elsewhere. I’m not sure where you came up with 20%, but it would be good to correct this post if my results are off somehow.
I’d be curious what strings you’re searching against to determine whether it’s a bot? Just “bot” and “google” type stuff?
Josh, do a print_r($_SERVER) on any PHP page and you’ll find a lot of browser info, including the agent string. Scripts can define whatever they want for that field and make themselves look like Firefox, IE or whatever. Most of them have the word “bot” or “crawler” in that agent string, which makes them somewhat easy to weed out. You can also get the visitor’s IP address, among other things, from the $_SERVER array.
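Something along these lines would pull those two fields out. It is only a sketch: visitorInfo() is a made-up helper name, and the pattern just covers the two words mentioned above.

```php
<?php
// Hypothetical helper: read the agent string and IP from a $_SERVER-style array.
function visitorInfo(array $server): array
{
    return [
        // Supplied by the client, so it can be missing or faked.
        'agent' => $server['HTTP_USER_AGENT'] ?? '',
        'ip'    => $server['REMOTE_ADDR'] ?? '',
    ];
}

$info  = visitorInfo($_SERVER);
$isBot = preg_match('/bot|crawler/i', $info['agent']) === 1;
```

The ?? fallbacks avoid notices when a field is absent, for example when the script runs from the command line.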