Putting glob() to the test

Posted on April 28, 2010

In a new NetTuts+ post, Marcus Schumann offers a quick tip: Loop Through Folders with PHP’s Glob().

Are you still using opendir() to loop through folders in PHP? Doesn’t that require a lot of repetitive code everytime you want to search a folder? Luckily, PHP’s glob() is a much smarter solution.

The glob() function is convenient but the solution using the fewest lines of code isn’t always the most efficient — if by efficient you mean fastest.

This came up in a question on Stack Overflow in January. A user asked how best to get a list of files in a directory (excluding “.” and “..” and other subdirectories) and return it as an array. Several readers offered suggestions, and out of curiosity I benchmarked all their alternatives. I ran each method 1,000 times on a directory containing about 400 files. My benchmark results ranged from 12.4 seconds down to 1.2 seconds. That’s a pretty wide spread, so it’s worth paying attention to performance as well as coding convenience. Here are the results, roughly in order from slowest to fastest method:
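The timing approach described above can be sketched roughly as follows. This is a minimal illustration, not the script used for the numbers in this post (that one is linked in the comments below). The `benchmark()` helper and the labels in `$methods` are hypothetical names, and the `callable` type hint assumes PHP 5.4 or later.

```php
<?php
// Hypothetical helper: call $method $iterations times and return the elapsed seconds.
function benchmark(callable $method, $iterations = 1000)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $method();
    }
    return microtime(true) - $start;
}

// A few candidate methods, keyed by a label used in the report.
$methods = array(
    'glob'        => function () { return glob('*'); },
    'glob_nosort' => function () { return glob('*', GLOB_NOSORT); },
    'scandir'     => function () { return array_diff(scandir('.'), array('.', '..')); },
);

$timings = array();
foreach ($methods as $label => $method) {
    $timings[$label] = benchmark($method);
}

asort($timings); // smallest elapsed time first
foreach ($timings as $label => $seconds) {
    printf("%-12s %.3f sec\n", $label, $seconds);
}
```

Absolute numbers from a harness like this depend heavily on the filesystem, the operating system, and caching, so only the relative ordering on a single machine is meaningful.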

The first method was to use glob() to return an array, and then loop over the result to exclude directories.  This was the slowest, running in 12.4 seconds.

    $files = array(); // initialize so the result is defined even for an empty directory
    foreach (glob('*') as $file_or_dir) {
        if (!is_dir($file_or_dir)) // skip subdirectories
        {
            $files[] = $file_or_dir;
        }
    }

Next was simply using glob() without filtering directories. This ran in 8.1 seconds.

    $files = glob('*');

Using glob() with the optional GLOB_NOSORT argument shows how much impact sorting has on the results. If you don’t need sorted results, it’s worthwhile to say so, because this solution ran in 6.4 seconds — nearly double the performance of the slowest method.

    $files = array(); // initialize so the result is defined even for an empty directory
    foreach (glob('*', GLOB_NOSORT) as $file_or_dir) {
        if (!is_dir($file_or_dir)) // skip subdirectories
        {
            $files[] = $file_or_dir;
        }
    }

The scandir() function is another alternative.  This ran in 6.5 seconds.

    $files = scandir('.');
    $result = array();
    foreach ($files as $file)
    {
        if (($file == '.') || ($file == '..'))
        {
            continue;
        }
        $result[] = $file;
    }

Next, using scandir() with array_diff() to filter out the dot-directories performed slightly better at 6.4 seconds, and it is almost as concise as using glob().

    $files = array_diff(scandir('.'), array('.', '..'));
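One detail to keep in mind with this one-liner: array_diff() preserves the keys of its first argument, so the result does not start at index 0. If a contiguous, zero-based list is needed, array_values() re-indexes it. A small sketch, using a made-up array in place of a real scandir() result:

```php
<?php
// Stand-in for a scandir('.') result; the file names are made up for illustration.
$entries = array('.', '..', 'a.txt', 'b.txt');

// The original keys survive the filter: array(2 => 'a.txt', 3 => 'b.txt')
$filtered = array_diff($entries, array('.', '..'));

// Re-index from zero: array(0 => 'a.txt', 1 => 'b.txt')
$files = array_values($filtered);
```

The extra array_values() call adds a little work per run, so it is only worth it when the calling code relies on sequential keys.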

The opendir() method for which Marcus wanted to find an alternative isn’t so shabby. This ran in 5.3 seconds.

    $files = array();
    $dir = opendir('.');
    while(($currentfile = readdir($dir)) !== false)
    {
        if( !is_dir($currentfile) )
        {
            $files[] = $currentfile;
        }
    }
    closedir($dir);
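Note that is_dir($currentfile) resolves the name relative to the current working directory, which works here only because the directory being read is '.'. A sketch of the same loop for an arbitrary directory might look like the following. The function name `list_files()` is hypothetical, and this variant was not part of the benchmark.

```php
<?php
// Hypothetical helper: return the names of all non-directory entries in $dir.
function list_files($dir)
{
    $files = array();
    $handle = opendir($dir);
    if ($handle === false) {
        return $files; // the directory could not be opened
    }
    while (($entry = readdir($handle)) !== false) {
        // Build the full path so is_dir() does not depend on the working directory.
        if (!is_dir($dir . DIRECTORY_SEPARATOR . $entry)) {
            $files[] = $entry;
        }
    }
    closedir($handle);
    return $files;
}
```

Because "." and ".." are directories, the is_dir() check filters them out along with any real subdirectories.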

But using glob() in a bare form with GLOB_NOSORT shows that it may have been pretty costly to loop over the results.  This ran in 2.2 seconds.

    $files = glob('*', GLOB_NOSORT);

Or perhaps is_dir() was the source of the performance problem, because if we use opendir() and filter results by comparing to literal dot-directory names, we get the time down to 1.2 seconds.

    $files = array();
    $dir = opendir('.');
    while(($currentFile = readdir($dir)) !== false)
    {
        if ( $currentFile == '.' or $currentFile == '..' )
        {
            continue;
        }
        $files[] = $currentFile;
    }
    closedir($dir);

Of course it’s desirable to write concise code, but don’t assume this always equates to fast code. Rapid development and rapid code are independent goals, and you need to decide which has greater priority on a case-by-case basis.

And remember to use GLOB_NOSORT unless you actually need the list of files sorted.

Photo courtesy of Rick Audet. http://www.flickr.com/photos/spine/2425394931/ Released under Creative Commons Attribution licenses.



Responses and Pingbacks

This is good to know! I wonder how glob() compares to SPL’s DirectoryIterator.

I couldn’t resist benchmarking DirectoryIterator. 🙂

    $files = array();
    foreach (new DirectoryIterator('.') as $item) {
        $currentFile = (string) $item;
        if ($currentFile == '.' or $currentFile == '..') continue;
        $files[] = $currentFile;
    }

I iterated 1,000 times over a directory with 1,000 files. Here are my results:

Glob: 6.448 sec.
Opendir: 3.048 sec.
DirectoryIterator: 1.793 sec.
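For reference, DirectoryIterator also provides isDot() and getFilename(), which avoid the string cast and the literal comparisons. The following is only a sketch of that variant. It was not part of the measurements above, so no claim is made about its speed.

```php
<?php
$files = array();
foreach (new DirectoryIterator('.') as $item) {
    // isDot() is true for the "." and ".." entries.
    if ($item->isDot()) {
        continue;
    }
    $files[] = $item->getFilename();
}
```

Adding an `$item->isDir()` check in the same loop would also exclude subdirectories, at the cost of extra work per entry.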

Thanks Bill.
I’ve always wondered about glob()’s performance.
Questions:
Did you try any of the SPL classes?
I assume you were on Linux?

Cheers from sunny Australia.

[…] glob function, the subject of a recent post on NETTUTS.com, is the topic of this new post from Bill Karwin on the php|architect website. He focuses on the efficiency of the function over […]

I was wondering about DirectoryIterator. Thanks Hector. I had a question about it once, you can find it in the URL I attached to this mail. It’s about how to represent an entire folder-tree as an array, recursively.

http://stackoverflow.com/questions/952263/

Thanks for the additional data point Hector. The DirectoryIterator wasn’t one that was suggested on the original StackOverflow thread in January, but it’s good to see how it compares.

SPL is undervalued, I think because of its neglected documentation.

I ran my tests on a Macbook Pro (Core 2 Duo 2.4GHz) running OS X Leopard 10.5.8.

Awesome comparison Bill! I was going to start using glob() because of the short code, but now I am going to bury myself in the SPL documentation (thx Hector).

Good work Bill on helping build a faster web.

Hi,

I’m interested in how you ran the tests. Did you use PHP as a script on the command line or as a page on a web site? If it was over the web, how (if at all) did you account for network latency, web server overhead, etc.?

When I ran your glob code as a script on 400 files (82K each, generated with dd and urandom), even with a large number of iterations it didn’t come close to the large numbers you’re seeing. In fact, it showed the same numbers as your final 1.2sec opendir() example: they both had sub-second responses.

Cheers,


Sam

Hi Sam, thanks for your comment. I ran these tests as a command-line script, not as a web page.

You can see the full source of my test script at StackOverflow: http://stackoverflow.com/questions/2120287/directory-to-array-with-php/2120496#2120496

The test results may vary on your platform, because you have a different CPU, a different filesystem, a different operating system, etc. I ran my tests on a Macbook Pro, running OS X Leopard 10.5.8, CPU is a 2.4GHz Core 2 Duo.

Even faster (and solves .htaccess being listed too):

    if ($currentFile[0] !== '.')
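Checking only the first character drops every entry whose name starts with a dot, so it hides all dot-files, not just "." and "..". That is a change in behavior as well as a speed-up. A small sketch with made-up entry names shows the effect:

```php
<?php
// Made-up directory entries, in the form readdir() would return them.
$entries = array('.', '..', '.htaccess', 'index.php', 'logo.png');

$files = array();
foreach ($entries as $currentFile) {
    // Keep only names that do not begin with a dot.
    if ($currentFile[0] !== '.') {
        $files[] = $currentFile;
    }
}
// $files is now array('index.php', 'logo.png')
```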

Btw.: thanks for sharing! Best…

One of the best things about glob is the file search.

    glob('./*.jpg');
    glob('./filename*');

So are there any benchmarks for this, where we want to match specific file names?

My scandir is too slow with a million files.
