php[architect] logo

Want to check out an issue? Sign up to receive a special offer.

Exporting Drupal Nodes with PHP and Drush

Posted by on October 5, 2015

With Drupal, it’s possible to build very intricate solutions by gluing the right combination of contrib modules together. Of course, there is a downside to relying only on contrib modules, particularly for one-off or highly custom tasks. Bringing in a contrib module means you assume responsibility for feeding and caring for it. If it’s a mature module, like Views, that’s one thing. But if you find yourself looking at a little-used module that hasn’t seen a stable release yet—or worse hasn’t been updated in months—you may be better off rolling your own solution.

Drupal 8 development has shown that PHP itself, and the wider PHP community, already provides ways to solve common tasks. In this post, I’ll show you some core PHP functionality that you may not be aware of; pulling in packages and libraries via Composer is a topic for another day.

In this particular case, I need to export a CSV file showing all nodes added to a site after a specific date. This report may get created just a handful of times. True, I could easily build a View and use Views Data Export to download it as CSV but doing it purely in code has these benefits:

  • No module dependencies. It’s at least one less thing to update and maintain, especially if security issues are found. If data exports in multiple formats were a more import feature across this site, the Views Data Export module would definitely make sense.
  • Easy to deploy. If I make it a view, I have to remember to export it and make sure it’s deployed to all my environments. If someone edits the view on one of them and doesn’t export it, I could lose improvements or fixes. Granted Features can really streamline the process, but in this case it’s not critical, and as you’ll see everything we do will be in code anyway.
  • Integration with Drush. I created a Drush command to run this at the command line. This made it quicker for me to develop since it was easy to run without involving a browser. PHPStorm’s built-in terminal was perfect for this task. Drush commands are also useful for automation and scheduling. If I need, this script can be cron’d to run daily and pipe the output somewhere.
  • Efficient resource usage. Depending on how PHP is configured, if you’re running a command line script you may not have to worry about memory_limit or max_execution_time settings. These can cause your script to terminate unexpectedly when you’re processing a lot of data. In this case, we could be exporting hundreds or thousands of nodes depending on how far back in time we go.

A Minimal module.info File

For this example, I created a simple exporters module and prefixed it with my initials to prevent unexpected naming clashes. Below are the contents of my om_exporters.info file, free of any other module dependencies. I also created an om_exporters.module file just in case but it ended up empty.

name = OM Exporters
description = Exports newest content to CSV
core = 7.x
php = 5.5
version = 7.x-1.0

Creating a Drush Command

Creating a custom Drush command is straightforward. Drush will look for new commands in a file called <module>.drush.inc. In that file, you should have a function named <module>_drush_command(), which implements a hook_Drush_command(). When you add or change commands, you’ll need to clear caches for Drush to know about the changes.

In this case, the function is simple:

function om_exporter_drush_command() {
    $items['export-newest'] = array(
        description' => 'Export Newest Content to CSV',
        'aliases' => ['ex-new'],
        'callback' => 'om_export_newest',
        'arguments' => [
            'date' => '',
        ]
    );

    return $items;
}

As you can see, the key to $items becomes our Drush command export-newest. We can give it a friendly description, a shorter alias (ex-new), specify the function to run in callback, and list any arguments we require.

Now, when you run the Drush command as shown below, whatever function listed as the callback will be invoked and passed the arguments needed.

Drush export-newest 2015-08-01

Validating Dates

The first thing we’ll do is validate the date argument using PHP 5’s DateTime class. If you’re not familiar with it, this class makes it easy to work with Dates and Timezones without a lot of fuss in an object-oriented manner. It’s easier to understand than using older functions like time(), date(), and strtotime().

The code below takes the $date_arg passed in from Drush and makes sure its parsable as a date. If validation fails, our function does not continue.

function om_export_newest($date_arg) {
    // get the default timezone
    $tz = variable_get('date_default_timezone', 'UTC');
    $tz = new \DateTimeZone($tz);
    // First validate date, we assume $date_arg is
    // a string that can be parsed by \DateTime so
    // you have the flexibility to pass
    // in '-1 months' or '28 days ago'
    try {
        $since = new \DateTime($date_arg, $tz);
    } catch (\Exception $ex) {
        watchdog('export-newest',
            Could not parse date:' . $ex->getMessage(),
            null,
            WATCHDOG_CRITICAL);
        return;
    }
    // ..

Output CSV with SPL’s File Object

The Standard PHP Library is a collection of useful classes that’s not as well known as it should be. It’s intended to solve common problems. In this task, I used the \SplFileObject class to work with files as objects. I find it a lot easier than remembering and looking up the different file_* functions, since I get autocompletion in my IDE. For this, we create a file object that writes to STDOUT so that our command will output everything to the terminal or screen. First, we create our SPLFileObject:

// we will use SPL to send our data to STDOUT formatted as CSV
$fout = new \SplFileObject("php://stdout");

Scripts write to STDOUT by default anyway, but to use the built-in fputcsv method, we need to specify it explicitly.

Typically the first line of a CSV file is the header describing the columns that follow. Now, here’s where we use fputcsv(). This method takes in an array and then writes it as a CSV file. The method automatically handles using commas to separate fields, enclosing text fields with quotes, and so on. You can even configure how all that is handled; for example, if you need to use ‘;’ as the separator. See the online documentation for fputcsv for details.

// write our headers
$fout->fputcsv([
    'nid', 'type', 'title', 'date_created', 'path_alias'
]);

Finding Nodes with EntityFieldQuery

Drupal’s native EntityFieldQuery API is a powerful alternative to always relying on Views to create and filter some collection of nodes, users, taxonomy terms, etc. It’s also object-oriented (I keep saying that) and provides a very readable interface for querying Drupal’s databases. It abstracts away the underlying data store for you, so you don’t need to know exactly what tables or fields everything is in. For that same reason, it’s much safer to use than doing a direct db_query().

One thing that is tricky at first is wrapping your head around the terms it uses. Entities have properties that are common to all the entities of that kind. For nodes, these are things like the title, created date, the bundle (content type), the status, and more. If it’s fieldable, it can have fields specific to a bundle. If you need to query on property values, you use propertyCondition. For fields, use fieldCondition. The same pattern holds if you need to sort them by one or the other. To dig deeper, see How to Use EntityFieldQuery.

The code below shows how get all the nodes with a created timestamp greater than the one we pass to our Drush script.

// query for nodes newer than the specified date
$query = $query = new EntityFieldQuery();
$query->entityCondition('entity_type', 'node')
            ->addMetaData('account', user_load(1)) // Run the query as user 1
            ->propertyCondition('created', $created->format('U'), '>')
            ->propertyOrderBy('created', 'ASC');

Iterating Entities with Generators

Generators were introduced in PHP 5.5. They’re a special kind of function that uses the yield keyword to return an item that’s part of some collection. In practice, they’re a simple way to build a function that iterates over a collection without having to build the whole collection in memory first.

In the code excerpt below you’ll see how we loop through the nodes returned by our EntityFieldQuery. We do this in a generator—notice the yield keyword.

$result = $query->execute();
if (!empty($result)) {
    // $result is an array with node ids
    foreach ($result['node'] as $nid => $row) {
        $count++;
        
        // TRUE will reset static cache to
        // keep memory usage low
        $node = node_load($row->nid, null, TRUE);
        if ($count < 100000) {
            yield $node;
        }
    }
}

The generator function is called in a foreach loop which terminates once the generator stops yielding values.

    // use a generator to loop through nodes
    foreach (nodes_generator($since) as $node) {
        $fout->fputcsv([
            $node->nid,
            $node->type,
            $node->title,
            $node->created,
            url('node/' . $node->nid)
        ]);
    }

Usage

With the module complete and enabled, we have a new Drush command out our disposal. We can invoke it in a terminal shell with the command below:

drush export-newest 2015-06-01

That command will output our nodes as a stream of CSV lines. To save the output in a file, just redirect it as shown below. Once you download the file, you can send it to your client and/or project manager to peruse in Excel.

drush export-newest 2015-06-01 > newest-nodes.csv

Conclusion

There you have it, a straightforward independent Drush command that uses built-in PHP libraries to export a collection of nodes to CSV. Of course, we could have done it with Views and some other modules, but this module is easy to version and quick to deploy. It only depends on Drupal core, so it’s super low-maintenance.

For the full script, check out the gist on github


Oscar still remembers downloading an early version of the Apache HTTP server at the end of 1995, and promptly asking "Ok, what's this good for?" He started learning PHP in 2000 and hasn't stopped since. He's worked with Drupal, WordPress, Zend Framework, and bespoke PHP, to name a few. Follow him on Google+.
Tags: , , , ,
 

Responses and Pingbacks

[…] In Exporting Drupal Nodes with PHP and Drush, [php]Architect magazine cover how to export data from Drupal nodes using the Drush command line tool. Again on the prolific Sitepoint Blog, Christopher Pitt covers An Introduction into Event Loops in PHP. Event loops allow you to create Node style web servers in PHP, and it’s a really interesting read. […]

Leave a comment

Use the form below to leave a comment: