php[architect] logo

Want to check out an issue? Sign up to receive a special offer.

Security Corner: Twist and Shout

Posted by on November 7, 2019
By Eric Mann

Most self-taught developers in our industry learn to leverage an API long before they spend time learning lower-level coding patterns. This experience isn’t necessarily a bad thing. All the same, it’s important to take some time to dig deeper and better understand the tools and technologies at the core of our trade.

Get the Full Issue

This article was published in the September 2019 issue of php[architect] magazine. Download the Free Article PDF to see how it looks in the published magazine.

If you'd like more articles like this one, become a subscriber today!. You can get each monthly issue in digital and print options or buy individual issues.

Subscribe

Computers and digital technology, in general, are highly deterministic. Provide a specific set of inputs to an algorithm, interface, or system, and you’ll always get a reliable, predictable output in return. It’s this determinism that makes computers so well suited for rote tasks and routine operations. However, this same determinism can also be a weakness—particularly in the realm of security.

A calculator that can’t reliably or predictably solve arithmetic is useless. A token or identity verification service that issues predictable session identifiers to authenticating parties, however, is also useless. Said another way, we don’t want PHP session IDs to be automatically incrementing integer values; otherwise, an attacker could easily predict and potentially hijack your users’ otherwise secure data. Computers are deterministic by nature, so we need to leverage purpose-built random number generators to introduce unpredictability into the system.

Psuedo-Random Number Generators (PRNG)

When programming, most random number generators aren’t truly random. They’re based on specific algorithms which produce patterns that are very difficult, but not impossible to reproduce. These generators are “pseudo-random” and help provide a level of non-determinism our applications can leverage.

There’s an entire field in computer science focused on developing genuinely random systems and tests to verify the randomness of anything intended to be random. For most purposes, pseudo-random systems are usually random enough.

Truly secure applications should leverage a cryptographically secure pseudo-random number generator (CSPRNG). A cryptographically secure system is one that passes rigorous statistical tests and is highly resistant to common forms of statistical attack. Both the random_bytes() and random_int() functions introduced in PHP 7 are considered cryptographically secure. You should use one of these functions by default anywhere you need randomness in your application. Older functions like mt_rand() (discussed in this article) served a purpose in older versions of PHP, but you should not use them unless you have a legitimate reason to do so in a legacy application. Even if, for some reason, you are still running PHP 5 and need a CSPRNG, the paragonie/random_compat package provides the same functionality by way of a polyfill.

A PRNG algorithm typically starts with a single seed, then generates an infinite (or at least very long) series of seemingly random numbers from that seed. The advantages of such an algorithm are that they’re fast, easy to implement, and generate a deterministic sequence of numbers each time. Start with the same seed, and you’ll always get the same pseudo-random sequence back. The most significant disadvantage of these algorithms is that it’s easy to get the implementation wrong. A second disadvantage is, given a seed, you’ll always get the same sequence of numbers back each time.

The algorithm’s advantage—deterministic sequence generation—is also a drawback; we’ll come back to this in a bit.

One of the most widely-used PRNGs in computer science is the Mersenne Twister, developed in 1997. It’s the “mt” in PHP’s mt_rand() function.

Merssene Twister

In a nutshell, PRNGs follow a pretty specific pattern:

  1. Initialize the state of the system from a specific seed.
  2. Use a one-way function f to output a random number from that state.
  3. Use a one-way function g to mutate the state.
  4. Repeat steps 2-3 for every subsequent request of a random number.

The Mersenne Twister is a bit different. It still uses a seed, keeps track of state, and outputs random numbers but doesn’t use one-way functions and the state is larger than a single number. The outline for the Merssene Twister is:

  1. Initialize the state of the system from a specific seed.
  2. Twist the state (this would be the f above).
  3. Temper the state (this would be g) to return a random number.
  4. Repeat steps 2-3 for every subsequent request.

One key difference between the Merssene Twist and other PRNGs is the state isn’t a single number; it’s an array of 624 numbers. Each iteration of the algorithm could produce 624 distinct random numbers rather than a single one!

Let’s work through each step, in turn, to see how the algorithm looks in userland PHP.

Step 1. Initialize

Given a seed, we need to initialize an array of 624 integers. We can pick any seed—PHP defaults to the current Unix timestamp multiplied by the process ID in an attempt to provide a somewhat “random” seed which differs between environments. We need to filter our seed to ensure it’s only 32 bits, then use a magic initialization constant and some well-known bit arithmetic to initialize our state container as in Listing 1.

Listing 1

<?php

class MT_Rand
{
   private $state = [];

   public function __construct(?int $seed = null) {
      $seed = $seed ?? time();

      $this->state = [$seed & 0xffffffff];

      foreach (range(1, 624) as $i) {
         $this->state[$i] = (
            (
               (0x6c078965
                  * ($this->state[$i - 1]
                     ^ ($this->state[$i - 1] >> 30))
               )
               + $i)
         ) & 0xffffffff;
      }
   }

  // ...

Step 2. Twist

At this point, we have a 624-element array containing our initial state. Given the same seed, this initializes the same array every time. Before we can use it, though, we need to mutate the array with the twist() algorithm (See Listing 2).

Listing 2

   private function twist() {
      foreach (range(0, 624) as $i) {
         $y = (($this->state[$i] & 0x80000000)
              + ($this->state[($i + 1) % 624] & 0x7fffffff))
              & 0xffffffff;

         $this->state[$i] = (
               $this->state[($i + 397) % 624] ^ ($y >> 1)
            ) & 0xffffffff;

         if (($y % 2) == 1) {
            $this->state[$i] = ($this->state[$i] ^ 0x9908b0df)
                               & 0xffffffff;
         }
      }
   }

Step 3. Temper

The almost-final step, shown in Listing 3, is to “temper” the transformation created by the twist operation above. This tempering helps to “compensate for the reduced dimensionality of” the distribution of numbers in our state array and render each subsequent iteration unpredictable.

Listing 3

   function temper() {
      static $index = 0;

      if ($index == 0) {
         $this->twist();
      }

      $y = $this->state[$index];
      $y = ($y ^ ($y >> 11)) & 0xffffffff;
      $y = ($y ^ (($y << 7) & 0x9d2c5680)) & 0xffffffff;
      $y = ($y ^ (($y << 15) & 0xefc60000)) & 0xffffffff;
      $y = ($y ^ ($y >> 18)) & 0xffffffff;

      $index = ($index + 1) % 624;

      return $y;
   }

Step 4. Output

Once we’ve initialized, twisted, and tempered, we can return the final output as the next random number in our series. The PHP implementation explicitly confirms with the reference implementation in mt19937ar.c and performs a right-shift to return a 31-bit integer. If we want to match the same, then our entire operation resembles Listing 4.

Listing 4

   public function __invoke() {
      $random = $this->temper();
      return ($random >> 1) & 0x7fffffff;
   }

} // end of class

$seed = 12345;
$rand = new MT_Rand($seed);
$random = $rand();

To verify our implementation, we can wire all of the above scripts together and compare directly with PHP’s mt_rand(). The trick is to set the same seed value for each run as in Listing 5.

Listing 5

<?php
// Our series
$rand = new MT_Rand(42);
echo '---- Our implementation ----' . PHP_EOL;
echo $rand() . PHP_EOL;
echo $rand() . PHP_EOL;
echo $rand() . PHP_EOL;

echo '---- PHP mt_rand() ----' . PHP_EOL;
mt_srand(42); // sets the seed
echo mt_rand() . PHP_EOL;
echo mt_rand() . PHP_EOL;
echo mt_rand() . PHP_EOL;

If you execute the above at the command line, you’ll see the contents of Output 1.

Output 1

$ php mt_rand.php
---- Our implementation ----
804318771
1710563033
2041643438
---- PHP mt_rand() ----
804318771
1710563033
2041643438

Both our userland implementation above and the default PHP implementation of the Mersenne Twister generate the same pseudo-random series given the same seed. This property is useful for testing purposes or when you need a predictable but seemingly random series. Unfortunately, this consistency is one of the indicators of this system’s inherent weakness.

Weaknesses In Implementation

Neither the “twist” nor “temper” operations are one-way functions. If you know the output of the system, you can “untemper” it to return to the raw state, then “untwist” the state to get to a previous step. Go back far enough, and you can even recover the initial seed. Since this Merssene Twist implementation leverages a 624-element internal state, capturing 624 subsequent outputs of the PRNG gives an attacker sufficient information to walk back through the algorithm and effectively break the system!

Yet all is not perfect in terms of non-predictability. The MT19937 algorithm keeps track of its state in 624 32-bit values. If an attacker were able to gather 624 sequential values, then the entire sequence—forward and backward—could be reverse-engineered. source

Similarly, knowing exactly how the PRNG is seeded gives an attacker a good idea of where to start with an attack by guessing the seed and checking the RNG’s output. If you fail to provide a seed, the algorithm will seed itself with an integer based on the PHP process’ ID and the current UNIX time in seconds.

Again, the deterministic nature of the computer makes it reasonably straight forward for an attacker to brute force their way to a root value. Once an attacker discovers a seed, they can predict every “random” number your system uses moving forward.

The Mersenne Twister is a handy algorithm for some things internal to your application that need to seem random but for which true randomness is not a strict requirement. If you need true randomness, instead use a cryptographically secure PRNG like random_bytes() or random_int().

Never Roll Your Own Crypto!

Now we’re to the part that includes shouting. It’s a critical point, so lean in and ensure you’re listening.

Never. Roll. Your. Own. Cryptography.

It is incredibly important you fully understand the risk associated with designing or building your cryptographic system. It’s vital you work with the experts who understand the proper implementations of the system you need to use, and even experts make mistakes. The original PHP implementation of mt_rand() actually had a typo that rendered it incompatible with the canonical mt19937 reference implementation of the Mersenne Twister. This bug has since been fixed, but the point that even the experts can get it wrong should be enough.

Your team should fully understand the inner workings of the libraries, tools, and technologies upon which your project stands. Knowing the strengths and limitations of these more primitive implementations helps your team write better, more stable software. Leverage expertise when and where possible—let the cryptography experts write the crypto while your in-house application experts write the application. This domain-specific focus will undeniably lead to better deliverables.

Biography

Eric is a seasoned web developer experienced with multiple languages and platforms. He’s been working with PHP for more than a decade and focuses his time on helping developers get started and learn new skills with their tech of choice. You can reach out to him directly via Twitter: @EricMann

Tags: ,
 

Leave a comment

Use the form below to leave a comment: