
Token-Oriented Object Notation (TOON) For PHP Developers

Posted on May 6, 2026

See the video version at https://youtu.be/Nk9ayWxkJ1M

It’s an unfortunate piece of our current developer reality that a lot of our day is spent worrying about how many tokens we’re spending. Every time you send structured data to an LLM API, you’re paying for tokens. And if you’re sending arrays of similar objects as JSON, you’re paying to repeat the same field names for every single record. That’s not a bug in JSON; it’s just how the format works. But there’s a better option for this specific use case.

In this article, we’re going to look at Token-Oriented Object Notation (TOON) and how it strips that repetition out of your data.

The problem

Let’s say you’re tracking podcast episode statistics and you want to send that data to an LLM. Maybe you want it to flag underperforming episodes or summarize listening trends across your whole catalog. Here’s what a typical batch of episode records looks like in PHP:

<?php

$episodes = [
    [
        "id" => 1,
        "title" => "Getting Started with Laravel",
        "published_at" => "2026-01-15",
        "downloads" => 4821,
        "avg_listen_time" => "34:12",
        "completion_rate" => 0.71
    ],
    [
        "id" => 2,
        "title" => "PHP 8.4 New Features",
        "published_at" => "2026-02-03",
        "downloads" => 6204,
        "avg_listen_time" => "41:55",
        "completion_rate" => 0.83
    ],
    [
        "id" => 3,
        "title" => "Scaling MySQL for High Traffic",
        "published_at" => "2026-02-24",
        "downloads" => 3109,
        "avg_listen_time" => "28:40",
        "completion_rate" => 0.58
    ],
];

$jsonPayload = json_encode($episodes, JSON_PRETTY_PRINT);

That json_encode() call produces output like this:

[
    {
        "id": 1,
        "title": "Getting Started with Laravel",
        "published_at": "2026-01-15",
        "downloads": 4821,
        "avg_listen_time": "34:12",
        "completion_rate": 0.71
    },
    {
        "id": 2,
        "title": "PHP 8.4 New Features",
        "published_at": "2026-02-03",
        "downloads": 6204,
        "avg_listen_time": "41:55",
        "completion_rate": 0.83
    },
    {
        "id": 3,
        "title": "Scaling MySQL for High Traffic",
        "published_at": "2026-02-24",
        "downloads": 3109,
        "avg_listen_time": "28:40",
        "completion_rate": 0.58
    }
]

You might notice there’s a lot of duplicate information in the keys ("id", "title", "published_at", "downloads", "avg_listen_time", "completion_rate"). Each one shows up three times, once per record.

You might be thinking that three records is nothing, but once you scale up to 500 episodes, you’re repeating those six field names 500 times each. That’s 3,000 key strings the LLM has to tokenize, and you’re paying for every single one.

Regardless of how cheap the tokens are individually, your total bill climbs faster than you’d think when you’re running hundreds of API calls a day.
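To see the repetition for yourself, here’s a quick sketch that generates 500 placeholder episode records with the same six fields and counts the key strings in the encoded JSON. The record values are synthetic, and bytes are not tokens, so treat the byte share as a rough proxy only; for real numbers you’d run the payload through your provider’s tokenizer.

```php
<?php

declare(strict_types=1);

// Build 500 placeholder records with the same six fields as the episode data above.
// The values are synthetic; only the shape of the data matters here.
$episodes = [];
for ($i = 1; $i <= 500; $i++) {
    $episodes[] = [
        'id' => $i,
        'title' => "Episode $i",
        'published_at' => '2026-01-15',
        'downloads' => 1000 + $i,
        'avg_listen_time' => '30:00',
        'completion_rate' => 0.75,
    ];
}

$json = json_encode($episodes, JSON_THROW_ON_ERROR);

// Count how often each field name appears as a JSON key ("name":)
// and how many bytes those repeated keys take up in the payload.
$keyCounts = [];
$keyBytes = 0;
foreach (array_keys($episodes[0]) as $field) {
    $needle = '"' . $field . '":';
    $keyCounts[$field] = substr_count($json, $needle);
    $keyBytes += $keyCounts[$field] * strlen($needle);
}

printf("Key strings in the payload: %d\n", array_sum($keyCounts));
printf(
    "Bytes spent on keys: %d of %d (%.1f%%)\n",
    $keyBytes,
    strlen($json),
    100 * $keyBytes / strlen($json)
);
```

Six fields times 500 records gives the 3,000 key strings from the paragraph above, every one of them identical to a key you’ve already sent.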

What is TOON?

TOON stands for Token-Oriented Object Notation. It solves the repetition problem from earlier by trading punctuation for whitespace. Most of the curly braces, square brackets, and quoted key names we’re used to from JSON go away; instead, TOON uses indentation to show hierarchy, like YAML. That alone cuts a surprising number of punctuation tokens without increasing error rates by much.

An even bigger win is the tabular format that TOON uses. When all your records share the same fields, you write the field names once as a header row, and each record becomes a single comma-separated line. The keys go from appearing once per record to appearing once, period.

Our episode data in TOON looks like this:

episodes[3]{id,title,published_at,downloads,avg_listen_time,completion_rate}:
  1,Getting Started with Laravel,2026-01-15,4821,"34:12",0.71
  2,PHP 8.4 New Features,2026-02-03,6204,"41:55",0.83
  3,Scaling MySQL for High Traffic,2026-02-24,3109,"28:40",0.58

That’s the whole dataset: field names once at the top, then one row per record. On this sample data, TOON achieves a 60.42% token reduction, from 379 tokens down to 150. That’s hard to get excited about with three rows, but at 500 episodes per batch, it turns into real money at production scale.
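The tabular form doesn’t need much machinery. Here’s a minimal sketch of an encoder for flat, uniform records. The function names `encodeTabular()` and `toonScalar()` are hypothetical, not part of any library, and the quoting rules are a simplified assumption based on TOON’s general approach of quoting a string only when it could be misread (commas, colons, quotes, brackets, or values that look like numbers, booleans, or null).

```php
<?php

declare(strict_types=1);

// Hypothetical helper: render one scalar as a TOON value.
// Simplified quoting rules (an assumption, not the full TOON spec):
// strings are quoted when they could be misread as structure or as another type.
function toonScalar(mixed $value): string
{
    if ($value === null) {
        return 'null';
    }
    if (is_bool($value)) {
        return $value ? 'true' : 'false';
    }
    if (is_int($value) || is_float($value)) {
        return (string) $value;
    }

    $s = (string) $value;
    $needsQuotes = $s === ''
        || $s !== trim($s)                              // leading/trailing whitespace
        || preg_match('/[,:"\\\\\[\]{}]/', $s) === 1    // comma, colon, quote, backslash, brackets
        || is_numeric($s)                               // would be read back as a number
        || in_array($s, ['true', 'false', 'null'], true)
        || str_starts_with($s, '-');                    // could look like a list item

    return $needsQuotes ? '"' . addcslashes($s, '"\\') . '"' : $s;
}

// Hypothetical helper: encode a list of flat records that share the same keys
// as a TOON tabular block: a name[N]{fields}: header, then one indented row per record.
function encodeTabular(string $name, array $rows): string
{
    $rows = array_values($rows);
    if ($rows === []) {
        return $name . '[0]:';
    }

    $fields = array_keys($rows[0]);
    $lines = [sprintf('%s[%d]{%s}:', $name, count($rows), implode(',', $fields))];

    foreach ($rows as $i => $row) {
        // The tabular form only works when every record has the same fields in the same order.
        if (array_keys($row) !== $fields) {
            throw new InvalidArgumentException("Record $i does not match the header fields; send it as JSON instead.");
        }
        $lines[] = '  ' . implode(',', array_map('toonScalar', array_values($row)));
    }

    return implode("\n", $lines);
}

echo encodeTabular('episodes', [
    ['id' => 1, 'title' => 'Getting Started with Laravel', 'avg_listen_time' => '34:12', 'completion_rate' => 0.71],
    ['id' => 2, 'title' => 'PHP 8.4 New Features', 'avg_listen_time' => '41:55', 'completion_rate' => 0.83],
]), "\n";
```

The listen times come out quoted because they contain colons; titles and dates stay bare. Refusing mismatched records outright is deliberate: it keeps the sketch honest about what the tabular form can represent.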


Working with TOON in your prompts

As always when supporting a file format in our code, we have two options: build our own quick-and-dirty implementation, or find a library on Packagist that does the work for us. Thankfully, someone already has, and it supports more than the basic array of objects from our example above. It’s toon-php, and it lets us simply call Toon::encode():

echo Toon::encode([
    'users' => [
        ['id' => 1, 'name' => 'Alice', 'role' => 'admin'],
        ['id' => 2, 'name' => 'Bob', 'role' => 'user'],
    ]
]);

That gives us our data in TOON format:

users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user

It can also decode TOON back into a PHP array:

// Decode objects (returned as associative arrays)
$toon = <<<TOON
id: 123
name: Ada
active: true
TOON;

$result = Toon::decode($toon);
// ['id' => 123, 'name' => 'Ada', 'active' => true]

It’s fairly slick.
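If you only ever need the flat tabular case and would rather not add a dependency, a hand-rolled decoder is small. This is a sketch under stated assumptions, not toon-php’s implementation: `decodeTabular()` and `toonValue()` are hypothetical names, it handles a single comma-delimited block only, and it leans on PHP’s `str_getcsv()` to split rows and strip double quotes.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: turn one raw TOON value into a PHP scalar.
// Simplification: a quoted "123" also comes back as an int, because
// str_getcsv() has already removed the quotes by the time we see the value.
function toonValue(string $raw): int|float|bool|string|null
{
    return match (true) {
        $raw === 'true' => true,
        $raw === 'false' => false,
        $raw === 'null' => null,
        preg_match('/^-?\d+$/', $raw) === 1 => (int) $raw,
        is_numeric($raw) => (float) $raw,
        default => $raw,
    };
}

// Hypothetical helper: decode one flat, comma-delimited TOON tabular block
// (a name[N]{fields}: header followed by one indented row per record).
function decodeTabular(string $toon): array
{
    $lines = array_values(array_filter(
        preg_split('/\R/', trim($toon)),
        fn (string $line): bool => trim($line) !== ''
    ));
    if ($lines === [] || preg_match('/^(\w+)\[(\d+)\]\{([^}]*)\}:$/', trim($lines[0]), $m) !== 1) {
        throw new InvalidArgumentException('Expected a header like name[N]{a,b,c}:');
    }

    [, $name, $count, $fieldList] = $m;
    $fields = array_map('trim', explode(',', $fieldList));

    $records = [];
    foreach (array_slice($lines, 1) as $line) {
        // str_getcsv() splits on commas and strips double quotes around values such as "34:12".
        // The empty escape argument avoids PHP 8.4's deprecation of the implicit default.
        $values = str_getcsv(trim($line), ',', '"', '');
        if (count($values) !== count($fields)) {
            throw new UnexpectedValueException("Wrong number of values in row: $line");
        }
        $records[] = array_combine($fields, array_map('toonValue', $values));
    }

    // The [N] in the header lets us catch truncated or padded data.
    if (count($records) !== (int) $count) {
        throw new UnexpectedValueException("Header declares $count records, found " . count($records));
    }

    return [$name => $records];
}

print_r(decodeTabular("users[2]{id,name,role}:\n  1,Alice,admin\n  2,Bob,user"));
```

The record count in the header is worth checking even in a quick version like this, since it is the cheapest way to notice that a row went missing.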

Sending TOON to the LLM

Now that our data is in TOON format, we also need to tell the model how to read it.

This can be done by adding it to your system prompt:

<?php

// Assumes toon-php's Toon class has been imported with a use statement.
$systemPrompt = "You will receive data in TOON (Token-Oriented Object Notation) tabular format. "
    . "A header line such as name[N]{field1,field2}: gives the record count N and the field names in braces. "
    . "Each indented line below the header is one record, with comma-separated values in the same order as the fields. "
    . "Parse each line as a record using the header fields as keys.";

$userPrompt = "Analyze the following podcast episode statistics and identify which episodes have low completion rates:\n\n"
    . Toon::encode(['episodes' => $episodes]);

Leave that out, and the model treats your data as unstructured text. It’ll miss the structure entirely and give you garbage back.

It’s still wild that this works as well as it does.

Gotchas

There are a few gotchas you have to keep in mind.

The first is the same caveat that applies to everything involving LLMs: this field is brand new, so this article will most likely be out of date before we hit the publish button. It’s still worth understanding the concept, because there are so many places where we messy humans need structure that LLMs don’t.

Because LLMs are non-deterministic, your mileage is going to vary. In benchmarks, TOON is just as accurate as JSON for returned results, but neither format guarantees correct answers, so always check for bad data coming back from the LLM and include a “this data is generated using AI” warning on the output.

The most annoying gotcha is that you have to explain the format every time. JSON is all over every LLM’s training data, but TOON isn’t (yet). If you don’t include those parsing instructions in the system prompt, the model will read your rows as plain text, and the response will be wackadoo.

TOON also falls apart with messy data. It works best with flat, consistent records where every item has the same fields. Nested objects, optional fields that vary between records, and arrays inside arrays all break the tabular structure. If you’re sending that kind of data, stick with JSON. You can also do a hybrid, where TOON handles the flat parts and JSON handles anything irregular.
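For the hybrid approach, you need some way to decide which data is flat enough for the tabular form. Here’s one possible check, sketched under the assumption that “tabular-friendly” means a list of associative records whose values are all scalars or null and whose keys match exactly, in order. `isTabularFriendly()` and `chooseFormat()` are hypothetical helpers, not toon-php functions, and `array_is_list()` requires PHP 8.1 or later.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: is this list of records a good fit for TOON's tabular form?
// Every record must be an associative array of scalars (or null) with identical keys.
function isTabularFriendly(array $rows): bool
{
    if ($rows === [] || !array_is_list($rows)) {
        return false;
    }

    $fields = null;
    foreach ($rows as $row) {
        if (!is_array($row) || array_is_list($row)) {
            return false; // not an associative record
        }
        foreach ($row as $value) {
            if (!is_scalar($value) && $value !== null) {
                return false; // nested arrays or objects break the tabular layout
            }
        }
        $keys = array_keys($row);
        $fields ??= $keys;
        if ($keys !== $fields) {
            return false; // optional or reordered fields vary between records
        }
    }

    return true;
}

// Pick a format per dataset: tabular TOON for the flat parts, JSON for anything irregular.
function chooseFormat(array $rows): string
{
    return isTabularFriendly($rows) ? 'toon' : 'json';
}

echo chooseFormat([['id' => 1, 'title' => 'A'], ['id' => 2, 'title' => 'B']]), "\n";
echo chooseFormat([['id' => 1, 'tags' => ['php', 'llm']]]), "\n";
```

In a hybrid prompt you could run this per top-level key, sending the “toon” datasets through Toon::encode() and everything else through json_encode().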

Finally, the math only works at scale. If you’re only sending two or three records, the tokens you spend explaining the format in the system prompt can eat up most or all of what you save. Hundreds or thousands of records is where TOON earns its keep.
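To put rough numbers on “only works at scale”: the break-even point is the token cost of the format instructions divided by the tokens TOON saves per record. The sketch below is just that arithmetic; `breakEvenRecords()` is a hypothetical helper, and the two input values are placeholders, not measurements. Measure both with your provider’s tokenizer on your own data.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: smallest number of records per request at which the
// per-record token savings cover the fixed cost of the TOON instructions.
function breakEvenRecords(int $instructionTokens, float $tokensSavedPerRecord): int
{
    if ($tokensSavedPerRecord <= 0) {
        throw new InvalidArgumentException('TOON must save at least some tokens per record.');
    }

    return (int) ceil($instructionTokens / $tokensSavedPerRecord);
}

// Placeholder inputs, not measured values.
$instructionTokens = 80;      // tokens spent on the TOON explanation in the system prompt
$tokensSavedPerRecord = 20.0; // JSON tokens minus TOON tokens, divided by record count

printf(
    "TOON covers its instructions from about %d records per request.\n",
    breakEvenRecords($instructionTokens, $tokensSavedPerRecord)
);
```

Anything past the break-even count is pure savings, which is why batches of hundreds of records are where the format pays off.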

What You Need To Know

  • TOON writes field names once as a header row, then one comma-separated line per record after that.
  • On uniform data like our episode stats, TOON cut token usage by roughly 60%.
  • You MUST include format instructions in your system prompt. LLMs aren’t trained on TOON and won’t figure it out on their own.
  • Flat, consistent data is where TOON shines. Nested or irregular structures are still better handled by JSON.

 
