Debugging with Purpose

Posted on June 22, 2021
By Joseph Maxwell

Debugging and solving problems is an art. It is part technique and part practice. In this article, I share a powerful technique that has helped me quickly solve many problems. This system gets my head out of a rut and pushes me to look at other solutions creatively. You will also learn some approaches to discovering the necessary information. Finally, this technique is tied up into a nice little package that cements your status as a hero in your workplace.

This article was published in the June 2021 issue of php[architect] magazine.

My wife and I recently sold our lovely first home and moved into a different house. As part of the final “pre-listing” efforts, we had to fix this and that to make sure our house was perfect. One item on the to-do list was repairing a light in the kitchen. Even though I had replaced the bulb at least once, the light never came back on. Duty always called elsewhere, and I was conveniently on to another problem to solve (hey, who wants to get up into an attic with blistering heat and fluffy, white, itchy insulation?).

Over time, I thought through this problem and decided there was only one probable cause: I must have forgotten to wire up the light when I installed it. Eventually, the day arrived where this task changed from being on the “honey-do” list to the “honey must-do” list. First, I set my alarm early so I wouldn’t have to enjoy the unbearable heat of the attic. Next, I assembled my electrical tools and crawled up into the attic. I shuffled my way through the knee-deep insulation and finally located the light in question. After opening the box, I made quite the shocking discovery: the box was wired! What?! The light doesn’t work, so how could it be wired?

I made the semi-dangerous trek back down through the narrow attic entrance (which happens to be directly above my wife’s pantry). I remember standing in the kitchen thinking, “If it’s wired up, why doesn’t it work?” It was at this moment that a new thought dawned on me: “What if I tried a different light bulb?” Sure, I can do that, but what difference would that make? So I fetched a different type of light bulb, screwed it into the socket, and it worked! But why did it work? The only thing I could identify is that the socket was slightly mis-manufactured, and the bulbs I had tried did not have a long enough neck to make a good connection with the electrical contacts.

Debugging is a valuable life skill and applies to way more than fixing PHP. You see, if I had just tried multiple bulbs, I would have had a working light years ago.

Solving Problems is Not Limited to Our Daily Job

As software developers, we face some tough challenges. How about that cron job that is randomly killed? Maybe we have an inventory reservation system that mostly works, but there are those one or two times a month where it accidentally allows for an oversell.

The goal of this article is to present you with an easy-to-remember framework that will help you quickly get to the right solution every time.

Meet TAD

TAD is a great name. In fact, that’s the name of my salesperson at the place where I like to buy guitar equipment. But TAD is not just a great name for some great people; it’s also an acronym for a debugging framework.

Here you go:

  • Take an inventory: gather information to define a hypothesis.
  • Attempt a fix: take the hypothesis and see if it lends toward a fix.
  • Do it again: step back and review your progress.

Many areas in life benefit from short and concentrated bursts of energy. Does anyone still look at the waterfall software development pattern as a hallmark of best practice? From my vantage point, it seems everyone uses agile in one form or another. Why? There is value in taking regular breaks (retrospectives) to ensure we are on the right track. You will see that the Do it again step is the location for this review.

This cycle is the agile method for debugging.

The key to making this system work is time limits! If you are a junior developer, consider a one-hour time cap for each of the first two phases. Set a timer. When it elapses, stop and proceed to the next step.

Because we have time limits, this framework becomes a cycle. Proceed from round to round until you resolve the task in its entirety. It is my experience that Round #1 will resolve symptoms, but Round #2 is almost always necessary to provide a comprehensive solution. For example, Round #1 might be to identify (Take an inventory) and change (Attempt a fix) a value in the database. Then, you should step back and ask yourself the question, “Why?” (Do it again). Round #2 would be to understand why that value was set (Take an inventory) and then update the code to prevent this from happening (Attempt a fix).

While I familiarize you with “the system,” I’ll walk you through how I solved a recent problem. For example, imagine you get a bug report like this:

When I save a product, I get a blank page / fatal error. Please fix immediately.

Game on, time to get to work.

Take an Inventory (Round 1)

Goal: Locate the Problem.

Don’t feel pressured to locate the exact, final problem in this phase. You likely won’t, especially with a 30-minute time cap. It’s better to exit this phase without a clear definition of what’s wrong than to keep pushing on and on.

Take an inventory consists of two data-gathering ideas.

The person who reported the problem will likely have no idea what the real problem is. Therefore, you must separate the symptoms (what your client reports) from the hypothesis (what they think is the problem). The reported symptoms might be the gatekeeper to a far more significant root cause. For example, I ran into a situation where a button was missing; it turned out that an entire process was running in an unexpected context, which caused some undesirable side effects.

The first part of this step is to assemble all pertinent information into one place: the trouble ticket. These submissions from our clients are notoriously vague. When did this problem occur? What related entries can you find in the log files? What were the exact steps to replicate? Can you duplicate this problem in your local environment? What factors or systems are coming into play?

Where do you find error log entries? There are three locations:

  1. Web server log files. The location is specified in your web server configuration. For NGINX, you can run: sudo nginx -T | grep log. On Apache 2.4, you can run: sudo apachectl -S | grep -i log, which includes the main ErrorLog path.
  2. PHP-FPM log files. Finding these is a little trickier. You need to locate PHP-FPM’s configuration, which is often in /etc/php-fpm.d/www.conf. You could also run find / -name www.conf.
  3. PHP error logs. This destination is configured in PHP under the error_log directive. Running php -i | grep error_log gives you the location.
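
If you can run a small script on the server, PHP can also report its own logging settings. The sketch below uses the built-in ini_get() and php_ini_loaded_file() functions; the helper name describeErrorLogging() is hypothetical. Keep in mind that the CLI and PHP-FPM often load different php.ini files, so the values are only meaningful when the script runs under the same SAPI that serves the failing request.

```php
<?php
// Hypothetical helper: summarize where this PHP process sends its errors.
function describeErrorLogging(): array
{
    $iniFile = php_ini_loaded_file(); // false when no php.ini was loaded

    return [
        'sapi'           => PHP_SAPI,
        'ini_file'       => $iniFile === false ? '(none loaded)' : $iniFile,
        // An empty error_log value means errors go to the SAPI's default
        // destination (stderr on the CLI, the web server or FPM log otherwise).
        'error_log'      => (string) ini_get('error_log'),
        'log_errors'     => (string) ini_get('log_errors'),
        'display_errors' => (string) ini_get('display_errors'),
    ];
}

foreach (describeErrorLogging() as $setting => $value) {
    echo str_pad($setting, 16), $value === '' ? '(empty)' : $value, PHP_EOL;
}
```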

The second part is actual troubleshooting. The first thing I did was to look at the log files. I hoped that an error would be present, and I was pleased with the result:

Fatal Error: 'Uncaught TypeError: Argument 1 passed to 
Catalog\PricingEngineIndexer\Model\ResourceModel\Queue::add() 
must be of the type string, int given

This error message is helpful; we can skate directly into Attempt a fix, right? All we have to do is cast an integer to a string, so this will be one of those lightning-fast fixes, and our client will be very impressed.

The identification was easy this time, but what if you don’t have a clear picture of what’s going on? Even if you only have a vague idea of what’s happening, try to fix it. Your fix will be limited, but it will force you to dig deeper into the problem and the codebase.

You are a Detective

The fact that you’ve been notified of a problem means it has happened before—that’s a profound statement! Here’s how this becomes practical: we recently built a complex pricing engine for a website. There were a couple of times that the merchant reported an incorrect price. The easy “solution” was to kick off the pricing indexer, which recalculates the price (9/10 times, this fixes the problem). The problem goes away, and I let the client know that they can rest peacefully tonight.

Put bluntly, this was a terrible decision. I deleted the evidence for the problem, and by doing so, prevented the long-term solution.

Before you delete or modify anything from the database, make a backup. It could be a simple SQL dump of the affected table. On the other hand, it could be a full-blown dump. And, yes, it sounds cliché and overused to say “make a backup,” but it is critical.

Now that I have a dump, it’s an excellent time to run the pricing reindexer. If this fixes the problem, I must now troubleshoot “why the indexer didn’t automatically run when the product was updated.” We are on to a different issue, but this is now the problem that we must fix.
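
As a rough illustration of the “simple SQL dump of the affected table,” here is a sketch that assembles a mysqldump command for a handful of tables. It assumes a MySQL or MariaDB database and credentials supplied through a client option file such as ~/.my.cnf. The helper name, database name, table names, and output path are all made up for the example.

```php
<?php
// Hypothetical helper: build a mysqldump command for a few tables.
// Credentials are assumed to come from a client option file (e.g. ~/.my.cnf),
// so no password appears on the command line.
function buildTableDumpCommand(string $database, array $tables, string $outputFile): string
{
    if ($tables === []) {
        throw new InvalidArgumentException('Name at least one table to dump.');
    }

    $parts = [
        'mysqldump',
        '--single-transaction', // consistent snapshot for InnoDB tables
        '--result-file=' . escapeshellarg($outputFile),
        escapeshellarg($database),
    ];

    foreach ($tables as $table) {
        $parts[] = escapeshellarg($table);
    }

    return implode(' ', $parts);
}

// Database and table names below are invented for illustration.
echo buildTableDumpCommand(
    'shop',
    ['pricing_index', 'pricing_queue'],
    '/tmp/pricing-before-reindex.sql'
), PHP_EOL;
```

Printing the command instead of executing it lets you review it before anything touches the database.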

Attempt a Fix (Round 1)

Goal: Fix the Problem.

The time has arrived to add that type cast, create a plugin, or create a Composer patch.

Your first-version code fix might smell or not comply with PSR-1. You might even break the rules for your framework.

Don’t hesitate to change the core code for any framework if you believe there is a bug (I’m talking about ecommerce frameworks). Make sure to leave an easily searchable comment so you can revert it. If you are confident that you have located a bug, file a bug report or even submit a pull request to fix the problem. You will gain more confidence in your ability to troubleshoot, and you will be helping the others who use this same platform.

Of course, once you have a solution in place, don’t forget to come back through and clean up your construction debris and do things right. Revert your core code edits and replace them with a plugin or Composer patch.

The time cap is critical here. I have seen many developers, myself included, spend hours and hours trying to build the solution, only to find out the answer was completely different.

The problem we started to fix in Take an inventory, round 1, seems to be a simple matter of adding a type cast. So let’s do that and proceed.
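
To make the Round 1 fix concrete, here is a minimal sketch of the failure and the cast. The DemoQueue class is a stand-in I invented for the Queue::add() method named in the error message, not the real code. It also assumes the calling file declares strict_types=1, since in PHP’s default coercive mode an int passed to a string parameter is converted silently rather than rejected.

```php
<?php
declare(strict_types=1);

// Stand-in for the Queue::add() method from the error message;
// the body is invented for illustration.
final class DemoQueue
{
    /** @var string[] */
    private array $skus = [];

    public function add(string $sku): void
    {
        $this->skus[] = $sku;
    }

    /** @return string[] */
    public function all(): array
    {
        return $this->skus;
    }
}

$queue = new DemoQueue();
$sku = 12345; // an int arrives where a SKU string was expected

try {
    $queue->add($sku); // strict_types=1: no silent int-to-string coercion
} catch (TypeError $e) {
    echo 'Caught: ', $e->getMessage(), PHP_EOL;
}

$queue->add((string) $sku); // the Round 1 fix: cast at the call site
var_dump($queue->all());
```

The cast makes the symptom go away, but it says nothing about why an int showed up in the first place.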

Do It Again (Round 1)

Goal: Evaluate Where You Are.

Back up and take a look. This might be taking a break or even a short walk. Get something to drink. Clear your mind. By stepping away from the problem, you will gain new focus.

Isn’t the type cast all we need to add? It’s possible, but let’s do some deeper thinking about the problem. The fix casts an int to a string. Nine times out of ten, when we encounter this kind of error, we need to do the opposite: cast a string to an int (because the data wasn’t correctly type cast when loading from the database, for example).

A string turning into an int raises red flags. PHP can juggle and coerce types for us, but it doesn’t typically change a variable’s type on its own. So how do we have numbers coming into our application? This method expects product SKU identifiers, which are usually alphanumeric strings. A plausible theory is that the products’ primary keys (which are integers) are somehow coming into this method.

Do it again is valuable because it forces us to ask the question, “Is this solution the 100% solution?” In our case, we aren’t sure, so we are catapulted into another round of TAD. We will keep this round brief.

Take an Inventory (Round 2)

We need to identify if the product’s primary keys are flowing into this system. This search takes time because we need to reproduce the environment which triggered the problem. Thankfully, the merchant reported that it happened on an API request. I set a debugging breakpoint on the affected code and triggered the request through Postman.

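It turns out that PHP does automatically cast strings to integers when they are used as array keys, but only for strings that hold a canonical decimal integer, such as "12345". Any other string, such as "ABC-1" or "0123", is left as a string. A SKU made up entirely of digits therefore comes back out of the array as an int.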
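
The array key behavior is easy to verify in isolation. In this short sketch the SKU values are made up for illustration; the point is only how PHP stores each key.

```php
<?php
// SKUs used as array keys; the values are made up for illustration.
$queue = [
    '12345' => true, // canonical decimal integer: PHP stores the key as int 12345
    '0123'  => true, // leading zero: stays the string "0123"
    'ABC-1' => true, // not numeric: stays a string
];

foreach (array_keys($queue) as $key) {
    echo var_export($key, true), ' is ', gettype($key), PHP_EOL;
}

// Casting each key back restores the string type the SKU started with.
$skus = array_map('strval', array_keys($queue));
var_dump($skus === ['12345', '0123', 'ABC-1']);
```

Casting the keys back to strings where they leave the array is one possible fix; it keeps the method’s string type declaration intact.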

That was an interesting lesson, and I was pleased we didn’t have cross-contamination of primary keys and SKUs.

Remember, the first round of debugging can often resolve the symptoms, but the second round of debugging often prevents the symptoms from recurring. As we know, this makes our clients happy.

The Benefits of TAD

Aside from TAD’s approachable and friendly name, you will observe several positive outcomes:

  • By segmenting our time into a specific troubleshooting phase and solution phase, we get a faster perspective on identifying the problem and a possible solution. We change mental modes, which keeps us from getting stuck in a rut.
  • By cycling through a structured system, we reduce or eliminate the effect of tunnel vision. Instead of focusing on a perceived solution for hours on end, we iterate through troubleshooting and investigation to apply a quick solution.
  • We give ourselves time to ask whether or not this is the best solution for the problem. Round 1 may immediately solve the problem. Round 2 gives us the opportunity to dig in and ask, “Why did this happen?” We thus achieve a better solution, which results in fewer calls from our client. We know how this affects us positively, too!

TAD is a habit that must be adopted. It will involve a slight change in your workflow, but I guarantee that you will find this change beneficial.

Who Benefits Most from TAD?

Of those reading this article, the ones who will see the most benefit from TAD are junior and early-mid-level developers. If you are more senior, you will likely have developed enough pattern recognition to quickly solve many problems from sheer memory. However, you will still find TAD helpful on those crazy-difficult-long-time-to-fix problems.

Try it on the next problem you face. You will be pleasantly surprised.

Additional Debugging Pointers

Start with the Easiest Solution

There have been quite a few occasions where I look at the problem and think, “this is an issue in the code.” After all, I’m a developer, so who can blame me?

I am sleeves-up, hard at work reviewing every line of the code execution process. My excitement level grows as I get closer to my target, where I know I’ll find the bug. Finally, my debugger steps into the get method that seems to hold the keys … and I find the code loads a value from the admin configuration. Glug. I dutifully navigate to the admin and see that a configuration value controls what seemed to be the errant behavior.

The easiest solution might be checking a third-party module’s version to see if an update is available. It could be that a file didn’t get committed to the source code repository.

Make It Easy and Safe to Obtain a Production Database

Safety is critical. But that usually comes at the cost of something else. And that’s usually ease or comfort. Hard hats aren’t the most enjoyable thing to wear for hours a day. Seat belts might feel restrictive. But I know we all love to enter that password on our computer every time we sit down to work, right?

We at SwiftOtter have invested a lot of time building an open-source project we call Driver. It is a tool to process and sanitize a production database. It can anonymize email addresses, names, and mailing addresses. It can also transform the database into a dump that is ready for staging or local environments, for example by cleaning out admin users and replacing them with a locally-known admin user.

How does Driver work? It’s installed via Composer and configured with easy-to-understand YAML. The transformations happen in an Amazon RDS instance (so it doesn’t come anywhere near touching production). The final dumps are stored on S3. Run this on a schedule, and you have low-effort access to a sanitized production database.

Maybe we are biased, but Driver is the most powerful tool we’ve seen that does this.

What’s with Those Random Problems?

Could there be a more dreaded word in the debugging world than the term “random”? There seems to be no rhyme or reason.

Maybe the word of encouragement you need to hear is that there is no such thing as a random problem. Ok, you got me. The closest we can come to “random” are out-of-memory killers, where your OS picks a process to kill to recover system resources, and the choice can look random from the outside. Or, maybe there is that application with business logic powered by the rand() function?

But in reality, almost everything has an explanation because all software was written by humans. We, as humans, are very capable of inserting unintentional but really bad bugs into our code. At the end of the day, the bug can still be found.

Before Getting Started

Do your best to identify everything that could cause this problem. What steps could get to this point in time? What areas are involved? Where does this problem occur in the code?

Having this list written down will give you focus and context.

Next, let’s get to work. Our two best friends for solving random problems are:

  1. code
  2. logging

Code Inspections

I put code inspections first because logging is useless until you locate where the problem might be happening. One of the best things to do is set a breakpoint in the general vicinity of the offending code and use your debugger to step through this line-by-line. Ask yourself, “What combination of variables would cause this behavior?”

Even if you don’t get your immediate answer, this will shine a searchlight on which areas would benefit from logging.

Logging

Logging identifies events that happen at a specific point in time. When used correctly, log entries can also reveal the sequence of steps needed to replicate a problem. When misused, logging leaves you no further along in your troubleshooting than when you started. Thus, you must:

  1. Carefully think through what you are logging and articulate how this information will help you solve the problem.
  2. Be patient. Logging will only catch something once logging is in place. The event of concern must happen again to get the necessary information.

Meet the stack trace. It’s a verbose list of the frames (function calls) that led to this point in the execution. We normally think of stack traces as something that comes with exceptions, and they do. But you can use the following approach to capture the stack trace from any method:

$this->logger->critical(
    'Important event occurred:',
    [
        // Stack frames without call arguments, to keep the entry small
        'trace' => debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS),
        // ... more context-related variables here
    ]
);

If you are using a PSR-3-compatible logging mechanism, take advantage of the $context variable and add anything relevant to help further debugging. This extra data would include current arguments. Be careful of logging entire objects, as this will trash your log files; instead, log arrays or scalar values. Also, ensure you’re not logging sensitive information like passwords, API keys, or secrets.
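
A full backtrace can still be bulky, even without arguments. One way to keep log entries readable is to reduce each frame to a short string before logging it. The sketch below is one possible approach; compactTrace() and saveProduct() are hypothetical names, and only the built-in debug_backtrace() is assumed.

```php
<?php
// Hypothetical helper: reduce a backtrace to short "function (file:line)" strings
// so the log entry stays readable.
function compactTrace(int $limit = 10): array
{
    // IGNORE_ARGS leaves out call arguments, which keeps objects and
    // possible secrets out of the log; $limit caps the number of frames.
    $frames = debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS, $limit);

    return array_map(
        static function (array $frame): string {
            $call = ($frame['class'] ?? '') . ($frame['type'] ?? '') . $frame['function'];
            $where = isset($frame['file'])
                ? basename($frame['file']) . ':' . $frame['line']
                : '[internal]';

            return $call . ' (' . $where . ')';
        },
        $frames
    );
}

// Made-up caller, standing in for the method you are investigating.
function saveProduct(): array
{
    return compactTrace();
}

print_r(saveProduct());
```

With a PSR-3 logger, the result could be passed as a context entry, for example ['trace' => compactTrace()].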

I recently fought through an issue where some values were not correctly attached to an invoice object. We created many invoices, and all of them had the proper values. I even created the invoice through the API and still could not reproduce the problem.

My next hope was to add very detailed logging. When I say detailed, I mean literally writing $_SERVER to the log files. It was only upon doing this that I found this merchant’s third-party shipping system was calling an API endpoint of which I had never heard. Once I had this piece of information and was able to fix a core bug, the solution was found.
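
Writing $_SERVER to a log can capture authorization headers and cookies along with the useful details. Here is a sketch of one way to keep the request snapshot while masking values that look sensitive. The helper name, the key pattern, and the sample request data are all assumptions for illustration; adjust the pattern to your application.

```php
<?php
// Hypothetical helper: copy request metadata for logging, masking values
// whose keys look sensitive. The pattern list is an assumption; extend it
// for your application.
function redactServerVars(array $server): array
{
    $sensitive = '/AUTH|COOKIE|PASSWORD|SECRET|TOKEN|KEY/i';
    $safe = [];

    foreach ($server as $name => $value) {
        if (preg_match($sensitive, (string) $name) === 1) {
            $safe[$name] = '[redacted]';
        } elseif (is_scalar($value) || $value === null) {
            $safe[$name] = $value;
        } else {
            $safe[$name] = '[' . gettype($value) . ']'; // e.g. argv arrays on the CLI
        }
    }

    return $safe;
}

// Sample data standing in for $_SERVER during a real request.
$sample = [
    'REQUEST_METHOD'     => 'POST',
    'REQUEST_URI'        => '/rest/V1/example',
    'HTTP_USER_AGENT'    => 'ExampleShippingClient/1.0',
    'HTTP_AUTHORIZATION' => 'Bearer abc123',
];

error_log('Request snapshot: ' . json_encode(redactServerVars($sample)));
```

In a real request you would pass $_SERVER instead of $sample. The method, URI, and user agent are the kind of details that can expose an unexpected caller.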

Logging was my escape hatch for this problem.

I should add that indiscriminate logging won’t bring you any closer to solving the problem. Be surgically precise with where you add logs. Oh, and stay on top of the bug until you get it fixed. Ensure you provide regular updates to your client, so they aren’t left wondering if this has you stumped, too.

A Final Piece of Advice

One of the most potent tools for solving problems is to leverage your peer group. However, do not do this until you have completed at least one round of the TAD framework. You will then have collected some level of knowledge. Knowledge is the power to eradicate bugs.

In Closing

The tips I shared will be helpful to you only if you consciously choose to alter your daily workflow. Granted, it will take effort and a change in your routine. But I guarantee this will help you escape the tube of tunnel vision and allow you to spot those clues (Sherlock Holmes style) and get to the solution sooner.

You are a hero. Implementing the TAD framework will gain you even more recognition as a hero.

Biography

Joseph Maxwell is the CEO of SwiftOtter. His passion is helping developers build better solutions—and as part of that he is releasing a book titled The Art of Ecommerce Debugging (it’s a super entertaining and easy read that will help you become a lightning-fast debugger). He lives in Olathe, KS with his wife and three children. While not working or hanging out with his family, you’ll often find him biking. @josephmaxs
