To Make Errors is Human, to Handle Them is Divine

Yaacov Apelbaum-We Have Bugs

Reading this advertisement made me realize how clever the software industry has become. Why bother fixing your bugs before shipment when you can sell them on the premise that you will fix the bugs “free of charge” when the users find them for you? Interestingly, anyone who bothers to read the vendor’s licensing guide will find the following sobering caveat:

“…From an engineering point of view, it is impossible to fix bugs in multiple source code branches. If we had to do this, we would never have been able to implement a major redesign. Major redesigns are required now and then to be able to fix bugs and add features fast.”

Nothing communicates your attitude towards your users better than how you handle exceptions and errors. When something goes wrong with your application, the user is in a heightened emotional state and is the most impressionable. Some software products, including leading market applications, have developed bad reputations for having cryptic error messages that are impossible to resolve, leaving the user feeling helpless and outraged.

The worst offenders include fortune-teller-style messages that inform you (not without irony) that you are about to lose all of your work because the application has encountered an unknown problem and needs to be shut down.

Image 1: A Useless Error Message

This is even more pronounced in the Internet’s session-less environment. It seems that we’ve been steadily regressing in how we treat our users when it comes to web application reliability and robustness.

Image 2: More Useless Error Messages

The Engineering of Handling Failure
A civil engineer designing a bridge will invest a significant amount of time and resources in predicting potential structural failure scenarios. Failure analysis and safety factoring (i.e., redundancy) are two essential cornerstones of the engineering discipline. In the physical world of machines and structures, the ability to identify a potential design flaw and remedy it is a given. We should strive for the same in the virtual world of software by accounting for critical error conditions and developing robust application code capable of handling them.

Software engineering does have specific nuances that differ from classical engineering, which makes prioritization of work more arbitrary and less straightforward. For example, the development team may consider a memory leak in a server component a critical bug. Yet a relatively small data validation problem that forces the user to retype a lengthy application could have a more significant user impact and deserve a higher place on the bug-fix priority list.

A 12-Step Program for Error Rehabilitation
Making your application more agile in handling failure and enabling it to degrade gracefully is not a single-step process, and there is no silver bullet technology out there that will fix this problem. If you want to break the cycle of application instability and user frustration, you will have to dedicate time and your best technical talent to solving it. I have found that a phased approach works best. In this approach, you first pick the low-hanging fruit (addressing the mechanics of the error handling), then gradually move to higher ground (addressing automated problem resolution and preemptive countermeasures).

The following is my four-phase program for solving your application errors. The classification is cumulative, so the fourth phase (the highest level of reliability) also includes the properties of the preceding phases:

Phase 1: Create Unique and Traceable Errors and a Way to Record Them
If you are under the gun and don’t have time for any other remedy, ensure your error cases are unique. Telling your users that an error has occurred in the application without providing details is a sign of an immature product. When your technical support team receives an error report, they should be able to determine precisely what is causing the problem.

Generic error handling (the same message for all errors) or different error causes that return identical messages are easy to implement, but they are useless when it comes to debugging. Unique error IDs allow us to track bugs more efficiently and translate them into a more stable product.

Error codes should be visible in the error messages but not be the focal point of the message.  You should develop a library of descriptive text that provides a human-readable explanation of what the error means.  Provide a simple mechanism to log the message directly into your app or send it to you via email.  Nothing is more annoying to the user than being asked to type in the error message manually.
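
One way to picture this is a small catalog that maps each unique error code to its human-readable text. The following is only a minimal Python sketch: the codes, the wording, and the names `CATALOG`, `AppError`, and `report` are all hypothetical, and logging goes to the standard `logging` module rather than to any particular product.

```python
import logging
import uuid
from dataclasses import dataclass

log = logging.getLogger("app.errors")


@dataclass(frozen=True)
class ErrorDef:
    code: str          # unique, stable ID that support can search for
    user_message: str  # plain-language explanation shown to the user


# Hypothetical catalog; the codes and texts are illustrative only.
CATALOG = {
    "E1001": ErrorDef("E1001", "We could not save your document because the disk is full."),
    "E1002": ErrorDef("E1002", "We could not reach the server. Your work has been kept on this computer."),
}


class AppError(Exception):
    """An application error tied to exactly one catalog entry."""

    def __init__(self, code: str, detail: str = ""):
        self.definition = CATALOG[code]
        self.detail = detail                      # technical detail, for the log only
        self.incident_id = uuid.uuid4().hex[:8]   # identifies this one occurrence
        super().__init__(f"{code}: {detail}")

    def user_text(self) -> str:
        # Explanation first; the code is visible but not the focal point.
        d = self.definition
        return f"{d.user_message} (Reference: {d.code}-{self.incident_id})"


def report(err: AppError) -> str:
    """Log the error automatically so the user never has to retype it."""
    log.error("code=%s incident=%s detail=%s",
              err.definition.code, err.incident_id, err.detail)
    return err.user_text()
```

Calling `report(AppError("E1001", "ENOSPC on /var/data"))` writes the technical detail to the log and returns only the friendly sentence plus a short reference that the user can quote to support.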

Establish an Issue Tracking System that allows quick data entry and reporting. Record, at minimum, the error code, error description, steps to reproduce it, affected environments, and frequency.
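
As a rough illustration of the minimum record described above, here is a Python sketch of an in-memory tracker. `IssueRecord` and `IssueTracker` are hypothetical names, and a real system would persist the data instead of holding it in a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class IssueRecord:
    error_code: str
    description: str
    steps_to_reproduce: list
    environments: set = field(default_factory=set)  # e.g. browser or OS names
    occurrences: int = 0                            # frequency counter


class IssueTracker:
    """Hypothetical in-memory tracker, keyed by unique error code."""

    def __init__(self):
        self._issues = {}

    def record(self, code, description, steps, environment):
        issue = self._issues.get(code)
        if issue is None:
            issue = IssueRecord(code, description, list(steps))
            self._issues[code] = issue
        issue.environments.add(environment)
        issue.occurrences += 1
        return issue

    def top_offenders(self, n=5):
        # Most frequently reported errors first.
        ranked = sorted(self._issues.values(),
                        key=lambda issue: issue.occurrences, reverse=True)
        return ranked[:n]
```

Keeping a frequency count per error code is what later makes it possible to rank the worst offenders and fix those first.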

Phase 2: Keep the User Calm and his Data Safe
Error messages should always carry a mature and responsible tone. Always use supportive, polite language, like a good teacher instructing a pupil.

If the user leaves a mandatory field empty or enters data in the wrong format (credit card number, ZIP code, etc.), don’t go ballistic. Non-critical errors deserve non-critical messages. Indicate where the problem is on the entry form, place the cursor in the relevant field, and leave the rest of the data intact. This is especially important for long entry forms that require a lot of effort to complete.
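
A minimal Python sketch of this behavior might look like the following. The field names, the US ZIP pattern, and the `validate` function are assumptions made for illustration; the point is only that the result carries the untouched input, a gentle message per field, and the field that should receive the cursor.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

REQUIRED = ("name", "zip")                    # hypothetical mandatory fields
ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")      # assumes US ZIP or ZIP+4


@dataclass
class FormResult:
    values: dict                                 # everything the user typed, untouched
    errors: dict = field(default_factory=dict)   # field name -> polite message
    focus: Optional[str] = None                  # field that should get the cursor


def validate(values: dict) -> FormResult:
    result = FormResult(values=dict(values))
    for name in REQUIRED:
        if not values.get(name, "").strip():
            result.errors[name] = f"Please fill in the {name} field."
    zip_code = values.get("zip", "").strip()
    if zip_code and not ZIP_RE.match(zip_code):
        result.errors["zip"] = "Please enter a 5-digit ZIP code, for example 10001."
    if result.errors:
        # Dictionaries keep insertion order, so this is the first problem found.
        result.focus = next(iter(result.errors))
    return result
```

The page that re-renders the form can then refill every field from `values`, show each message next to its field, and move the cursor to `focus`.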

Don’t force the user to duplicate the entry of some previously supplied data for verification purposes (such as billing and shipping information), as this may introduce human error and cause him to abandon the application altogether.

Phase 3: Good Error Messages are Clear and Provide Remedies
The user perceives the error differently from you. He thinks in business terms and knows nothing about the inner workings of your application, nor does he care. That’s why you should always design the error UI from the user’s perspective.

Here are the seven golden attributes of error messages:

  1. Describe the error in user terms and language
  2. Instruct the user as to how to complete the task and resolve the error
  3. Explain how to prevent the problem in the future
  4. Avoid technical mumbo jumbo and acronyms
  5. Avoid modal pop-up error messages and instead write the error directly to the page
  6. Provide help links that better explain the nature of the error
  7. Keep the text formatting simple and avoid bright colors and animations

When providing a solution, give clear step-by-step instructions on how to fix the problem. Be specific and do not assume any previous user knowledge. If there is a relevant tutorial or specific solution in your online help, link directly to it. If it’s a critical problem (for example, the website is not accessible), provide a mechanism for the user to report the issue to you. Immediately acknowledge receipt of the complaint, explain what happened, and give an estimate of when the problem will be resolved.

Phase 4: Handle Errors Internally
Write code to handle all errors robustly. This will eliminate the most severe and most common errors (like missing data or failed validation). You can achieve this by automating parts of the data entry so the user never has to type them (e.g., deriving the city name from the zip code).
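
The zip code example could be sketched in Python along these lines. The three-entry lookup table is a stand-in for whatever postal database an application would actually use, and `fill_city` is a hypothetical helper name.

```python
# Stand-in lookup table; a real application would query a postal database.
ZIP_TO_CITY = {
    "10001": ("New York", "NY"),
    "60601": ("Chicago", "IL"),
    "94105": ("San Francisco", "CA"),
}


def fill_city(form: dict) -> dict:
    """Return a copy of the form with city and state derived from the zip code.

    Values the user already typed are never overwritten, and an unknown
    zip code simply leaves the form as it was.
    """
    filled = dict(form)
    match = ZIP_TO_CITY.get(filled.get("zip", "").strip()[:5])
    if match is None:
        return filled
    city, state = match
    if not filled.get("city"):
        filled["city"] = city
    if not filled.get("state"):
        filled["state"] = state
    return filled
```

Because the user no longer types the city at all, a whole class of misspelling and mismatch errors never reaches the validation step.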

To the greatest possible extent, corrective action must be taken before an error occurs. For example, if the user is in the middle of a lengthy entry form, save the contents as he moves between fields; this will allow you to restore the information if he inadvertently navigates off the page or even closes his browser session.
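
Saving as the user moves between fields can be sketched as a small server-side draft store. This Python version is an assumption-laden illustration: `DraftStore` is a hypothetical class, drafts are kept as JSON files, and the user and form IDs are assumed to be safe to use in file names.

```python
import json
import os
import tempfile
from pathlib import Path


class DraftStore:
    """Hypothetical draft store: one JSON file per user and form."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, user_id, form_id):
        # Assumes both IDs are already sanitized for use in a file name.
        return self.directory / f"{user_id}-{form_id}.json"

    def load(self, user_id, form_id):
        path = self._path(user_id, form_id)
        if not path.exists():
            return {}
        return json.loads(path.read_text(encoding="utf-8"))

    def save_field(self, user_id, form_id, field, value):
        """Call whenever the user leaves a field."""
        draft = self.load(user_id, form_id)
        draft[field] = value
        # Write to a temporary file, then swap it in, so a crash mid-write
        # cannot leave a half-written draft behind.
        fd, tmp = tempfile.mkstemp(dir=self.directory)
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(draft, fh)
        os.replace(tmp, self._path(user_id, form_id))

    def discard(self, user_id, form_id):
        """Call after a successful submit."""
        self._path(user_id, form_id).unlink(missing_ok=True)
```

When the user returns after closing the browser, the form page calls `load` and refills whatever was saved; after a successful submission, `discard` removes the draft.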

It’s often expensive to identify and address all possible failure cases, but if you have been tracking your top bugs, you can start with the biggest offenders first.

How you handle and communicate application errors directly reflects on your team’s and your company’s reputation. When building new functionality or reworking existing functionality, don’t assume that the old error messages apply to your new logic and boundaries. Building test cases around various error scenarios (missing data, wrong data, insufficient data, etc.) and dedicating a test cycle to generate all known error messages is also an excellent strategy.
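
A test cycle of this kind can be as simple as a table of bad inputs paired with the error each one should produce. The Python sketch below uses the standard `unittest` module; `parse_quantity`, `QuantityError`, and the `Q00x` codes are invented for the example and stand in for real application logic.

```python
import unittest


class QuantityError(ValueError):
    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


def parse_quantity(raw):
    """Hypothetical input handler with one unique error per failure cause."""
    if raw is None or raw.strip() == "":
        raise QuantityError("Q001", "Please enter a quantity.")
    try:
        value = int(raw)
    except ValueError:
        raise QuantityError(
            "Q002", "Please enter the quantity as a whole number, for example 3."
        ) from None
    if value < 1:
        raise QuantityError("Q003", "Please order at least 1 item.")
    return value


class ErrorScenarioTests(unittest.TestCase):
    # (scenario, input, expected error code)
    CASES = [
        ("missing data", "", "Q001"),
        ("wrong data", "three", "Q002"),
        ("insufficient data (below the minimum)", "0", "Q003"),
    ]

    def test_each_scenario_raises_its_own_error(self):
        messages = set()
        for label, raw, expected_code in self.CASES:
            with self.subTest(label):
                with self.assertRaises(QuantityError) as ctx:
                    parse_quantity(raw)
                self.assertEqual(ctx.exception.code, expected_code)
                messages.add(str(ctx.exception))
        # No two scenarios may share the same message text.
        self.assertEqual(len(messages), len(self.CASES))
```

Saved as a module, this runs with `python -m unittest`. Adding a row to `CASES` whenever a new error message is introduced keeps the test cycle in step with the feature work.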

Error handling and messages should be considered a required phase of any feature development, and adequate engineering time should be budgeted into all SDLC estimates.

Genuine quality of service goes beyond acknowledging your application’s faults. My rule of thumb is that there is no such thing as an “informative error message.” A good error is one that has been eliminated through error-handling code and superior product design.

© Copyright 2010 Yaacov Apelbaum All Rights Reserved.
