Why Google Is Broken for Debugging
One of my greatest pains as a developer is getting no results from Google when searching for an exception message or stack trace. This is one of the few cases where I’m consistently unable to hone my query to glean the knowledge I seek from Google’s index. What gives?
It turns out that most of the time Googlebot sees an exception and stack trace, they come along with a 404 or 500 status. Broken web apps in the wild are so common, in fact, that Google flags 200 responses that include exceptions or stack traces in the body and reclassifies them as “soft 404s”. We first realized this was a problem for developers when Google flagged many of the legitimate documentation pages on Omniref and notified us of “crawler errors” through its webmaster tools. For example, DropboxError, DynamoDB::FailureResponse, MemcacheLock and EM::Mongo::RequestResponse are all flagged. We’re not the first to have problems with this feature: others have also reported misclassification.
Google won’t answer these queries because it assumes people aren’t interested in bugs. Developers are different.
Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it? – Brian Kernighan
To help developers search better when debugging, we’ve extended our distributed Ruby compiler to parse every exception case, along with its message, file and line number. When browsing Omniref docs, you’ll see this information wherever it’s relevant. For example:
You might’ve noticed that the RDoc comment is slightly misleading. In the last example, it uses a loose documentation syntax that reads as if cattr_reader returns a NameError, when it actually raises one. We don’t think the answer to this problem is switching to YARD-style tags (even though we support those too). We believe parsing and documenting the code itself is more valuable for everyone than asking developers to maintain explicit properties in a second comment language.
We’ve also rebuilt our index, so copying an exception message like “NameError invalid class attribute name: @BadName lib/active_support/core_ext/class/attribute_accessors.rb:35” will return results directly from the source, compared to nothing from Google.
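To make the raises-versus-returns distinction concrete, here is a minimal sketch of a class-attribute reader that validates its names. `SimpleClassAttributes`, `simple_cattr_reader` and `Config` are hypothetical names invented for illustration. This is not ActiveSupport's implementation, only the general shape of the behavior described above.

```ruby
# Hypothetical, simplified stand-in for a cattr_reader-style macro.
module SimpleClassAttributes
  def simple_cattr_reader(*names)
    names.each do |name|
      # An invalid name raises NameError; the error is never a return value.
      unless name.to_s.match?(/\A[_A-Za-z]\w*\z/)
        raise NameError, "invalid class attribute name: #{name}"
      end

      var = "@@#{name}"
      class_variable_set(var, nil) unless class_variable_defined?(var)
      define_singleton_method(name) { class_variable_get(var) }
    end
  end
end

class Config
  extend SimpleClassAttributes
  simple_cattr_reader :timeout
end

# The caller only sees the NameError if it rescues it.
result =
  begin
    Config.simple_cattr_reader :"@BadName"
  rescue NameError => e
    e.message
  end
```

A comment that says the method "returns NameError" would suggest checking the return value, which in this sketch would never work: control leaves the method at the `raise`.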
The most exceptional Ruby gems
Since this new data also gives us hints about gem reliability, we’ll be baking that information into our index to improve search results in general. Here’s a brief rundown on the stats from the top 1000 most downloaded RubyGems:
There’s a strong correlation between Lines of Code and Total Exceptions Raised, with a Pearson product-moment coefficient of 0.75. It’s interesting to me that the correlation is linear, and stronger with larger projects. That’s the Ruby standard library setting the bar in the top right. The larger a system, the more ways any single point could fail unexpectedly, so why isn’t there exponential growth in exceptions? Are exceptions being replaced with more scalable software architectures in large projects?
You can also see the bands of gems that raise only a single-digit number of exceptions toward the bottom. Are these programmers just not creative enough to imagine all the ways their software can fail? Is the domain so trivial that they’ve perfectly covered all the bases? Maybe they refuse to use exceptions on principle? We’ll have to dig deeper into the data to answer those questions, but I’m left wondering how RubyGems would do if subjected to the Black Team.
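For readers who want to check a coefficient like this on their own data, here is a minimal sketch of the Pearson product-moment calculation. The `pearson` helper and the sample arrays are made up for illustration. They are not the RubyGems measurements behind the 0.75 figure.

```ruby
# Pearson product-moment correlation between two equal-length numeric arrays.
def pearson(xs, ys)
  raise ArgumentError, "arrays must have the same length" unless xs.length == ys.length

  n = xs.length.to_f
  mean_x = xs.sum / n
  mean_y = ys.sum / n

  # Sum of co-deviations, divided by the product of the two spreads.
  covariance = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  spread_x = Math.sqrt(xs.sum { |x| (x - mean_x)**2 })
  spread_y = Math.sqrt(ys.sum { |y| (y - mean_y)**2 })

  covariance / (spread_x * spread_y)
end

# Made-up sample: exceptions grow in exact proportion to lines of code.
lines_of_code = [100, 200, 300, 400]
exceptions_raised = [1, 2, 3, 4]
r = pearson(lines_of_code, exceptions_raised)
```

A perfectly proportional sample like this one gives a coefficient of 1.0, up to floating-point rounding. Real data with scatter lands somewhere between -1 and 1.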
Looking at exceptions per LoC instead shows surprising normality, except for the large clusters of outliers that don’t raise exceptions at all. The average RubyGem raises 7.5 exceptions per 1,000 LoC.
The correlation between exceptions raised and number of releases for a gem is weak, and can be almost entirely explained by a third variable, lines of code. Total number of releases is identically correlated with LoC and Exceptions at 0.23. It appears to be rare to release new versions that add exception cases by themselves, e.g. from a small patch or bug report. It’s much more common to preemptively add exceptions when releasing new code. If you can predict a case, is it really that exceptional? Would it be better to return an error instead, since Ruby is a dynamically typed language?
Personally, I’m of the school that exceptions should only be raised for truly exceptional behavior, never as a message-passing mechanism, as the latter is essentially a GOTO to an unknown address. Raising is an expensive operation for the machine. Worse, exceptions are expensive for developers, who have to spend their time handling them properly. Worst of all, in the unhandled case, the end user pays the ultimate price. More seasoned programmers than me, like Dijkstra and Spolsky, are better reading on the topic.
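A rough version of this metric can be sketched with Ripper from Ruby's standard library. This is only an approximation under stated assumptions, not how our compiler does it: the `raise_sites` and `raises_per_kloc` helpers are hypothetical, every non-blank line (comments included) counts as a line of code, any identifier named `raise` counts as a raise site, and `fail` is ignored.

```ruby
require "ripper"

# Line numbers of `raise` identifiers, found by scanning the token stream so
# that the word "raise" inside strings or comments is not counted.
def raise_sites(source)
  Ripper.lex(source)
        .select { |_pos, type, token| type == :on_ident && token == "raise" }
        .map { |(line, _col), _type, _token| line }
end

# Raise sites per 1,000 non-blank lines of source.
def raises_per_kloc(source)
  loc = source.lines.count { |line| !line.strip.empty? }
  return 0.0 if loc.zero?

  raise_sites(source).length * 1000.0 / loc
end

sample = <<~RUBY
  # raise is mentioned here only in a comment
  def check(name)
    raise NameError, "invalid name" if name.empty?
    puts "never raise in a string"
    name
  end
RUBY

sites = raise_sites(sample)       # only the real raise on line 3
density = raises_per_kloc(sample) # 1 site over 6 non-blank lines, scaled to 1,000
```

Working from tokens rather than a regular expression is what keeps the comment and the string literal in the sample from inflating the count.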
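One way to sanity-check the third-variable argument is a first-order partial correlation, which estimates how much of the releases/exceptions relationship remains once LoC is held fixed. The sketch below assumes all three coefficients were measured on the same sample of gems and reuses the 0.75 figure reported earlier for LoC and exceptions. The `partial_correlation` helper is a hypothetical name.

```ruby
# First-order partial correlation of x and y, controlling for z:
#   (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
def partial_correlation(r_xy, r_xz, r_yz)
  (r_xy - r_xz * r_yz) / Math.sqrt((1 - r_xz**2) * (1 - r_yz**2))
end

releases_vs_exceptions = 0.23
releases_vs_loc = 0.23
exceptions_vs_loc = 0.75 # assumption: same sample as the 0.23 figures

residual = partial_correlation(releases_vs_exceptions, releases_vs_loc, exceptions_vs_loc)
```

Under those assumptions the residual comes out at roughly 0.09, well below the raw 0.23, which is consistent with lines of code accounting for most of the apparent link between releases and exceptions.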
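As a small illustration of the alternative, here is a sketch of a method that reports a predictable failure as an ordinary return value instead of raising. `Result` and `parse_port` are hypothetical names chosen for this example, and this is only one of several ways to return errors in Ruby.

```ruby
# A tiny result object: either a value or an error message, never both.
Result = Struct.new(:value, :error) do
  def ok?
    error.nil?
  end
end

# Bad input is an expected outcome here, so it is returned, not raised.
def parse_port(text)
  return Result.new(nil, "not a number: #{text.inspect}") unless text.match?(/\A\d+\z/)

  port = text.to_i
  return Result.new(nil, "out of range: #{port}") unless (1..65_535).cover?(port)

  Result.new(port, nil)
end

good = parse_port("8080")
bad = parse_port("http")

# The caller handles both outcomes in ordinary control flow.
message = bad.ok? ? "port #{bad.value}" : "rejected (#{bad.error})"
```

The failure path stays visible at the call site, and nothing unwinds the stack. Truly unexpected conditions can still raise.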