SteveJS asks about our favorite bug stories in Babble: Beeping Robots and Bug Stories.
This one may not be my favorite, but I did lose a week of my life to it. It was a Java/WebLogic web application. There was a minor problem accepting data because one of the database columns was too small, so a SQL script was written to fix that. Testing was supposed to be just a formality; after all, what could go wrong with a little SQL column expansion? But one of the testers reported that the thing was hanging on them after clicking around for a while. They couldn't replicate the problem in other environments without the patch.
And yes, their browser would be hanging, because the app itself was crashing. The whole Java VM; core dump. But they couldn't supply an exact series of steps. I was able to replicate it, but only very rarely (after hours of submitting loan apps, it would go).
Analyzing the core with GDB just returned the mangled C++ names of the Java compiler, and the problem was in JITed code anyway. There was no way to tie that back to a Java method (this is why I -love- the new JDK 1.5 tools). Sun had us install the Sun debugger, but it didn't give us much more. I tried to set the values necessary to produce a Java thread dump on shutdown, but it wasn't working.
After 4-5 days of missteps and bad bug reporting, I finally found a reliable replication scenario. It was frustrating, because the trigger was something out of the ordinary: a file upload feature that wasn't even used by the customers yet, and that wasn't supposed to be part of the tests the tester executed (they swore they hadn't been doing this, so we hadn't checked it). When you uploaded a file, the -next- application you submitted would cause the crash. The code didn't show anything obviously wrong; it was a Struts app and this was a Commons FileUpload upload. It looked okay at first glance.
So with a replication scenario in hand, I set to work getting a thread dump. I wrote shell scripts to kill -3 the VM (which generates a thread dump) in an infinite loop, with varying delays, trying to get a snapshot at just the right time, but no luck.
So finally, with everything up and running locally with a debugger, I walked through the replication and had my root cause.
We had suspected this method before; it was a debugging call for a stupid log message that walked an object tree, printing out the values recursively. Of course, endless loops are a common source of VM crashes, so we had checked it carefully. But it wasn't endless; it checked to make sure it didn't revisit identical references (don't ask why code like this was even allowed in just for debugging). So why did it succeed when you hadn't just uploaded a file?
Turns out that when you walked the object tree (it was a Struts form) normally, it was small and finished just fine. However, after uploading a file, one of the things the form now contained a reference to was the WebLogic request... which contains a reference to the WebLogic MBean config root. Which contains some ridiculously large number of objects.
So after a file upload, this stupid debug statement would recursively call itself thousands of times, toString()ing a huge object graph, run out of stack, and crash the VM.
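For anyone who wants to see the failure mode in miniature, here is a rough sketch in Java. It is not the original code: `DumpSketch`, `Node`, `dumpNaive`, `dumpBounded`, and `chain` are all hypothetical names, and the walker just counts nodes where the real one built a log message. The point is that an identity-based "seen" set stops cycles but does nothing about depth, so a large enough graph still exhausts the stack.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not the original debug helper.
class DumpSketch {

    // Stand-in for a bean in the object graph.
    static class Node {
        final int id;
        final List<Node> refs = new ArrayList<Node>();
        Node(int id) { this.id = id; }
    }

    // A set that compares by reference (==), not equals().
    static Set<Node> identitySet() {
        return Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
    }

    // Naive walker: the seen set prevents revisiting, but depth is unbounded.
    // Returns the number of nodes visited (a real dumper would append text).
    static int dumpNaive(Node n, Set<Node> seen) {
        if (n == null || !seen.add(n)) {
            return 0;
        }
        int count = 1;
        for (Node child : n.refs) {
            count += dumpNaive(child, seen);
        }
        return count;
    }

    // Bounded walker: same idea, but gives up once depthLeft reaches zero.
    static int dumpBounded(Node n, Set<Node> seen, int depthLeft) {
        if (n == null || depthLeft <= 0 || !seen.add(n)) {
            return 0;
        }
        int count = 1;
        for (Node child : n.refs) {
            count += dumpBounded(child, seen, depthLeft - 1);
        }
        return count;
    }

    // Builds a singly linked chain of the given length without recursion.
    static Node chain(int length) {
        Node head = new Node(0);
        Node tail = head;
        for (int i = 1; i < length; i++) {
            Node next = new Node(i);
            tail.refs.add(next);
            tail = next;
        }
        return head;
    }

    public static void main(String[] args) {
        Node deep = chain(1000000);
        try {
            dumpNaive(deep, identitySet());
            System.out.println("naive walk finished");
        } catch (StackOverflowError e) {
            System.out.println("naive walk ran out of stack");
        }
        System.out.println("bounded walk visited "
                + dumpBounded(deep, identitySet(), 100) + " nodes");
    }
}
```

A current JVM will normally surface this as a catchable StackOverflowError rather than a core dump, so the sketch only mirrors the shape of the problem, not the crash itself. A depth cap is one way out; an explicit work queue or simply not walking arbitrary object graphs in a log statement would do as well.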
So bye bye a week of my life, but at least we were able to push the needed fix (lengthening a SQL column) to prod.
mmmm... Dump.dumpBeanRecursively(Object o) was truly a thing of beauty...
My favorite bug: port a convoluted stored proc INSERT to jdbc, then witness the resultant thousands of SQLExceptions, since, you know, that stored procedure silently truncated all those Strings to fit into the varchar(10) without telling you. And, yes, we did test this, just not with real-world data. Oh, the hilarity that ensued... priceless.
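A small sketch of how that kind of surprise might be caught before the INSERT, in Java. Everything here is an assumption for illustration: `ColumnWidthSketch`, `fits`, and `truncateTo` are made-up names, and the width of 10 just echoes the varchar(10) above. It also assumes the column length is counted in characters, which depends on the database and its encoding.

```java
// Hypothetical helper, not from the code in the story.
class ColumnWidthSketch {

    // Assumed column width, echoing the varchar(10) in the comment above.
    static final int NAME_WIDTH = 10;

    // True if the value can be stored without truncation.
    // Note: String.length() counts UTF-16 code units, which may not
    // match how a given database measures column length.
    static boolean fits(String value, int width) {
        return value == null || value.length() <= width;
    }

    // Makes the old stored procedure's silent truncation explicit.
    static String truncateTo(String value, int width) {
        if (value == null || value.length() <= width) {
            return value;
        }
        return value.substring(0, width);
    }

    public static void main(String[] args) {
        String input = "a-rather-long-real-world-value";
        if (!fits(input, NAME_WIDTH)) {
            System.out.println("would not fit, storing: "
                    + truncateTo(input, NAME_WIDTH));
        }
    }
}
```

Whether to truncate or reject is a business decision; the sketch only makes the choice visible in the calling code where the stored procedure used to hide it.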
This is Rob Meyer's weblog, a weblog focused on software development and system administration based on 10 years of experience.