Forget the $10,000,000 hunt for Bigfoot – here’s a very different search for an imaginary beastie that did pay off.
I’ve been troubleshooting a client’s Visual Basic .net program, and they’ve been reporting exceptions that abruptly close the program (or else provide the ‘send an error report to Microsoft’ dialog, which is just as serious). As you can imagine, I couldn’t reproduce the bug – always a good sign that the solution will be dead simple.
So I searched the Internet, and dutifully bolstered my source code as advised:
- Obviously, I already had try/catch code around everything – and sometimes the exception was caught. But (apparently) not always.
- Then I tried top-level protection with MyApplication_UnhandledException in my ApplicationEvents.vb file (to get that, click on ‘View Application Events’ in the program’s Properties; Application tab). Even with some sweet error logging and program restart code I added, it still blew sky-high (apparently).
I was about to add AppDomain.CurrentDomain.UnhandledException handlers when it happened…
My test program blew up.
And despite the bulletproofing, it wasn’t caught, and the program didn’t fail gracefully – it aborted.
So it was time to REALLY search the web.
First thing I found out – a lot of MS professionals believe every exception can be caught with the right code (with the ‘exception’ of Environment.FailFast of course, which is after all deliberately designed to bypass try/catch/finally code and finalizers). Many posts starting with “I have an exception I can’t catch” ended abruptly and unresolved. The most belligerent replies (including one that said “it can’t happen”, which I felt was rather omniscient of him) had impressive MS credentials listed.
The second thing I found was an obscure post from someone who not only had the problem, but could reproduce it – with code attached.
The gist of it is he used a C++ dll called by VB, and the dll executed this bit of code, which created an empty vector, then tried to display element 1 anyways:
    #include <iostream>
    #include <vector>
    using namespace std;
    ...later on...
    vector<int> myValues;       // empty vector
    int a = myValues[1];        // reads element 1, which does not exist
    cout << a << endl;
Voilà – a blowup.
Fortunately for me, I had some dll/vb code, so I added these naughty bits to it – and got a blowup too.
More importantly, it was consistent – every time I called the code, the program was killed dead.
And armed with a consistently reproducible bug, I knew I could fix it.
…Oh, how naive I was those many hours ago…
It turns out that unmanaged code (i.e. non-.net code) in a dll can do some nasty things that will cause Windows to reach out and slap it – sometimes so hard the program is killed.
Now, according to the Internet, this should not be possible – unmanaged code bugs (SEH errors) are translated to .net errors as they percolate up the stack, and so can be caught by VB’s try/catch code. And sometimes they are. For instance, if I do a divide by zero error in a dll, no problem – VB can catch my uncaught C++ exception, complete with a nice description. I may not have complete stack info on the error, but I can live with that.
But run this code, and nothing – I repeat nothing – catches it.
I tried the usual try/catch code and MyApplication_UnhandledException. I tried AppDomain.CurrentDomain.UnhandledException. I even tried a setting in my app.config file. But of course, that’s only needed when moving to .net 4.0 or later – since I was in 2.0, it was a waste of time. I even went backward and tried <legacyUnhandledExceptionPolicy enabled="true" /> in case .net 1.0/1.1 handling could solve it (it didn’t).
Of course, forum threads would suggest these things, and when they didn’t work, they went on to suggest programming the dll to include C++ try/catch code. First off, it doesn’t work (I tried); and second, not all of us have access to the dll’s source. And in case there’s any doubt, I tried both C++ try/catch and Structured Exception Handling (SEH).
At about this point, most threads stop – having exhausted all the standard guesses, people quit offering help. And I read a LOT of threads like this.
Now in all fairness, I was one of those people – I firmly believed that the problem could not exist, and if it did, somehow it could be tamed by an exotic exception handler or some other code – so I’m both annoyed and delighted to point to a real example of the bug in the wild that is consistently bad – and that’s good. Reproducible implies fixable – or at least an end to wasted time saying “it doesn’t exist” and more time spent on trying to solve it.
But can it be fixed? I’m thinking not, but I’d be very happy to be proven wrong. The problem seems to be that sometimes specific parts of the stack are overwritten by an errant pointer, triggering the big bad Windows to crush the program in what is known as a Corrupted State Exception, or CSE. It is also a fine example of a Heisenbug: it behaves differently depending on whether you test it in a debug or a release build. In fact, I had to compile both my test C++ dll AND my Visual Basic program under Release to get it to blow up consistently. So of course it’s a great bug – programmers don’t encounter it in their debug testing, but the release version shows it up real nice!
In my case, I’m now (somewhat) content to know it exists and is beyond the ability of this mortal VB programmer to fix. So I’ll try to bypass the dll code and, if necessary, set up a deadman program: a hidden program that monitors the first, for example by having data piped to it regularly. When the data stops, the monitored program is considered dead or hung, and is killed and restarted. A kludge, definitely, but MS is unlikely to fix an obscure bug like this – after all, like Bigfoot, no one believes you when you say you’ve seen it in the wild…