Wednesday, October 13, 2010

Grabbing the dumps from the customer's machine

Occasionally I need to help a customer capture a dump of a crashed process. My company develops software for marine navigation systems, and as you can guess, internet connectivity on the open sea is a problem. In fact, most vessels do have internet access aboard, but it is very slow because satellite connections are expensive. In addition, on most vessels you cannot connect directly to the bridge environment because of security and technical restrictions. So the only remaining communication channel is e-mail.

To save my time and the customer's peace of mind, I wrote a trivial batch script that keeps the explanations to a minimum:

@echo off

rem The processes of interest
set PROCESSES=(IBSSvc TBService scserver71)

rem Make sure the output folder exists
if not exist dumps mkdir dumps

rem List all processes in the system
tlist /v > dumps\processes.txt

rem Attach noninvasively (-pv) to each process of interest by name (-pn),
rem log the debugger output, write a full dump and quit
for %%i in %PROCESSES% do cdb -pv -pn "%%i.exe" -logo dumps\%%i.txt -c ".dump /ma dumps\%%i.dmp;q"


All the tools the script needs can be found in the “Debugging Tools for Windows” suite.

Here is the list of modules:

* cdb.exe
* dbgeng.dll
* dbghelp.dll            
* ext.dll
* tlist.exe
* uext.dll

After that, I pack all of this into a self-extracting archive and send it to the customer. The customer unpacks the archive, runs the script, and sends the contents of the “dumps” folder back to me for analysis.

Feel free to modify the script as you like and use it wherever appropriate.

Monday, October 4, 2010

Believe not all that you see

Last week we ran into a very interesting story that may be useful to debugger users. While debugging an application with WinDbg, we hit a stable, reproducible crash. The source of the crash was easy to find, but it confused us completely. The problem was in a single processor instruction: add eax,10h ended up subtracting 34h from the eax register.

We checked it several times, but the result was always the same. Then we ran several experiments, with the following results:

1.    Whatever we put in place of 10h, the result was a subtraction of 34h from the eax register.

2.    When we moved the instruction three bytes above or below its current address (just NOPing out the gap where the instruction had been), the effect disappeared immediately. But when we put it back in its place, the effect returned as well.

After puzzling over this for some time, I looked at the breakpoint list and noticed a breakpoint set at the faulting instruction, although the instruction wasn’t highlighted in the debugger. When I checked the breakpoint’s address, I found that it was set not on the first byte of the instruction (as it should be) but on the third byte, that is, in place of the 10h operand. Of course, this “breakpoint” didn’t act like an actual breakpoint: the processor simply executed add eax,CCh instead. Because the one-byte operand is sign-extended, CCh is treated as -34h, which is exactly the subtraction we observed. And because the debugger hides all of its breakpoints before it passes control to the user, we always saw add eax,10h instead of the real instruction.
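The effect can be reproduced with a toy model. The sketch below is only an illustration in Python, not how WinDbg or the processor is implemented. It assumes the three-byte encoding 83 C0 10 for add eax,10h (consistent with the operand being the third byte), and all function names are made up for the example.

```python
# Toy model of a breakpoint byte (CCh) landing on the operand of "add eax, imm8".
# Assumption: the instruction is encoded as 83 C0 ib, so the operand is byte 3.

ADD_EAX_IMM8 = bytes([0x83, 0xC0])  # opcode and ModRM bytes of "add eax, imm8"
INT3 = 0xCC                         # the byte a debugger writes for a breakpoint


def sign_extend8(value):
    """Sign-extend an 8-bit immediate, as the CPU does for this encoding."""
    return value - 0x100 if value & 0x80 else value


def execute_add_eax_imm8(code, eax):
    """Execute one 'add eax, imm8' at the start of code; return the new 32-bit eax."""
    assert bytes(code[:2]) == ADD_EAX_IMM8, "only add eax, imm8 is modelled"
    return (eax + sign_extend8(code[2])) & 0xFFFFFFFF


def set_breakpoint(memory, offset):
    """Patch the breakpoint byte into memory and return the saved original byte."""
    saved = memory[offset]
    memory[offset] = INT3
    return saved


def debugger_view(memory, breakpoints):
    """What the user is shown: breakpoint bytes replaced by the saved originals."""
    view = bytearray(memory)
    for offset, saved in breakpoints.items():
        view[offset] = saved
    return bytes(view)


eax_before = 0x1000
results = {}
for operand in (0x10, 0x20, 0x7F):
    memory = bytearray([0x83, 0xC0, operand])
    # The misplaced breakpoint lands on the third byte, the operand.
    breakpoints = {2: set_breakpoint(memory, 2)}
    shown = debugger_view(memory, breakpoints)
    eax_after = execute_add_eax_imm8(memory, eax_before)
    results[operand] = (shown, eax_before - eax_after)
    print(f"shown: {shown.hex()}  executed: {bytes(memory).hex()}  "
          f"eax decreased by {eax_before - eax_after:#x}")
```

Whatever operand is written, the displayed bytes still contain it, while the executed bytes always end in CCh and eax always drops by 34h, which matches the first experiment.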

So the obvious question is: why did the debugger set this breakpoint?

And the answer is simple. The unresolved breakpoint had been set during another debugging session of this process, and since the process binaries and symbols had changed in the meantime, the debugger computed the wrong address for the breakpoint (in fact, completely wrong :)). And because the debugger still treated this breakpoint as unresolved, we kept running into this crash again and again after every restart of the process.