Thursday, December 23, 2010

Invalid Bus Driver

Several weeks ago we encountered a very interesting crash in one of our product’s processes. After analyzing the dump, we found that exceptions had been raised in two threads simultaneously.

The first one invoked the debugger:

0355f0f0 7c3627e4 ntdll!RtlAllocateHeap+0x655
                  cmp edi,dword ptr [eax+4] ds:0023:8851e6cc=????????
0355f130 7c36280c msvcr71!_heap_alloc+0xe0
0355f138 7c362829 msvcr71!_nh_malloc+0x10
0355f144 7c3eb633 msvcr71!malloc+0xf
0355f154 7c3c1f0e msvcp71!operator new+0x21
0355f9cc 7c3c4f9e msvcp71!std::basic_string,std::allocator >::_Copy+0x73
0355f9e0 7c3c55df msvcp71!std::basic_string,std::allocator >::_Grow+0x22
0355f9fc 7c3c6752 msvcp71!std::basic_string,std::allocator >::assign+0x4e
0355fa10 00595239 msvcp71!std::basic_string,std::allocator >::basic_string,std::allocator >+0x20
0355fc4c 00595387 SCServer_dll71!_ConvertBinsToChars+0xb9
0355fdac 007335ea SCServer_dll71!TReadResourceHandler::HandleEvent+0x127
0355ff74 10046453 SCLib71!TReactor::Thread+0xda
0355ff80 7c36b381 ETL!TThread_::ThreadThunkFunction+0x23
0355ffb4 7c80b50b msvcr71!_threadstartex+0x6f
0355ffec 00000000 kernel32!BaseThreadStart+0x37

And the second one was pending:

0012e45c 7c90e9ab ntdll!KiFastSystemCallRet
0012e460 7c8633d5 ntdll!ZwWaitForMultipleObjects+0xc
0012e7a0 7c36e289 kernel32!UnhandledExceptionFilter+0x82d
0012e7bc 0040c860 msvcr71!_XcptFilter+0x15f
0012e7c8 7c363943 SCServer71!WinMainCRTStartup+0x1d7
0012e7f0 7c9037bf msvcr71!_except_handler3+0x61
0012e814 7c90378b ntdll!ExecuteHandler2+0x26
0012e8c4 7c90eafa ntdll!ExecuteHandler+0x24
0012e8c4 00409c1c ntdll!KiUserExceptionDispatcher+0xe
0012ebc4 7c1adc5b SCServer71!CMonitoringDlg::OnTimer+0x1c
                  call dword ptr [edx+8] ds:0023:00000008=????????              
0012ec54 7c1a9f01 mfc71!CWnd::OnWndMsg+0x46b
0012ec74 00422d16 mfc71!CWnd::WindowProc+0x22
0012ef74 77d487eb user32!InternalCallWinProc+0x28
0012efdc 77d489a5 user32!UserCallWinProcCheckWow+0x150
0012f03c 77d4bccc user32!DispatchMessageWorker+0x306
0012f04c 7c1b1645 user32!DispatchMessageA+0xf
0012f05c 7c1ab833 mfc71!AfxInternalPumpMessage+0x3e
0012f080 7c1aeeed mfc71!CWnd::RunModalLoop+0xca
0012f0bc 00424726 mfc71!CDialog::DoModal+0xf3
0012f0f0 0040142b NSGuiCtl10!CNSGDialog::DoModal+0xc6
0012ff08 7c1ae5d0 SCServer71!CScServerApp::InitInstance+0x9b
0012ff18 0040c80e mfc71!AfxWinMain+0x47
0012ffc0 7c816d4f SCServer71!WinMainCRTStartup+0x185
0012fff0 00000000 kernel32!BaseProcessStart+0x23

Wednesday, October 13, 2010

Grabbing the dumps from the customer's machine

Occasionally I have to help a customer get a dump of a crashed process. My company develops software for marine navigation systems, and as you can guess, there are some problems with internet connectivity on the open sea. In fact, most vessels have internet access aboard, but it’s very slow due to the high price of satellite connections. In addition, on most vessels you can’t connect directly to the bridge environment because of security and technical restrictions. So the only means of communication that remains is e-mail :)

To save my time and the customer’s peace of mind, I wrote a trivial batch script that reduces the amount of explanation to a minimum:

@echo off

rem The processes of interest
set PROCESSES=(IBSSvc TBService scserver71)

rem Make sure the output folder exists, then list all processes in the system
if not exist dumps mkdir dumps
tlist /v > dumps/processes.txt

rem Grab all dumps and debugger output for the processes of interest
for %%i in %PROCESSES% do cdb -pv -pn "%%i.exe" -logo dumps/%%i.txt -c ".dump /ma dumps/%%i.dmp;q"

All the tools needed to run the script can be found in the “Debugging Tools for Windows” suite.

Here is the list of modules:

* cdb.exe
* dbgeng.dll
* dbghelp.dll            
* ext.dll
* tlist.exe
* uext.dll

After that, I just pack all of this into a self-extracting archive and send it to the customer. Once the customer unpacks the archive and runs the script, he sends me the contents of the “dumps” folder and I can analyze them.

You can modify the script the way you like and use it where appropriate.

Monday, October 4, 2010

Believe not all that you see

Last week we ran into a very interesting story that may be useful for debugger users. While debugging an application (using WinDbg), we encountered a stable, reproducible crash. The source of the crash was found easily, but it confused us completely. The problem was in a single processor instruction: add eax,10h ended up subtracting 34h from the eax register.

We checked it several times, but the result was always the same. After that, we ran several experiments with the following results:

1.    Whatever we put in place of 10h, the result was the subtraction of 34h from the eax register.

2.    After we moved this instruction three bytes above or below its current address (just filling the gap where the instruction had been with nops), the effect disappeared immediately. But when we moved it back to its place, the effect returned as well.

After puzzling over this for some time, I looked at the breakpoint list and noticed that a breakpoint was set at the faulting instruction, although the instruction wasn’t highlighted in the debugger. When I checked the breakpoint’s address, I found that it was set not at the first byte of the instruction (as it should be) but at the third byte, that is, in place of the 10h operand. Of course, this “breakpoint” didn’t act like an actual breakpoint: the breakpoint opcode (CCh) simply became the operand, and the processor executed add eax,CCh instead. Because the processor sign-extends this 8-bit operand, CCh means -34h, which is exactly the subtraction we observed. And because the debugger hides all its breakpoints before it passes control to the user, we always saw add eax,10h instead of the real instruction.

So, the obvious question is: why did the debugger set this breakpoint?

And the answer is simple. The unresolved breakpoint had been set during another debugging session of this process, and since the process binaries and symbols had changed, the debugger computed the wrong address for the breakpoint (in fact, completely wrong :)). And because the debugger still considered this breakpoint unresolved, we kept hitting the crash over and over again after each restart of the process.
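The arithmetic behind this effect can be checked with a small sketch. It assumes the instruction used the short encoding 83 C0 ib (three bytes, with the immediate in the third byte), which matches the breakpoint landing on the operand; the helper names below are hypothetical and only model the sign extension, they are not part of any debugger API.

```cpp
#include <cstdint>

// Opcode byte of the one-byte `int 3` instruction that a debugger writes
// into the code to plant a software breakpoint.
constexpr std::uint8_t kInt3 = 0xCC;

// `add eax, imm8` (assumed encoding: 83 C0 ib) sign-extends its 8-bit
// immediate to 32 bits before adding it.
constexpr std::int32_t sign_extend_imm8(std::uint8_t imm8) {
    return imm8 < 0x80 ? static_cast<std::int32_t>(imm8)
                       : static_cast<std::int32_t>(imm8) - 0x100;
}

// Hypothetical model of executing `add eax, imm8`: returns the new eax,
// wrapping modulo 2^32 like the real register.
constexpr std::uint32_t execute_add_eax_imm8(std::uint32_t eax, std::uint8_t imm8) {
    return eax + static_cast<std::uint32_t>(sign_extend_imm8(imm8));
}

// The intended instruction adds 10h ...
static_assert(sign_extend_imm8(0x10) == 0x10, "10h stays +10h");
// ... but with the immediate byte overwritten by CCh it subtracts 34h,
// no matter what the original operand was.
static_assert(sign_extend_imm8(kInt3) == -0x34, "CCh sign-extends to -34h");
static_assert(execute_add_eax_imm8(0x1000, kInt3) == 0xFCCu, "1000h - 34h = FCCh");
```

This also explains the first experiment: any operand written into the third byte was overwritten by the same CCh, so the observed result never changed.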

Saturday, September 18, 2010

Congratulate me, I’m the winner!

Several articles from this blog took part in the Tell Your Windows Debugging Story Annual Competition. A few days ago Dmitry Vostokov announced the winners of the competition. I’m glad to tell you that this blog is among them!

The way to find a hotspot

Today I want to describe a way to find a hotspot without a profiler. I often use this approach and find it simple and effective. The only tools you need are a Windows debugger (I prefer WinDbg, but you can use your favorite one), which should always be at hand when you investigate a performance issue (or any other problem with your software), and a tool like Process Explorer, which is available for free from Windows Sysinternals. You can use this method without a profiler at all, or together with a profiler to clarify its results.

Before explaining the approach, I want to define the term “hotspot”. Different definitions apply in different contexts. In this article, a hotspot is a limited set of instructions whose execution takes a significant amount of CPU time, or more CPU time than you expect.

The process is divided into three steps. If you have used a profiler and have a call graph, you can skip the first two steps and jump to step three to verify the profiler’s result and pinpoint the hotspot more precisely. Nevertheless, I recommend going through all the steps, because the results returned by profilers can sometimes be very obscure, and you will lose time wandering around a call graph or other related information. Even if your profiler was absolutely right, you’ll have one more confirmation of its result. Sometimes it’s better to check twice :)

Thursday, September 9, 2010

The bug in MFC tool tip control

Yesterday, I encountered a bug in MFC’s tool tip control implementation.
The bug lurks in the following code:

LRESULT CToolTipCtrl::OnAddTool(WPARAM wParam, LPARAM lParam)
{
    TOOLINFO ti = *(LPTOOLINFO)lParam;     // <----- HERE
    if ((ti.hinst == NULL) && (ti.lpszText != LPSTR_TEXTCALLBACK)
        && (ti.lpszText != NULL))
    {
        void* pv;
        if (!m_mapString.Lookup(ti.lpszText, pv))
            m_mapString.SetAt(ti.lpszText, NULL);
        // set lpszText to point to the permanent memory associated
        // with the CString
        VERIFY(m_mapString.LookupKey(ti.lpszText, ti.lpszText));
    }
    return DefWindowProc(TTM_ADDTOOL, wParam, (LPARAM)&ti);
}

Friday, September 3, 2010

To untangle a rope

Last weekend I was flying a kite with my daughter and her friend. It was great fun, but when I was bringing the kite down, its string got tangled. I spent about 30 minutes (maybe even more, time flies fast :)) untangling the string. I followed the loops of the string, analyzed the types of knots, and imagined what would happen to the string if I passed it through one loop or another. I went through a lot of different combinations in my mind and devised more and more new techniques. When I finally finished, I felt satisfaction. I felt no tiredness or irritation; moreover, I had the feeling that I had been untangling strings all my life. When I thought about it, I realized that this is not far from the truth. The process of untangling a string (as well as a rope, a chain and so on) is very similar to the process of software defect research (the term was first introduced by Dmitry Vostokov). So if you are in doubt whether to become a software defect researcher or not, try to untangle a rope first :)