Monday, August 2, 2010

Why isn't it crashing? (conclusion)

Well, today I received the final message from MS, in which they told me once again that nothing can be done on their side and advised me to catch the exception in user code and call MiniDumpWriteDump after that. I think the story can be considered finished.

C'est la vie, mes amis. C'est la vie.

Sunday, August 1, 2010

In search of Dr. Watson

A week ago I returned from a business trip to Italy. It was a trip to a yacht that had real problems with the stability of its navigation system. Once or twice a day all navigation data from the sensors froze, and the navigation equipment stayed inoperative until the whole bridge (5 stations) was rebooted.

Just before the trip I received some information concerning the problem from our service engineer.
There were several observations:

1.    All navigation data froze simultaneously, but the navigation system (radar, cartographic system and other subsystems) continued to work without any other noticeable problems.

2.    Sometimes a message box with a “Pure virtual function call” notification was observed after the data froze.

3.    The problem was first observed just after the last update of the product.

4.    In the end I even received HD images from all stations of the bridge (they were created after the last update and after the problem had appeared for the first time).


The description of the problem clearly pointed to the navigation server processes crashing (or freezing) at all stations simultaneously (each computer runs its own navigation server, and the described problem is only possible if no server is left alive). The only thing that confused me was the “Pure virtual function call” error mentioned earlier. This error is not very common and occurs rarely in its pure form. Its most common cause is an improper sequence of deinitialization and destruction in a chain of objects, or a poorly thought-out object ownership strategy (of course I’m not taking into account errors such as calling a pure virtual method, implemented in a child class, from a parent class destructor, or deleting an object twice; I believe our code is better than that :). One classic situation is caused by an assertion message box triggered during object destruction (usually in debug mode). The message box's message loop can provoke a call (through a pure virtual interface) to a partially destroyed object. As a result we get a clean pure virtual function call.
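
To make the message box scenario more concrete, here is a minimal sketch in portable C++. It is only an illustration, not our product code: MessageQueue, Sensor, Gps and run_demo are hypothetical names, and the queue is a stand-in for a real GUI message loop. I also assume a base class method with a body instead of a pure one, so that the demo stays well-defined. If Sensor::name() were declared "= 0", the second call would typically end in the runtime's pure virtual function call error instead.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a GUI message queue (not a real Windows API).
struct MessageQueue {
    std::vector<std::function<void()>> pending;

    // Dispatch everything queued so far, as a modal message box loop would.
    void pump() {
        auto batch = std::move(pending);
        pending.clear();
        for (auto& message : batch) message();
    }
};

struct Sensor {
    MessageQueue& queue;
    explicit Sensor(MessageQueue& q) : queue(q) {}

    // In the real bug this would be "= 0". A body is provided here (assumption)
    // so that the call during destruction is well-defined instead of fatal.
    virtual std::string name() const { return "Sensor (base placeholder)"; }

    virtual ~Sensor() {
        // Simulates an assertion message box shown during destruction:
        // its loop dispatches queued messages while the derived part is gone.
        queue.pump();
    }
};

struct Gps : Sensor {
    using Sensor::Sensor;
    std::string name() const override { return "Gps"; }
};

// Returns the names seen by two identical messages: one dispatched while the
// object is alive, one dispatched from inside the base class destructor.
std::vector<std::string> run_demo() {
    MessageQueue queue;
    std::vector<std::string> log;
    {
        Gps gps(queue);
        Sensor* iface = &gps;
        queue.pending.push_back([&log, iface] { log.push_back(iface->name()); });
        queue.pump();  // object fully alive: the call reaches Gps::name
        queue.pending.push_back([&log, iface] { log.push_back(iface->name()); });
    }  // ~Gps finishes first, then ~Sensor pumps the queue
    assert(queue.pending.empty());  // the destructor's pump drained the queue
    return log;
}
```

Calling run_demo() gives "Gps" for the first message and "Sensor (base placeholder)" for the second: by the time the base destructor runs, virtual calls no longer reach the derived class, and that is exactly the slot where a pure virtual function would have nothing to call.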