C Program gets segmentation fault after main returns
Today I found quite an interesting bug at work. A Linux sample application was sometimes seg faulting on exit. However, when I started to investigate the bug I found that the seg fault occurred after the main function reached its return statement. There are a number of possible causes for this that I have come across in the past, including other threads seg faulting, bugs in standard libraries and errors occurring when using GNU destructor attributes. However, in this case the stack trace looked a bit different in gdb:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x0000000000000000 in ?? ()
Notice the function addresses... They are all 0x0.
To cut a long story short, after a few hours of digging around I found the root cause: an uninitialised pointer. The address of the pointer was passed to a function that took a pointer to a pointer as an argument. I cannot show you the exact code because of copyright law. However, it was something like this:
char *Ptr;
foo(Handle, &Ptr);
Inside the function, the argument was handled in one of two ways. If the pointer was not NULL, it was treated as a caller-supplied buffer and data was written to it. If it was NULL, it was assigned a pointer to an existing buffer and nothing was written. However, because Ptr is never initialised, its value is indeterminate and may be non-zero, in which case foo would interpret it as a buffer and write to it. This explained why the problem was intermittent: sometimes Ptr happened to be zero.
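To make the two code paths concrete, here is a minimal sketch of a function with that kind of contract. It is not the real code: DemoHandle, its internal buffer, the body of foo and the two helper functions are all made up for illustration. The only thing taken from the original is the shape of the call, foo(Handle, &Ptr).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical handle type; the real one is not shown in this post. */
typedef struct {
    char internal[16];   /* buffer owned by the handle */
} DemoHandle;

/*
 * Hypothetical foo with the contract described above:
 *  - *out != NULL: treat *out as a caller-supplied buffer and copy into it.
 *  - *out == NULL: point *out at the handle's own buffer, write nothing.
 */
static void foo(DemoHandle *h, char **out)
{
    assert(h != NULL && out != NULL);

    if (*out != NULL) {
        /* Assumes the caller's buffer holds at least sizeof h->internal bytes. */
        memcpy(*out, h->internal, sizeof h->internal);
    } else {
        *out = h->internal;
    }
}

/* Safe call 1: Ptr is explicitly NULL, so foo hands back its own buffer. */
static int call_with_null(DemoHandle *h)
{
    char *Ptr = NULL;    /* the fix: never leave Ptr uninitialised */
    foo(h, &Ptr);
    return Ptr == h->internal;
}

/* Safe call 2: Ptr points at a real buffer that is large enough. */
static int call_with_buffer(DemoHandle *h)
{
    char buf[sizeof h->internal];
    char *Ptr = buf;
    foo(h, &Ptr);
    return Ptr == buf && memcmp(buf, h->internal, sizeof buf) == 0;
}
```

Both helpers return 1 when foo takes the expected path. The buggy version is the same call with a plain "char *Ptr;" declaration: whatever happens to be in that stack slot decides which branch runs, and if it is a non-zero value the memcpy writes to an arbitrary address.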
The seg fault appeared to occur because the data written through the pointer corrupted the call stack. This of course did not show up until the main function attempted to return. Such problems are difficult to track down because the debugger will not give you very enlightening information in the stack trace. I had to resort to moving the return around in the main function until I found the culprit.