This is the mail archive of the cygwin-apps mailing list for the Cygwin project.


UTF32 crash in poco-1.6.0 - Request for help


I'm trying to package poco-1.6.0, but I've run into a problem where one of the tests in the Foundation test suite causes a seg fault. I've been trying to debug this on and off for three or four months (as I first saw this in one of the beta releases), but I can't seem to make much progress. This e-mail is to document my findings so far, and to ask another maintainer to take pity and help me find the cause of the crash.

The test causing the crash is the UnicodeConverterTest (Foundation/testsuite/src/UnicodeConverterTest.cpp). This is templated, and is instantiated with both UTF16 and UTF32 strings. The UTF16 case works correctly; it is the UTF32 instantiation that is crashing. Here's a rough trace:

Foundation/testsuite/src/UnicodeConverterTest.h line 49: This is in the templated runTests() procedure, instantiated with a UTF32String type. This calls Poco::UnicodeConverter::convert(), which attempts to convert a std::string into a UTF32 string. There is a mistake on the lines immediately following this (where the code confuses the size of a std::basic_string and the size of the type it is instantiated with), but this is a distraction - we never get this far. The call to convert() takes us to...
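The size confusion is an easy mistake to make, because three different sizes are in play: the number of code units (size()), the number of bytes those code units occupy, and sizeof the string object itself. A minimal sketch with a hypothetical helper, payloadBytes(), using the standard std::u16string and std::u32string as stand-ins for Poco's UTF16String and UTF32String (an assumption made only so the example compiles on its own):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: the number of bytes occupied by a string's
// characters. size() counts code units, not bytes, and sizeof(StringT)
// is the size of the string object itself, unrelated to its contents.
template <typename StringT>
std::size_t payloadBytes(const StringT& s)
{
    return s.size() * sizeof(typename StringT::value_type);
}

// Usage: the same three characters occupy different numbers of bytes
// depending on the code-unit type the string is instantiated with.
inline void payloadBytesExample()
{
    const std::u16string utf16 = u"ABC";
    const std::u32string utf32 = U"ABC";
    assert(utf16.size() == 3 && payloadBytes(utf16) == 6);
    assert(utf32.size() == 3 && payloadBytes(utf32) == 12);
}
```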

Foundation/src/UnicodeConverter.cpp line 39: This is one of several overloaded convert() functions that convert between different string types. In this instance, we're converting a UTF8 string into UTF32. The line I've drawn your attention to calls operator+=() on a std::basic_string instantiated with a 32-bit unsigned int. From here on, we're in the STL.
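The general shape of this conversion path is: decode each UTF-8 sequence into a code point, then append it to the output with the single-character operator+=. The sketch below is not Poco's implementation; utf8ToUtf32() is a hypothetical name, std::u32string stands in for UTF32String, and the input is assumed to be well-formed UTF-8 (no validation is done):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a UTF-8 -> UTF-32 convert(): decode each
// sequence and append the code point with operator+=, the call that
// forwards to basic_string::push_back(). Assumes well-formed input.
std::u32string utf8ToUtf32(const std::string& utf8)
{
    std::u32string result;
    std::size_t i = 0;
    while (i < utf8.size())
    {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp;
        std::size_t extra; // number of continuation bytes that follow
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        ++i;
        for (std::size_t k = 0; k < extra && i < utf8.size(); ++k, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i]) & 0x3F);
        result += cp; // single-character operator+=
    }
    return result;
}

// Usage: 1-, 2- and 3-byte sequences.
inline void utf8ToUtf32Example()
{
    assert(utf8ToUtf32("A") == U"A");                 // 0x41
    assert(utf8ToUtf32("\xC3\xA9") == U"\u00E9");     // e-acute
    assert(utf8ToUtf32("\xE2\x82\xAC") == U"\u20AC"); // euro sign
}
```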

/usr/lib/gcc/i686-pc-cygwin/4.9.2/include/c++/bits/basic_string.h line 969: This is operator+= on a std::basic_string taking a single character. The function simply calls push_back() with the character supplied (in our case 0x41, i.e. 'A', not that this matters).

/usr/lib/gcc/i686-pc-cygwin/4.9.2/include/c++/bits/basic_string.h line 1073: This is the implementation of push_back(). There is insufficient capacity in the string, so the code calls reserve() to make room for one additional character; since the string was empty prior to this call, that means a capacity of one character. This is where I lose the thread - this call to reserve() is causing the seg fault.
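Whether the first push_back() reaches reserve() at all depends on the string implementation. Assuming the usual libstdc++ behaviour, GCC 4.9's reference-counted std::basic_string gives an empty string a capacity of zero, so appending one character must reserve, whereas the small-string-optimised std::basic_string that libstdc++ uses by default from GCC 5 keeps a small inline buffer, so the same append may not allocate. A small sketch that records the capacity before and after the first push_back(), using std::u32string as a stand-in (GrowthObservation and observeFirstPushBack() are hypothetical names; the exact numbers are implementation-defined):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Capacity of a string immediately before and after its first append.
struct GrowthObservation
{
    std::size_t capacityBefore;
    std::size_t capacityAfter;
};

// Hypothetical probe: append one character to an empty string, the same
// step that push_back() performs when called from operator+=.
GrowthObservation observeFirstPushBack()
{
    std::u32string s;
    GrowthObservation obs{};
    obs.capacityBefore = s.capacity(); // 0 for a COW empty string, small but nonzero with SSO
    s.push_back(U'A');
    obs.capacityAfter = s.capacity();
    assert(s.size() == 1 && obs.capacityAfter >= s.size());
    return obs;
}
```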

Presumably, this call to reserve() uses the allocator to claim some heap memory. Poco's UTF32String doesn't specify a custom allocator (Foundation/include/Poco/UTFString.h line 287), so we're using the default std::allocator instantiated with the 32-bit unsigned int. However, std::allocator can't be the cause of the problem; otherwise every STL container would be crashing.
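The claim that std::allocator is not at fault can be checked in isolation by using std::allocator with a 32-bit unsigned element type outside any string. A minimal sketch, assuming std::uint32_t matches the element type of UTF32String; allocatorRoundTrip() is a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical helper: allocate n 32-bit code units with std::allocator,
// write and read them back, then release the memory.
bool allocatorRoundTrip(std::size_t n)
{
    typedef std::allocator<std::uint32_t> Alloc;
    typedef std::allocator_traits<Alloc> Traits;
    Alloc alloc;
    std::uint32_t* p = Traits::allocate(alloc, n);
    for (std::size_t i = 0; i < n; ++i)
        Traits::construct(alloc, p + i, static_cast<std::uint32_t>(0x41 + i));
    bool ok = true;
    for (std::size_t i = 0; i < n; ++i)
        ok = ok && p[i] == static_cast<std::uint32_t>(0x41 + i);
    for (std::size_t i = 0; i < n; ++i)
        Traits::destroy(alloc, p + i);
    Traits::deallocate(alloc, p, n);
    return ok;
}

// Usage: small and large requests should both succeed.
inline void allocatorRoundTripExample()
{
    assert(allocatorRoundTrip(1));
    assert(allocatorRoundTrip(4096));
}
```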

I attach the output of running strace over the 'UnicodeConverterTest' part of the test suite. I've compared this to another strace of a different (working) test, and essentially everything is the same down to line 374. I presume that these lines are common to any test and just concern the initialisation of the test runner. Line 375 shows the first significant change (set_signal_mask), and line 376 kills the process with a SIGABRT. The remainder of the strace output seems to be handling the SIGABRT.

Conspicuous by its absence is any call to alloc() or its friends. If the crash is occurring in basic_string::reserve() then I would expect to see some kind of memory manipulation going on at this point in the strace output. So the code must be aborting between entering basic_string::reserve() and std::allocator manipulating memory. However, we're in the internals of gcc's string implementation here, and it would need someone with a good understanding of this implementation to take the investigation further.
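One way to test whether the abort happens before any memory is requested is to give the string a logging allocator and check its output just before the crash. The sketch below is hypothetical (LoggingAllocator, g_allocations, LoggedU32String and allocationsForAppends() are illustrative names) and uses char32_t with std::char_traits rather than Poco's traits; applying it to the failing test would mean substituting this allocator into UTF32String's definition, which is untested here:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <new>
#include <string>

namespace {
// Total allocate() calls across all element types (the string may rebind).
std::size_t g_allocations = 0;
}

// Hypothetical logging allocator: forwards to ::operator new/delete and
// reports each request on stderr.
template <typename T>
struct LoggingAllocator
{
    typedef T              value_type;
    // Older libstdc++ (e.g. GCC 4.9's reference-counted string) may read these
    // typedefs and the nested rebind directly; newer code goes through
    // std::allocator_traits and needs only value_type/allocate/deallocate.
    typedef T*             pointer;
    typedef const T*       const_pointer;
    typedef T&             reference;
    typedef const T&       const_reference;
    typedef std::size_t    size_type;
    typedef std::ptrdiff_t difference_type;
    template <typename U> struct rebind { typedef LoggingAllocator<U> other; };

    LoggingAllocator() {}
    template <typename U> LoggingAllocator(const LoggingAllocator<U>&) {}

    T* allocate(std::size_t n)
    {
        ++g_allocations;
        std::fprintf(stderr, "allocate(%lu x %lu bytes)\n",
                     static_cast<unsigned long>(n),
                     static_cast<unsigned long>(sizeof(T)));
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};

// Stateless allocators always compare equal.
template <typename T, typename U>
bool operator==(const LoggingAllocator<T>&, const LoggingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const LoggingAllocator<T>&, const LoggingAllocator<U>&) { return false; }

typedef std::basic_string<char32_t, std::char_traits<char32_t>,
                          LoggingAllocator<char32_t> > LoggedU32String;

// Append n characters one at a time; return how many allocations occurred.
std::size_t allocationsForAppends(std::size_t n)
{
    const std::size_t before = g_allocations;
    {
        LoggedU32String s;
        for (std::size_t i = 0; i < n; ++i)
            s += static_cast<char32_t>(U'A' + i % 26);
        assert(s.size() == n);
    }
    return g_allocations - before;
}
```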

The traits look sensible (Foundation/include/Poco/UTFString.h line 148) and have a similarity with those proposed here: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2035.pdf (page 5).
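For comparison, a char_traits class for a 32-bit unsigned code unit needs roughly the members below. This is a hypothetical sketch in the spirit of N2035, not a copy of Poco's traits; U32Traits and U32String are illustrative names, and eof() is assumed to be 0xFFFFFFFF because that value is never a valid code point:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <cwchar> // std::mbstate_t
#include <ios>    // std::streamoff, std::streampos
#include <string>

// Hypothetical traits for std::uint32_t code units.
struct U32Traits
{
    typedef std::uint32_t  char_type;
    typedef std::uint32_t  int_type;   // eof() is 0xFFFFFFFF, not a code point
    typedef std::streamoff off_type;
    typedef std::streampos pos_type;
    typedef std::mbstate_t state_type;

    static void assign(char_type& dst, const char_type& src) { dst = src; }
    static bool eq(char_type a, char_type b) { return a == b; }
    static bool lt(char_type a, char_type b) { return a < b; }

    static int compare(const char_type* a, const char_type* b, std::size_t n)
    {
        for (std::size_t i = 0; i < n; ++i)
        {
            if (lt(a[i], b[i])) return -1;
            if (lt(b[i], a[i])) return 1;
        }
        return 0;
    }
    static std::size_t length(const char_type* s)
    {
        std::size_t n = 0;
        while (!eq(s[n], char_type())) ++n; // stop at the terminating zero
        return n;
    }
    static const char_type* find(const char_type* s, std::size_t n, const char_type& c)
    {
        for (std::size_t i = 0; i < n; ++i)
            if (eq(s[i], c)) return s + i;
        return nullptr;
    }
    // Ranges may overlap, hence memmove; n == 0 is guarded to avoid null-pointer UB.
    static char_type* move(char_type* dst, const char_type* src, std::size_t n)
    {
        return n ? static_cast<char_type*>(std::memmove(dst, src, n * sizeof(char_type))) : dst;
    }
    static char_type* copy(char_type* dst, const char_type* src, std::size_t n)
    {
        return n ? static_cast<char_type*>(std::memcpy(dst, src, n * sizeof(char_type))) : dst;
    }
    static char_type* assign(char_type* s, std::size_t n, char_type c)
    {
        for (std::size_t i = 0; i < n; ++i) s[i] = c;
        return s;
    }
    static char_type to_char_type(int_type i) { return i; }
    static int_type to_int_type(char_type c) { return c; }
    static bool eq_int_type(int_type a, int_type b) { return a == b; }
    static int_type eof() { return 0xFFFFFFFFu; }
    static int_type not_eof(int_type i) { return eq_int_type(i, eof()) ? 0 : i; }
};

typedef std::basic_string<std::uint32_t, U32Traits> U32String;

// Usage: append single characters, as UnicodeConverter does.
inline void u32StringExample()
{
    U32String s;
    s += static_cast<std::uint32_t>(0x41);   // 'A'
    s += static_cast<std::uint32_t>(0x20AC); // euro sign
    assert(s.size() == 2 && s[0] == 0x41 && s[1] == 0x20AC);
    U32String copy = s;
    assert(copy.compare(s) == 0);
}
```

Because std::char_traits is only specified for the standard character types, a basic_string over std::uint32_t should supply its own traits like this rather than rely on a library's generic char_traits template.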

gdb doesn't reveal much. A backtrace only shows Windows system DLLs (ntdll.dll, kernel32.dll) and reports that the stack may be corrupt. To rule out stack corruption, I moved both the std::string and the wide string used in the test onto the heap, but that didn't make any difference.

Obviously, it would be trivial to make an example programme that just defines a UTF32 string in an identical way, and then calls operator+=() on it. I've done this, and it works fine. So there's more going on than that. I've also tried compiling poco-1.6.0 for native Windows, and the test runs fine.

I toyed with the idea that a different version of UTF32String (Foundation/include/Poco/UTFString.h starting at line 271) was being used in the main Poco Foundation DLL and the test harness, but I showed that this is not the case. In both the creation of the Foundation DLL and the test harness, the compiler is choosing the same definition of UTF32String.
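One way to make that comparison mechanical is to have each module report a fingerprint of the type it was compiled with and compare the two strings at runtime. A sketch with a hypothetical helper, typeFingerprint(); in the real setup one instantiation would live in the Foundation DLL and one in the test harness, which is not shown here. Matching fingerprints are evidence, not proof, of an identical definition, since typeid names are mangled names and do not capture every layout detail, so size and alignment are recorded as well:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <typeinfo>

// Hypothetical fingerprint of a type as seen by the module that compiles
// this function: implementation-specific (mangled) name, size and alignment.
template <typename T>
std::string typeFingerprint()
{
    std::ostringstream out;
    out << typeid(T).name() << " size=" << sizeof(T) << " align=" << alignof(T);
    return out.str();
}

// Usage: the same type gives the same fingerprint; a string over a
// different code-unit type gives a different one.
inline void typeFingerprintExample()
{
    assert(typeFingerprint<std::u32string>() == typeFingerprint<std::u32string>());
    assert(typeFingerprint<std::u32string>() != typeFingerprint<std::u16string>());
}
```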

Thank you for reading such a long write-up. I attach the cygport file and all the patches necessary to build poco-1.6.0. Any help you can provide would be very much appreciated.

Many thanks in advance,

Dave.

Attachment: poco.cygport
Description: Text document

Attachment: 1.4.6p1-unbundled.patch
Description: Text document

Attachment: 1.4.7-test-dequeue.patch
Description: Text document

Attachment: 1.5.3-data-odbc.patch
Description: Text document

Attachment: 1.6.0-pcre-unbundled.patch
Description: Text document

Attachment: 1.6.0-unicode-converter-test.patch
Description: Text document

Attachment: strace.txt
Description: Text document

