Posted to c-dev@xerces.apache.org by "Oliver, Steve" <St...@bestwestern.com> on 2004/08/25 16:39:52 UTC

Problems with multi-threaded parsing

I am using Xerces-C++ 2.3 on HP-UX. My application uses a thread pool
in which each thread owns its own parser instance. XML documents arrive
in a queue and are processed by whichever pooled thread is available.
Under certain circumstances my heap is trashed when I return from a
parse. It all seems to start with a parser exception caused by an
invalid XML document. The exception is handled and reported, and all
seems fine until another valid document is processed. It may be the very
next document or several documents later, but soon after the bad
document is parsed, parsing a subsequent good document causes the
corruption.
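
The arrangement described above can be sketched roughly as follows. This
is a simplified illustration, not the application code: StubParser,
DocumentQueue, PoolResult, and runPool are hypothetical names, the stub
only checks the first character instead of running Xerces, and it uses
C++11 threads rather than whatever threading API the real application
uses. The point it shows is that each worker owns its parser and only the
queue and the result counters are shared.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the real DOM parser. It accepts any non-empty
// document that starts with '<' and rejects everything else.
struct StubParser {
    bool parse(const std::string& xml) const {
        return !xml.empty() && xml[0] == '<';
    }
};

// Queue shared by all workers; documents are handed out one at a time.
class DocumentQueue {
public:
    void push(std::string xml) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            docs_.push(std::move(xml));
        }
        ready_.notify_one();
    }

    // Signals that no further documents will be pushed.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until a document is available. Returns false once the queue
    // has been closed and drained.
    bool pop(std::string& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !docs_.empty(); });
        if (docs_.empty()) {
            return false;
        }
        out = std::move(docs_.front());
        docs_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::string> docs_;
    bool closed_ = false;
};

struct PoolResult {
    std::size_t accepted;
    std::size_t rejected;
};

// Runs threadCount workers over the documents and counts the outcomes.
PoolResult runPool(const std::vector<std::string>& documents,
                   std::size_t threadCount) {
    assert(threadCount > 0);
    DocumentQueue queue;
    std::mutex resultMutex;
    PoolResult result{0, 0};

    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < threadCount; ++i) {
        workers.emplace_back([&queue, &resultMutex, &result] {
            StubParser parser;  // owned by this thread, never shared
            std::string xml;
            while (queue.pop(xml)) {
                const bool ok = parser.parse(xml);
                std::lock_guard<std::mutex> lock(resultMutex);
                if (ok) {
                    ++result.accepted;
                } else {
                    ++result.rejected;
                }
            }
        });
    }

    for (const std::string& doc : documents) {
        queue.push(doc);
    }
    queue.close();
    for (std::thread& worker : workers) {
        worker.join();
    }
    return result;
}
```

Calling runPool with a mix of good and bad documents returns the number
accepted and rejected; a rejected document does not affect later ones in
this sketch, which is the behaviour I would expect from the real parsers.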

Here is a snippet of code that shows how each thread in the pool handles
the parsing...

    // Instantiate the DOM parser.
    static const XMLCh gLS[] = {chLatin_L, chLatin_S, chNull};
    DOMImplementation* impl =
        DOMImplementationRegistry::getDOMImplementation(gLS);
    parser = ((DOMImplementationLS*)impl)->createDOMBuilder(
        DOMImplementationLS::MODE_SYNCHRONOUS, 0);

    parser->setFeature(XMLUni::fgDOMNamespaces, true);
    parser->setFeature(XMLUni::fgXercesSchema, true);
    parser->setFeature(XMLUni::fgXercesSchemaFullChecking, true);
    parser->setFeature(XMLUni::fgDOMValidation, true);
    parser->setFeature(XMLUni::fgDOMDatatypeNormalization, true);

    // Create error handler and install it
    parser->setErrorHandler(&errorHandler);

    // Create entity resolver and install it
    parser->setEntityResolver(&entityResolver);

    //reset error count
    errorHandler.resetErrors();

    for (;;)
    {
        // get xml document from queue

        xmlParser.resetDocumentPool();
        std::auto_ptr<MemBufInputSource> mem(new MemBufInputSource(
            reinterpret_cast<const XMLByte*>(xml->buffer),
            xml->size, xmlBufferID, false));
        std::auto_ptr<Wrapper4InputSource> src(
            new Wrapper4InputSource(mem.get(), false));

        DOMDocument* doc = 0;
        errorHandler.resetErrors();

        try
        {
            checkHeap(); // All is fine here
            doc = parser->parse(*src);
            checkHeap(); // Heap is trashed
        }

        catch (const XMLException& exception)
        {
            oss << "XML Exception:  " << StrX(exception.getMessage());
            throw ValidationException(oss.str());
        }

        catch (const DOMException& exception)
        {
            oss << "DOM Exception:  " << exception.code;
            throw ValidationException(oss.str());
        }

        // other processing
    }

I've been looking at this for several days now and am at a loss. Has
anyone seen anything like this? Any thoughts, comments, suggestions, or
ideas will be greatly appreciated.

Steve