Posted to c-dev@xerces.apache.org by Motti Shneor <Mo...@orbograph.com> on 2007/02/13 18:49:48 UTC

Need urgent fix or workaround: xerces 2.7.0 hangs (and bloats) while parsing/validating my XML against Schema

Hello everyone.

I use Xerces 2.7.0 on Windows. I have a rather big schema, but
it is not very complicated. I use DOM trees to represent the XML. Until
now, Xerces parsed and validated our documents without difficulty.
However, in the last few days, parsing a few particular documents has
caused Xerces to enter what looks like infinite recursion and hang
forever. It hogs the CPU and bloats the memory of the process until it
finally kills the process.

This is intolerable --- our server simply hangs without even crashing,
when it encounters these XML messages.

I have narrowed down the parsing code and the XML document to the
simplest case I can come up with that still manifests the hang.

Interesting note: the document (and schema) are parsed and validated
without any problems by other parsers (MSXML, the C# built-in parser,
the Firefox browser, XML Notepad, Altova XMLSpy, etc.). I only
experience problems when parsing this with Xerces in a simple C++
console application on Windows.

Below are the parsing code and two example XML documents: one shows the
problem while the other doesn't.

However, my schema is spread over several interconnected xsd files, and
I don't know how to bring them to the list. I tried sending them
directly as text files; if they don't come through, please tell me how
to pass them to the list.

Thank you very much - Any hint here will be a lifesaver.

Here's the parsing code:

// Note: X() is presumably a string-transcoding helper macro, and
// tcerr/_T are TCHAR helpers; their definitions are not shown here.
int main()
{
  xercesc::XMLPlatformUtils::Initialize();
  try
  {
      std::string xmlPath = "Profile1.xml";
      xercesc::DOMImplementationLS* pImplementationLS =
          static_cast<xercesc::DOMImplementationLS*>(
              xercesc::DOMImplementationRegistry::getDOMImplementation(X("LS")));

      xercesc::DOMBuilder* pBuilder = pImplementationLS->createDOMBuilder(
          xercesc::DOMImplementationLS::MODE_SYNCHRONOUS, 0);

      // Enable namespace processing and schema validation.
      pBuilder->setFeature(xercesc::XMLUni::fgDOMNamespaces, true);
      pBuilder->setFeature(xercesc::XMLUni::fgDOMValidation, true);
      pBuilder->setFeature(xercesc::XMLUni::fgXercesSchema, true);

      // The hang occurs inside this call.
      xercesc::DOMDocument* loadedDoc =
          pBuilder->parseURI(X(xmlPath.c_str()));
  }
  catch (xercesc::XMLException& e)
  {
      tcerr << _T("Xerces XMLException: ") << e.getSrcFile() << _T("(")
            << e.getSrcLine() << _T("): ") << e.getMessage() << endl;
  }
  catch (xercesc::DOMException& e)
  {
      tcerr << _T("Xerces DOMException ") << e.code;
      if (e.msg)
          tcerr << _T(": ") << e.msg;
      tcerr << endl;
  }
  catch (std::exception& e)
  {
      cerr << "Exception: " << e.what() << endl;
  }
  catch (...)
  {
      tcerr << _T("Unknown error") << endl;
  }
  xercesc::XMLPlatformUtils::Terminate();
}

 

An XML document demonstrating the problem:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>

<orb:OrboAPI  Version="1.0.0a1"
xmlns:orb="http://www.orbograph.com/OrboSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.orbograph.com/OrboSchema
d:/myschema/OrboAPI.xsd">
  <orb:ProfileRetrieveResponse>
     <TaskID>1234</TaskID>
     <Error Severity="Info">0</Error>
     <Profile>
            <AccountID>0000000000027</AccountID>
            <BankName>keybankshort</BankName>
            <RoutingNumber>1</RoutingNumber>
            <AccountNumber>27</AccountNumber>
            <AccountStatus>Mature Account</AccountStatus>
            <AccountType>Business</AccountType>
    </Profile>
  </orb:ProfileRetrieveResponse>
</orb:OrboAPI>

An XML document parsing without problems, using same schema:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>

<orb:OrboAPI Version="1.0.0a1"
xmlns:orb="http://www.orbograph.com/OrboSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.orbograph.com/OrboSchema
d:/myschema/OrboAPI.xsd" AccountID="501-520575">
  <orb:FraudCheckRequest ExecutionType="Submit" ServiceType="Sereno">
    <DeadLine>
      <Duration>PT10H</Duration>
    </DeadLine>
    <Document>
	<MediaRef>0</MediaRef>
    </Document>
  </orb:FraudCheckRequest>
</orb:OrboAPI>

Motti Shneor
Senior Software Engineer
Orbograph Ltd.
motti.shneor@orbograph.com
http://www.orbograph.com 

---------------------------------------------------------------------
To unsubscribe, e-mail: c-dev-unsubscribe@xerces.apache.org
For additional commands, e-mail: c-dev-help@xerces.apache.org


Re: Need urgent fix or workaround: xerces 2.7.0 hangs (and bloats) while parsing/validating my XML against Schema

Posted by Alberto Massari <am...@datadirect.com>.
Hi Motti,
Check whether the definition of any of the elements involved
(orb:ProfileRetrieveResponse, TaskID, Error, Profile, AccountID,
BankName, RoutingNumber, AccountNumber, AccountStatus, AccountType)
contains a maxOccurs with a relatively high value (e.g. greater than
100) and, if so, replace it with "unbounded".
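
Since the schema is spread over several xsd files, a small scan can help
locate candidate values. The following is only a minimal sketch, not part
of Xerces: findLargeMaxOccurs is a hypothetical helper, the default
threshold of 100 simply mirrors the figure mentioned above, and it assumes
the schema text has already been read into a std::string. It does a plain
textual match and does not parse the XML.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not a Xerces API): returns every numeric maxOccurs
// value found in the schema text that is greater than `limit`.
std::vector<unsigned long> findLargeMaxOccurs(const std::string& xsd,
                                              unsigned long limit = 100)
{
    // Matches maxOccurs="123" or maxOccurs='123'. The value "unbounded"
    // is not numeric, so it never matches and is never reported.
    static const std::regex re(R"re(maxOccurs\s*=\s*["'](\d+)["'])re");

    std::vector<unsigned long> hits;
    for (std::sregex_iterator it(xsd.begin(), xsd.end(), re), end;
         it != end; ++it)
    {
        // Note: std::stoul throws std::out_of_range for absurdly long
        // digit strings; a real tool would want to handle that.
        const unsigned long value = std::stoul((*it)[1].str());
        if (value > limit)
            hits.push_back(value);
    }
    return hits;
}
```

Running this over the contents of each xsd file and inspecting any
reported values gives a short list of declarations to try changing to
"unbounded".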

Hope this helps,
Alberto


