Posted to c-users@xerces.apache.org by neetha patil <ne...@gmail.com> on 2012/07/16 08:53:00 UTC

Why does Xerces modify an invalid XML file while parsing?

Dear All,

I am using Xerces-C++ 2.8 (xercesc_2_8). I pass an XML file containing an
invalid tag to the DOMBuilder parser, edit the resulting DOM document, and
save the document back to the XML file. The file content is then truncated
from the invalid tag onwards. Why does the parser modify the file while
parsing, and how do I prevent it? That is, I want the parser to report the
error and continue parsing without modifying the XML content.
The following is an excerpt from the XML file:
...
...
<Header id="My Project Id" nameStructure="DevName" revision="0" version="1">
     ...
</Header>
     ...
     ...
<Services>
     ...
     ...
</Services>
<!-- Invalid tag: No node name -->
<name="abc">
...
...
The following is the parser code:
void CHelper::InitDOM()
{
        // m_pDomImpl is a pointer to DOMImplementation
        m_pDomImpl = 0;
        if(m_pDomImpl == NULL)
        {
              XMLPlatformUtils::Initialize();
              m_pDomImpl = DOMImplementationRegistry::getDOMImplementation( gLS );
        }
}

int CHelper::LoadFile(DOMBuilder** pParser, const CString& strXMLFile,
                      DOMDocument** pDoc, CStringArray& arrError,
                      bool bValidate, const CString& strSchemaFile)
{
       ...
       if(*pParser == NULL)
       {
              *pParser = ((DOMImplementationLS*)m_pDomImpl)->createDOMBuilder(
                              DOMImplementationLS::MODE_SYNCHRONOUS, 0 );
              if((*pParser) == NULL)
              {
                    return DOM_INITIALIZE_FAILED;
              }

              (*pParser)->setFeature( XMLUni::fgDOMNamespaces, true );
              (*pParser)->setFeature( XMLUni::fgXercesSchema, true );
              (*pParser)->setFeature( XMLUni::fgXercesSchemaFullChecking, true );
              (*pParser)->setFeature( XMLUni::fgDOMValidation, true );
              (*pParser)->setFeature( XMLUni::fgXercesCacheGrammarFromParse, true );
       }

       try
       {
              // Declared without parentheses: "CMyDOMErrHandler eh();" would
              // declare a function, not an object.
              CMyDOMErrHandler eh;
              m_arrValidationErrs.RemoveAll();

              // parseURI is a blocking call. If an error handler is set, all
              // errors are reported before the next line is executed.
              if(bValidate == true)
              {
                     (*pParser)->setErrorHandler(&eh);
                     (*pParser)->loadGrammar( strSchemaFile, Grammar::SchemaGrammarType, true );
              }
              else
              {
                     (*pParser)->setErrorHandler(NULL);
              }
              *pDoc = (*pParser)->parseURI(strXMLFile);
              ...
              ...
       }
       catch(...)
       {
              ...
       }

       return SUCCESS;
}

Thank you in advance.
Regards,
Neetha

Re: Fwd: Why does Xerces modify an invalid XML file while parsing?

Posted by sh...@e-z.net.

Neetha,

Are you looking at the parsed DOM tree instead of the source file being
parsed?  The DOM tree is where the parsed content is recorded.

Steven J. Hathaway

> Dear Alberto,
>
> Thank you for the quick reply.
>
> Because I do not load the grammar (schema) into the parser, it reports
> errors such as "Unknown element.." for every XML tag until it reaches the
> invalid tag, where it reports 'Expected an attribute name' and aborts
> parsing, as you mentioned.
>
> So I set the feature 'XMLUni::fgXercesContinueAfterFatalError' to true and
> the complete file was parsed. However, the line containing the invalid tag
> was modified as follows:
> ...
> ...
>  <Services>
>      ...
>      ...
> </Services>
> ...
> <name>
> ...
> ...
> </name>
> ...
> ...
>
> The documentation at http://xml.apache.org/xerces-c-new/program-dom.html
> says that setting this feature to true might result in *undetermined*
> parser behavior. Is there any other way for the parser to report the error
> and continue parsing? Also, can we prevent the auto-modification (in this
> case, the change from <name="abc"> to <name>)?
>
> Thanks
>
> Regards,
> Neetha
>
> On Mon, Jul 16, 2012 at 2:39 PM, Alberto Massari <
> Alberto.Massari@progress.com> wrote:
>
>>  Hi,
>> Xerces doesn't modify your document; you should check the error handler
>> to see whether parsing was aborted because of an error. In that case the
>> returned DOM tree would be complete only up to the position of the error.
>>
>> Alberto
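
To make the reply above concrete, here is a minimal, self-contained C++ sketch of why the saved file can end up truncated. It does not use Xerces: ParseResult, hasPlausibleName, parseUntilFirstError, serialize and safeToOverwrite are hypothetical names invented for this illustration, and the tag check is deliberately simplistic. It assumes a parser that stops at the first malformed tag, and shows that serializing what was parsed so far drops everything after the error, so the source file should only be overwritten after a clean parse.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a parse result: the tags accepted before the
// first syntax error, plus a flag saying whether parsing was aborted.
struct ParseResult {
    std::vector<std::string> tags;  // text between '<' and '>' of each accepted tag
    bool aborted = false;           // true if a syntax error stopped the scan
};

// Toy check, not real XML name validation: the tag body must begin with a
// name that starts with a letter or '_' and contains no '='.
bool hasPlausibleName(const std::string& body) {
    std::size_t start = (!body.empty() && body[0] == '/') ? 1 : 0;
    std::size_t end = body.find_first_of(" \t\r\n", start);
    std::string name = body.substr(
        start, end == std::string::npos ? std::string::npos : end - start);
    if (name.empty() || name.find('=') != std::string::npos) return false;
    return std::isalpha(static_cast<unsigned char>(name[0])) != 0 || name[0] == '_';
}

// Scans tags left to right and stops at the first one that fails the check.
ParseResult parseUntilFirstError(const std::string& text) {
    ParseResult result;
    std::size_t pos = 0;
    while ((pos = text.find('<', pos)) != std::string::npos) {
        std::size_t close = text.find('>', pos);
        if (close == std::string::npos) { result.aborted = true; break; }
        assert(close > pos);  // '>' was searched for starting at the '<'
        std::string body = text.substr(pos + 1, close - pos - 1);
        if (!hasPlausibleName(body)) { result.aborted = true; break; }
        result.tags.push_back(body);
        pos = close + 1;
    }
    return result;
}

// Writes back only what was accepted; anything after the error is lost.
std::string serialize(const ParseResult& result) {
    std::string out;
    for (const std::string& body : result.tags) out += "<" + body + ">";
    return out;
}

// Guard to call before saving over the original file.
bool safeToOverwrite(const ParseResult& result) { return !result.aborted; }
```

For the input <Header id="1"></Header><Services></Services><name="abc"><Extra></Extra>, the scan is aborted at <name="abc">, serialize returns only the Header and Services tags, and safeToOverwrite returns false.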

Re: Fwd: Why does Xerces modify an invalid XML file while parsing?

Posted by neetha patil <ne...@gmail.com>.

Dear Alberto,

Thank you for your patience and for all the valuable information.

One final question: the link mentioned in my previous mail says that with
'XMLUni::fgXercesContinueAfterFatalError' set to true, the parser's
behavior might be *undetermined*. Is the auto-modification under discussion
one such behaviour? I would also be grateful if you could briefly describe
other such behaviours.

Regards,
Neetha



Re: Fwd: Why does Xerces modify an invalid XML file while parsing?

Posted by Alberto Massari <Al...@progress.com>.

On 17/07/2012 08:21, neetha patil wrote:
> Dear All,
> Thank you Alberto for guiding me to get rid of the "Unknown element"
> validation errors.
> I tried setting the parameter 'XMLUni::fgDOMErrorHandler' on the
> DOMBuilder parser, but it has no such parameter. I am also using the
> DOM document that is returned after parsing.

I forgot that in the new DOM L3 the parameters are set through an
intermediate object. The correct call should be something like
(*pParser)->getDOMConfiguration()->setParameter. (Double-check it; I may
be misremembering the name, but that should give you the idea.)

> When parsing against the schema, the DOMBuilder parser reports the
> first schema-related error, continues parsing, and reports any further
> schema-related errors. Can the DOMBuilder parser behave the same way
> (without any auto-modification) when it meets invalid XML statements
> like the one reported in my previous mail?

No; validation errors are not fatal, while invalid XML syntax can be
non-recoverable. In your case the parser tries to find a new
synchronization point at the first ">" it finds, but if you had missed the
closing quote at the end of an attribute you would be in much bigger
trouble.

What I am trying to explain is that an invalid XML document cannot
produce a DOM representation that reflects the input, because
serializing a DOM representation gives you *valid* XML, not the
original invalid text. The correct thing to do is to reject the input XML
you got; if you still want to be able to read and manipulate it, what
you call "auto-modification" is the only option.

Alberto
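
The resynchronization described above can be pictured with a small, self-contained C++ sketch. This is not how Xerces is implemented; it is a toy rule under the assumption that a recovering parser keeps the leading run of name characters and discards everything else up to the first ">". The functions isNameChar, recoverElementName and serializeElement are hypothetical names made up for this example. It shows why <name="abc"> comes back as <name> ... </name> once the tree is written out again.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Simplified idea of a "name character" (the real XML rules are broader).
bool isNameChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 ||
           c == '_' || c == '-' || c == '.' || c == ':';
}

// Toy recovery: given one start tag such as <name="abc">, keep the leading
// run of name characters as the element name and drop the remainder of the
// tag up to '>'. The malformed ="abc" part cannot be represented in the tree.
std::string recoverElementName(const std::string& tag) {
    assert(tag.size() >= 2 && tag.front() == '<');
    std::size_t i = 1;
    while (i < tag.size() && isNameChar(tag[i])) ++i;
    return tag.substr(1, i - 1);
}

// Serializing a tree node always emits a matching start and end tag, so the
// output is well-formed even though the input was not.
std::string serializeElement(const std::string& name, const std::string& content) {
    return "<" + name + ">" + content + "</" + name + ">";
}
```

Calling recoverElementName on <name="abc"> yields "name", and serializeElement("name", "...") yields <name>...</name>, which matches the change reported earlier in the thread.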


Re: Fwd: Why does Xerces modify an invalid XML file while parsing?

Posted by neetha patil <ne...@gmail.com>.

Dear All,

Thank you Alberto for guiding me to get rid of the "Unknown element"
validation errors.

I tried setting the parameter 'XMLUni::fgDOMErrorHandler' on the
DOMBuilder parser, but it has no such parameter. I am also using the DOM
document that is returned after parsing.

When parsing against the schema, the DOMBuilder parser reports the first
schema-related error, continues parsing, and reports any further
schema-related errors. Can the DOMBuilder parser behave the same way
(without any auto-modification) when it meets invalid XML statements like
the one reported in my previous mail?

Regards,
Neetha

On Mon, Jul 16, 2012 at 5:03 PM, Alberto Massari <
Alberto.Massari@progress.com> wrote:

>  Hi Neetha,
> the correct thing to do would be to not make these calls
>
>
>     (*pParser)->setFeature( XMLUni::fgXercesSchema, true );
>     (*pParser)->setFeature( XMLUni::fgXercesSchemaFullChecking, true );
>     (*pParser)->setFeature( XMLUni::fgDOMValidation, true );
>     (*pParser)->setFeature( XMLUni::fgXercesCacheGrammarFromParse, true );
>
> when bValidate == false, as you are asking to validate against a schema
> that you are not going to provide. This will remove the "Unknown element"
> validation errors. As for what you call "auto-modification", it is the
> correct behaviour: <name="abc"> is not a valid XML statement (either the
> tag name is missing and "name" is an attribute, or "name" is the element
> and it is missing a space followed by the attribute name). If you force
> the parser to continue, the DOM tree you get back will be incomplete, at
> best.
> If you really want to get a DOM tree out of that invalid XML, you could
> attach a W3C DOMErrorHandler (different from the one you provided) using
> (*pParser)->setParameter(XMLUni::fgDOMErrorHandler, domErrorHandlerVar)
> This class has a handleError method where you can check what happened by
> examining the DOMError argument and the DOMLocation inside it (it contains
> the DOM node where the error was located). If you return "true", the parser
> will try to continue parsing; if you return "false", parsing will be
> aborted.
>
> Alberto
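
The handler contract described in the quoted reply (return true to continue, false to abort) can be sketched in plain C++ without Xerces. The types below are hypothetical miniatures loosely modelled on the idea of DOMError and DOMErrorHandler; they are not the real Xerces classes, and reportErrors is an invented driver that stands in for the parser. The policy shown, collect everything but stop on the first fatal error, is one possible choice and an assumption of this sketch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical error record: where the problem was found and how bad it is.
struct ParseError {
    int line;
    std::string message;
    bool fatal;  // true for syntax errors, false for validation errors
};

// Hypothetical handler interface: the return value tells the caller
// whether to keep going (true) or abort (false).
class ErrorHandler {
public:
    virtual ~ErrorHandler() = default;
    virtual bool handleError(const ParseError& error) = 0;
};

// Records every error it sees and asks to continue unless the error is fatal.
class CollectingHandler : public ErrorHandler {
public:
    bool handleError(const ParseError& error) override {
        errors.push_back(error);
        return !error.fatal;
    }
    std::vector<ParseError> errors;
};

// Toy driver standing in for the parser: delivers errors in order and stops
// as soon as the handler returns false. Returns how many were delivered.
std::size_t reportErrors(const std::vector<ParseError>& found, ErrorHandler& handler) {
    std::size_t delivered = 0;
    for (const ParseError& error : found) {
        ++delivered;
        if (!handler.handleError(error)) break;
    }
    return delivered;
}
```

With one validation error followed by a fatal syntax error and a later error, the handler sees only the first two: the fatal one makes it return false, so the third is never delivered.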
Re: Fwd: Why does Xerces modify an invalid XML file while parsing?

Posted by Alberto Massari <Al...@progress.com>.

Hi Neetha,
the correct thing to do would be to not make these calls

              (*pParser)->setFeature( XMLUni::fgXercesSchema, true );
              (*pParser)->setFeature( XMLUni::fgXercesSchemaFullChecking, true );
              (*pParser)->setFeature( XMLUni::fgDOMValidation, true );
              (*pParser)->setFeature( XMLUni::fgXercesCacheGrammarFromParse, true );

when bValidate == false, as you are asking to validate against a schema 
that you are not going to provide. This will remove the "Unknown 
element" validation errors. As for what you call an 
"auto-modification", it is the correct behaviour: <name="abc"> is not 
well-formed XML (either the tag name is missing and "name" is 
an attribute, or "name" is the element name and it is missing a space 
followed by the attribute name). If you force the parser to continue, 
the DOM tree you get back will be incomplete, at best.

If you really want to get a DOM tree out of that invalid XML, you could 
attach a W3C DOMErrorHandler (different from the one you provided) using 
(*pParser)->setParameter(XMLUni::fgDOMErrorHandler, domErrorHandlerVar)
This class has a handleError method where you can check what happened by 
examining the DOMError argument, and the DOMLocator inside it (it 
contains the DOM node where the error was located). If you return 
"true", the parser will try to continue the parse process; if you return 
"false", parsing will be aborted.

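As a rough illustration of the true/false contract described above, here is a minimal, library-independent sketch of the decision logic such a handler could carry. Severity, ParseIssue and ErrorLog are hypothetical names made up for this example and are not part of Xerces; in a real handler the severity and position would be read from the DOMError and DOMLocator passed to handleError, and handleError would simply return what handle() returns.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for this sketch (not Xerces types). In Xerces the
// severity and position would come from the DOMError and its DOMLocator.
enum class Severity { Warning, Error, FatalError };

struct ParseIssue {
    Severity severity;
    std::string message;
    long line;
    long column;
};

// Collects every reported issue and decides whether parsing should go on.
class ErrorLog {
public:
    explicit ErrorLog(bool continueAfterFatal)
        : continueAfterFatal_(continueAfterFatal) {}

    // Mirrors the handleError contract: true = keep parsing, false = abort.
    bool handle(const ParseIssue& issue) {
        issues_.push_back(issue);
        if (issue.severity == Severity::FatalError) {
            sawFatal_ = true;
            return continueAfterFatal_;
        }
        return true;  // warnings and recoverable errors never stop the parse
    }

    // True when a fatal error was seen, i.e. the DOM tree may be incomplete.
    bool sawFatalError() const { return sawFatal_; }

    const std::vector<ParseIssue>& issues() const { return issues_; }

private:
    bool continueAfterFatal_;
    bool sawFatal_ = false;
    std::vector<ParseIssue> issues_;
};
```

Keeping the "did a fatal error occur" flag separate from the "continue" decision means the caller can still ask the parser to go on, and afterwards knows that the tree it received should not be trusted as a faithful copy of the file.
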
Alberto



Re: Fwd: Why does Xerces modify an invalid XML file while parsing?

Posted by neetha patil <ne...@gmail.com>.

Dear Alberto,

Thank you for the quick reply.

Since I do not load the grammar (schema) into the parser, it reports errors
such as "Unknown element.." for every XML tag until it reaches the invalid
tag, where it reports 'Expected an attribute name' and aborts parsing, as
you mentioned.

So I set the feature 'XMLUni::fgXercesContinueAfterFatalError' to true and
got the complete file parsed. However, the line containing the invalid tag
was modified as follows:-
...
...
 <Services>
     ...
     ...
</Services>
...
<name>
...
...
</name>
...
...

http://xml.apache.org/xerces-c-new/program-dom.html says that setting this
feature to true might result in *undetermined* behavior of the parser. Is
there any other way for the parser to report the error and continue
parsing? Also, can we prevent the auto-modification (in this case, the
modification from <name="abc"> to <name>)?

Thanks

Regards,
Neetha


Re: Fwd: Why does Xerces modify an invalid XML file while parsing?

Posted by Alberto Massari <Al...@progress.com>.

Hi,
Xerces doesn't modify your document; you should check the error handler 
to see if the parsing was aborted because of an error. In this case the 
returned DOM tree would be complete up to position of the error.
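
Since the truncation only shows up once the partial tree is written back over the original file, one possible guard is to refuse to overwrite the file whenever the parse reported a fatal error. The sketch below uses only the C++ standard library; saveIfParseComplete and its parameters are hypothetical names for this example, and the serialized string stands in for whatever the DOM writer produces.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Overwrites `target` with `serialized` only when the parse was complete.
// Returns true if the file was replaced, false if it was left untouched.
bool saveIfParseComplete(const std::string& target,
                         const std::string& serialized,
                         bool parseHadFatalError) {
    if (parseHadFatalError) {
        // The DOM may stop at the error position, so keep the original file.
        return false;
    }

    // Write to a temporary file first so a failed write cannot damage target.
    const std::string tmp = target + ".tmp";
    {
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        if (!out) {
            return false;
        }
        out << serialized;
        out.flush();
        if (!out) {
            out.close();
            std::remove(tmp.c_str());
            return false;
        }
    }  // the stream is closed here, before the rename

    // On POSIX systems std::rename replaces an existing target. On Windows it
    // may fail when the target exists, in which case nothing is overwritten.
    if (std::rename(tmp.c_str(), target.c_str()) != 0) {
        std::remove(tmp.c_str());
        return false;
    }
    return true;
}
```

The flag passed as parseHadFatalError would come from the error handler, which is the place where an aborted parse becomes visible.
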

Alberto


Fwd: Why does Xerces modify an invalid XML file while parsing?

Posted by neetha patil <ne...@gmail.com>.

 Dear All,

I am using Xerces-C++ 2.8 (Xercesc_2_8). I pass an XML file (containing an
invalid tag) to the DOMBuilder parser. I then edit the generated DOM
document and save it back to the XML file. The content of this file is now
truncated from the invalid tag onwards. Why does the parser modify the file
while parsing, and how do I prevent it? That is, I want the parser to
report the error and continue parsing but not modify the XML content.
Following is the snapshot of the XML file:-
...
...
<Header id="My Project Id" nameStructure="DevName" revision="0" version="1">
     ...
</Header>
     ...
     ...
<Services>
     ...
     ...
</Services>
<!-- Invalid tag: No node name -->
<name="abc">
...
...
Following is the code snippet of the parser:-
void CHelper::InitDOM()
{
        // m_pDomImpl is a pointer to DOMImplementation
        m_pDomImpl = 0;
        if(m_pDomImpl == NULL)
        {
              XMLPlatformUtils::Initialize();
              m_pDomImpl = DOMImplementationRegistry::getDOMImplementation( gLS );
        }
}

int CHelper::LoadFile(DOMBuilder** pParser, const CString& strXMLFile,
                      DOMDocument** pDoc, CStringArray& arrError,
                      bool bValidate, const CString& strSchemaFile)
{
       ...
       if(*pParser == NULL)
       {
              // Create the parser once; it is reused on later calls.
              *pParser = ((DOMImplementationLS*)m_pDomImpl)->createDOMBuilder(
                              DOMImplementationLS::MODE_SYNCHRONOUS, 0 );
              if((*pParser) == NULL)
              {
                    return DOM_INITIALIZE_FAILED;
              }

              (*pParser)->setFeature( XMLUni::fgDOMNamespaces, true );
              (*pParser)->setFeature( XMLUni::fgXercesSchema, true );
              (*pParser)->setFeature( XMLUni::fgXercesSchemaFullChecking, true );
              (*pParser)->setFeature( XMLUni::fgDOMValidation, true );
              (*pParser)->setFeature( XMLUni::fgXercesCacheGrammarFromParse, true );
       }

       try
       {
              // Declare the handler object (with parentheses, "eh()" would
              // declare a function instead of an object).
              CMyDOMErrHandler eh;
              m_arrValidationErrs.RemoveAll();

              // parseURI is a blocking call. If an error handler is set, all
              // errors are reported to it before the next line is executed.
              if(bValidate == true)
              {
                    (*pParser)->setErrorHandler(&eh);
                    (*pParser)->loadGrammar( strSchemaFile,
                                             Grammar::SchemaGrammarType, true );
              }
              else
              {
                    (*pParser)->setErrorHandler(NULL);
              }
              *pDoc = (*pParser)->parseURI(strXMLFile);
              ...
              ...
       }
       catch(...)
       {
              ...
       }

       return SUCCESS;
}

Thank you in advance.
Regards,
Neetha