Posted to c-dev@xerces.apache.org by "Bagepalli, Kiran" <kb...@informatica.com> on 2003/06/17 00:06:42 UTC

Continue on error

Is there any way to get the context of an error when one occurs? Currently I get only the line and column number where the error happened. I have a case where a file contains characters that are invalid for its encoding, and I would like to skip them and continue parsing the file.
  My question is how effective continuing the scan would be. Is the parser written to recover from a previous failure?
For example, given <TEST> BAD-DATA </TEST>, is there any way to ignore the invalid characters inside the TEST element while parsing? If I do, would the scanner be able to recover from this failure and recognize </TEST> as the end tag?
 In short, I need a way to skip bad character content and continue parsing.

Thanks
Kiran

---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-c-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-c-dev-help@xml.apache.org


Re: Continue on error

Posted by da...@us.ibm.com.



Hi Kiran,

Since transcoding happens _before_ scanning, I'm not sure how the parser
would recover or respond.  It might throw away an entire buffer's worth of
incoming data, which would mean it could start scanning again many kilobytes
further into the stream.  Or it might blow up on some invalid UTF-16
characters.  Or it might blow up because it gets horribly confused by the
bogus data it's scanning.

In short, I'm not sure how it would respond, and the behavior might vary
from release to release.  Why don't you transcode the document from the
incoming encoding to UTF-16 yourself, and feed the transcoded data to the
parser?  That way, you can detect invalid characters and recover in some
meaningful way, such as substituting a replacement character for any bogus
characters.  It also should not be much less efficient than letting the
parser do the transcoding.

Dave
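
A minimal sketch of the pre-transcoding approach suggested above, under
stated assumptions: the thread does not name the incoming encoding, so
UTF-8 is assumed here, and transcodeUtf8Lenient is a hypothetical helper
written for illustration, not part of Xerces-C. It decodes the input to
UTF-16 and substitutes U+FFFD for every byte that does not begin a valid
sequence, counting the substitutions so the caller can report them.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not a Xerces-C API): decode UTF-8 into UTF-16,
// emitting U+FFFD for each byte that does not start a valid sequence.
// Assumption: the incoming encoding is UTF-8.
std::u16string transcodeUtf8Lenient(const std::string& in,
                                    std::size_t* replaced = nullptr)
{
    std::u16string out;
    std::size_t bad = 0;
    std::size_t i = 0;
    const std::size_t n = in.size();

    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;   // code point being assembled
        std::uint32_t min = 0;  // smallest value allowed for this length
        std::size_t len = 0;    // sequence length; 0 means invalid lead byte

        if (b0 < 0x80)                { cp = b0;        len = 1; min = 0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; min = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; min = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; min = 0x10000; }

        // The sequence must fit in the input and use 10xxxxxx continuations.
        bool ok = (len != 0) && (i + len <= n);
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char bk = static_cast<unsigned char>(in[i + k]);
            if ((bk & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (bk & 0x3F);
        }
        // Reject overlong forms, surrogates, and values past U+10FFFF.
        if (ok && (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)))
            ok = false;

        if (!ok) {
            out.push_back(static_cast<char16_t>(0xFFFD));
            ++bad;
            ++i;  // resynchronize one byte at a time
            continue;
        }
        if (cp >= 0x10000) {
            // Encode as a UTF-16 surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
        i += len;
    }
    if (replaced) *replaced = bad;
    return out;
}
```

Because only the bad bytes are replaced, markup such as </TEST> that
follows them reaches the scanner intact. The resulting buffer could then
be handed to the parser as UTF-16, for example through a
MemBufInputSource with its encoding set accordingly; that wiring is not
shown here and may differ between releases. Note that this catches only
encoding-level errors: characters that decode correctly but are not legal
in XML (most control characters, for instance) would need a separate
check.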






