Posted to c-dev@xerces.apache.org by "Michael Glavassevich (JIRA)" <xe...@xml.apache.org> on 2005/03/03 05:28:49 UTC
[jira] Commented: (XERCESC-1361) CRLF is translated to LF in scanCharData
[ http://issues.apache.org/jira/browse/XERCESC-1361?page=comments#action_60089 ]
Michael Glavassevich commented on XERCESC-1361:
-----------------------------------------------
What you're describing is end-of-line handling [1]. This behaviour is expected and required by the spec: XML parsers must translate each CR LF pair to a single LF (and also any CR not followed by LF to LF).
[1] http://www.w3.org/TR/2004/REC-xml-20040204/#sec-line-ends
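To make the rule concrete, here is a minimal sketch of the translation described in [1]. It is not the Xerces-C++ implementation; normalizeLineEnds is a hypothetical helper name, and it works on a plain std::string rather than the parser's internal XMLCh buffers.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not a Xerces-C++ API: applies the XML 1.0
// end-of-line rule (CR LF -> LF, lone CR -> LF) to a string.
std::string normalizeLineEnds(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\r') {
            // Skip the LF of a CR LF pair so the pair yields one LF.
            if (i + 1 < in.size() && in[i + 1] == '\n')
                ++i;
            out.push_back('\n');
        } else {
            out.push_back(in[i]);
        }
    }
    return out;
}
```

With the text from this report, normalizeLineEnds("aaa\r\nbbb") yields "aaa\nbbb", which matches what the characters callback receives.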
> CRLF is translated to LF in scanCharData
> ----------------------------------------
>
> Key: XERCESC-1361
> URL: http://issues.apache.org/jira/browse/XERCESC-1361
> Project: Xerces-C++
> Type: Bug
> Components: SAX/SAX2
> Versions: 2.6.0
> Environment: Windows 2000, Xerces-C 2.6 (built from source with VC6 + SP5) and the Xerces-C 2.1 binary version
> Reporter: ding hua
>
> When I parse a simple XML document that has a CRLF between aaa and bbb, the SAX parser calls the characters method with the string translated to aaa LF bbb. The CR character is lost.
> <?xml version="1.0" encoding="gb2312" standalone="no"?>
> <dd><ddrow><text>aaa
> bbb</text>
> </ddrow></dd>
> I traced the code and found that the character is consumed by handleEOL. I want to keep the content unchanged. Is that reasonable? Thanks.
> The call stack
> xercesc_2_6::XMLReader::handleEOL(unsigned short & 0x000d, unsigned char 0x00) line 898
> xercesc_2_6::XMLReader::getNextCharIfNot(const unsigned short 0x003c, unsigned short & 0x000d) line 789
> xercesc_2_6::ReaderMgr::getNextCharIfNot(const unsigned short 0x003c, unsigned short & 0x000d) line 398
> xercesc_2_6::IGXMLScanner::scanCharData(xercesc_2_6::XMLBuffer & {...}) line 2630 + 17 bytes
> xercesc_2_6::IGXMLScanner::scanContent() line 837
> xercesc_2_6::IGXMLScanner::scanDocument(const xercesc_2_6::InputSource & {...}) line 204 + 8 bytes
> xercesc_2_6::SAXParser::parse(const xercesc_2_6::InputSource & {...}) line 720
> internal\XMLReader.hpp Ln895
> if ( fCharBuf[fCharIndex] == chLF ||
>      ((fCharBuf[fCharIndex] == chNEL) && fNEL) )
> {
>     fCharIndex++;
> }
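
The quoted fragment skips the LF that follows a CR, which is the normalization the spec requires, so it cannot be switched off for literal characters. If a CR really must reach the application, one option on the producing side is to write it as the character reference &#13;, since character references are not subject to end-of-line normalization. A minimal sketch, assuming the document is generated by your own code; escapeCarriageReturns is a hypothetical helper, not part of Xerces-C++:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not a Xerces-C++ API: rewrites each literal CR
// as the character reference &#13; before the text is written into an
// XML document, so a conforming parser reports the CR unchanged.
std::string escapeCarriageReturns(const std::string& text)
{
    std::string out;
    out.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\r')
            out += "&#13;";   // survives end-of-line normalization
        else
            out.push_back(text[i]);
    }
    return out;
}
```

For the sample above, the element content would be written as aaa&#13; followed by a line feed and bbb, and the parser would then deliver aaa CR LF bbb to the characters callback.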