Posted to c-dev@xerces.apache.org by sushil kumar <su...@gmail.com> on 2007/05/30 11:10:18 UTC
are surrogate chars allowed in node names?
I am not able to parse the following XML using Xerces 2.7. Can anybody tell me
whether surrogate chars are allowed in XML tag names?
<?xml version="1.0" encoding="UTF-16" standalone="yes"?>
<booklist>
<book>
<टाईटल>𪘚 Title</टाईटल>
<author>Amit 𪘚</author>
<author>𪘚.jpg</author>
<author>Kumar 𪘚</author>
<𪘚>Pearson Press</𪘚>
</book>
<book>
<टाईटल>𪘚.jpg</टाईटल>
<author>Charu 𪘚𪘚𪘚𪘚𪘚</author>
<author>Pankaj 𪘚𪘚𪘚𪘚</author>
<!--<𪘚>Pearson 𪘚𪘚𪘚𪘚 Press</𪘚>-->
</book>
</booklist>
Thanks,
Sushil
Re: are surrogate chars allowed in node names?
Posted by David Bertoni <db...@apache.org>.
sushil kumar wrote:
> The following is the error message I am getting:
>
> F:\Xerces-Binary\xerces-c-windows_2000-msvc_60\bin>DOMCount F:\surrogateelem_new.xml
>
> Fatal Error at file F:\surrogateelem_new.xml, line 8, char 4
> Message: Expected an element name
>
> Errors occurred, no output available
Please reply to the list, and not to my email address.
>
>
> F:\Xerces-Binary\xerces-c-windows_2000-msvc_60\bin>
>
> The XML is well-formed and even validates in the XML-SPY application if you
> add the following DTD on top of the XML below:
Are you sure the XML is well-formed, and that XML Spy is correct?
>
> <?xml version="1.0" encoding="UTF-16" standalone="yes"?>
> <!DOCTYPE booklist [
> <!ELEMENT booklist (book*)>
> <!ELEMENT book (टाईटल, author+, 𪘚)>
> <!ELEMENT टाईटल (#PCDATA)>
> <!ELEMENT author (#PCDATA)>
> <!ELEMENT 𪘚 (#PCDATA)>
> ]>
> <booklist>
> <book>
> <टाईटल>𪘚 Title</टाईटल>
> <author>Amit 𪘚</author>
> <author>𪘚.jpg</author>
> <author>Kumar 𪘚</author>
> <𪘚>टाईटल.jpg</𪘚>
> </book>
> <book>
> <टाईटल>𪘚.jpg</टाईटल>
> <author>Charu 𪘚𪘚𪘚𪘚𪘚</author>
> <author>Pankaj 𪘚𪘚𪘚𪘚</author>
> <𪘚>Pearson 𪘚𪘚𪘚𪘚 Press</𪘚>
> </book>
> </booklist>
>
> I just want to know whether Xerces supports surrogate chars in tag names
> or not. I have seen that XML-SPY validates the XML above.
Well, I pointed you to the XML recommendation, which describes the grammar
for element names, and includes a list of the Unicode code points allowed.
If you look at that, you'll see that there are no Unicode code points
encoded in UTF-16 as surrogate pairs that are allowed as name characters in
XML 1.0.
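To make "encoded in UTF-16 as surrogate pairs" concrete, here is a minimal Python sketch (not part of the original exchange) that shows which characters in a candidate name lie above U+FFFF and what their UTF-16 code units are. The function names are hypothetical, and U+20000 is used only as a stand-in for a supplementary character, since the exact bytes of the original document are not known.

```python
from typing import List


def utf16_code_units(ch: str) -> List[int]:
    """Return the UTF-16 code units for a single character."""
    cp = ord(ch)
    if cp < 0x10000:
        # Characters in the Basic Multilingual Plane need one code unit.
        return [cp]
    # Supplementary characters are split into a high and a low surrogate.
    offset = cp - 0x10000
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


def supplementary_chars(name: str) -> List[str]:
    """List the characters in `name` that need a surrogate pair in UTF-16."""
    return [ch for ch in name if ord(ch) > 0xFFFF]


if __name__ == "__main__":
    sample = "\U00020000"  # stand-in supplementary character (assumption)
    print([hex(u) for u in utf16_code_units(sample)])
    print(supplementary_chars("author"), supplementary_chars("a" + sample))
```

Any name for which supplementary_chars() returns a non-empty list contains characters of the kind discussed above.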
Since I don't have your original document (pasting it into an email message
doesn't work), I can't say for sure what the actual bytes are. However, I
did copy-and-paste the text from your reply, and was able to reproduce the
error message (when I saved the document as UTF-8 and updated the XML
declaration). In addition, three other XML parsers also reported that the
tag name in the DTD on line 4 is not a valid XML name.
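A cross-check of this kind can be repeated with any other parser. The sketch below is only an illustration and assumes Python's bundled expat bindings (xml.parsers.expat), which are not one of the parsers mentioned in this thread. The helper name is hypothetical, and the test document uses U+20000 as a stand-in element name because the original bytes are not available.

```python
import xml.parsers.expat


def well_formedness_error(data: bytes):
    """Parse `data`; return the parser's error text, or None if it parsed."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # Saved as UTF-8 with a matching declaration, as described above.
    doc = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        "<booklist><\U00020000>Pearson Press</\U00020000></booklist>"
    ).encode("utf-8")
    print(well_formedness_error(doc))
```

A parser that applies the name rules described in this message would be expected to report an error for the document above; the exact wording depends on the parser.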
Perhaps you should contact Altova and ask them why their XML parser does
not reject documents that are not well-formed.
Dave
---------------------------------------------------------------------
To unsubscribe, e-mail: c-dev-unsubscribe@xerces.apache.org
For additional commands, e-mail: c-dev-help@xerces.apache.org
Re: are surrogate chars allowed in node names?
Posted by David Bertoni <db...@apache.org>.
sushil kumar wrote:
> I am not able to parse the following XML using Xerces 2.7. Can anybody tell
> me whether surrogate chars are allowed in XML tag names?
The grammar for XML 1.0 names is here:
http://www.w3.org/TR/REC-xml/#NT-Name
XML 1.1 is here:
http://www.w3.org/TR/2004/REC-xml11-20040204/#NT-Name
You didn't specify the error message you got from Xerces-C, but I suspect
the parser is correct, and your document is not well-formed.
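For comparison, the XML 1.1 name grammar linked above is short enough to write out directly. The following Python sketch transcribes the NameStartChar and NameChar ranges from the XML 1.1 Recommendation; the function and table names are hypothetical, and this is a standalone illustration rather than anything Xerces-C does internally. Note that XML 1.1, unlike the XML 1.0 grammar discussed in this thread, lists the range #x10000-#xEFFFF.

```python
# NameStartChar ranges, transcribed from the XML 1.1 Recommendation.
NAME_START_RANGES = [
    (0x3A, 0x3A),  # ":"
    (0x41, 0x5A),  # A-Z
    (0x5F, 0x5F),  # "_"
    (0x61, 0x7A),  # a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF),
    (0x200C, 0x200D), (0x2070, 0x218F), (0x2C00, 0x2FEF),
    (0x3001, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),  # supplementary code points
]

# Characters allowed after the first one, in addition to NameStartChar.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E),  # "-" and "."
    (0x30, 0x39),  # 0-9
    (0xB7, 0xB7),
    (0x300, 0x36F),
    (0x203F, 0x2040),
]


def _in_ranges(cp, ranges):
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_xml11_name(name: str) -> bool:
    """Check `name` against Name ::= NameStartChar (NameChar)* from XML 1.1."""
    if not name:
        return False
    if not _in_ranges(ord(name[0]), NAME_START_RANGES):
        return False
    return all(
        _in_ranges(ord(ch), NAME_START_RANGES)
        or _in_ranges(ord(ch), NAME_EXTRA_RANGES)
        for ch in name[1:]
    )
```

Under these ranges, is_xml11_name("\U00020000") is true, so a name that fails the XML 1.0 check discussed here can still match the XML 1.1 production.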
Dave