Posted to c-dev@xerces.apache.org by sushil kumar <su...@gmail.com> on 2007/05/30 11:10:18 UTC

are surrogate chars allowed in node names?

I am not able to parse the following XML using Xerces 2.7. Can anybody tell me
whether surrogate chars are allowed in XML tag names or not?

<?xml version="1.0" encoding="UTF-16" standalone="yes"?>
<booklist>
    <book>
        <टाईटल>𪘚 Title</टाईटल>
        <author>Amit 𪘚</author>
        <author>𪘚.jpg</author>
        <author>Kumar 𪘚</author>
        <𪘚>Pearson Press</𪘚>
    </book>
    <book>
        <टाईटल>𪘚.jpg</टाईटल>
        <author>Charu 𪘚𪘚𪘚𪘚𪘚</author>
        <author>Pankaj 𪘚𪘚𪘚𪘚</author>
        <!--<𪘚>Pearson 𪘚𪘚𪘚𪘚 Press</𪘚>-->
    </book>
</booklist>


Thanks,
Sushil

Re: are surrogate chars allowed in node names?

Posted by David Bertoni <db...@apache.org>.
sushil kumar wrote:
> The following is the error message I am getting:
> 
> F:\Xerces-Binary\xerces-c-windows_2000-msvc_60\bin>DOMCount F:\surrogateelem_new.xml
> 
> Fatal Error at file F:\surrogateelem_new.xml, line 8, char 4
>   Message: Expected an element name
> 
> Errors occurred, no output available

Please reply to the list, and not to my email address.

> 
> 
> F:\Xerces-Binary\xerces-c-windows_2000-msvc_60\bin>
> 
> The XML is well-formed and even validates in the XML-SPY application if 
> you add the following DTD on top of the XML mentioned below:

Are you sure the XML is well-formed, and that XML Spy is correct?

> 
> <?xml version="1.0" encoding="UTF-16" standalone="yes"?>
> <!DOCTYPE booklist [
>     <!ELEMENT booklist (book*)>
>     <!ELEMENT book (टाईटल, author+, 𪘚)>
>     <!ELEMENT टाईटल (#PCDATA)>
>     <!ELEMENT author (#PCDATA)>
>     <!ELEMENT 𪘚 (#PCDATA)>
> ]>
> <booklist>
>     <book>
>         <टाईटल>𪘚 Title</टाईटल>
>         <author>Amit 𪘚</author>
>         <author>𪘚.jpg</author>
>         <author>Kumar 𪘚</author>
>         <𪘚>टाईटल.jpg</𪘚>
>     </book>
>     <book>
>         <टाईटल>𪘚.jpg</टाईटल>
>         <author>Charu 𪘚𪘚𪘚𪘚𪘚</author>
>         <author>Pankaj 𪘚𪘚𪘚𪘚</author>
>         <𪘚>Pearson 𪘚𪘚𪘚𪘚 Press</𪘚>
>     </book>
> </booklist>
> 
> I just want to know whether Xerces supports surrogate chars in tag names 
> or not, as I have seen that XML-SPY validates the XML above.

Well, I pointed you to the XML recommendation, which describes the grammar 
for element names and includes a list of the Unicode code points allowed. 
If you look at that, you'll see that none of the code points that UTF-16 
encodes as surrogate pairs are allowed as name characters in XML 1.0.
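
For readers who want to check which characters need a surrogate pair, here 
is a minimal Python sketch. The helper name utf16_surrogates and the 
stand-in code point U+20000 are assumptions chosen for illustration; they 
are not taken from the original document. Any code point above U+FFFF 
behaves the same way.

```python
def utf16_surrogates(code_point):
    """Return the (high, low) UTF-16 surrogate pair for a code point,
    or None if the code point fits in a single 16-bit unit (the BMP)."""
    if code_point <= 0xFFFF:
        return None
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return (high, low)


# U+20000 is a hypothetical stand-in for a supplementary character.
pair = utf16_surrogates(0x20000)
print([hex(unit) for unit in pair])

# U+091F (a Devanagari letter) is in the BMP and needs no surrogates.
print(utf16_surrogates(0x091F))
```

A character for which this returns a pair lies outside the Basic 
Multilingual Plane, which is where the XML 1.0 name-character lists 
referred to above stop.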

Since I don't have your original document (pasting it into an email message 
doesn't work), I can't say for sure what the actual bytes are.  However, I 
did copy-and-paste the text from your reply, and was able to reproduce the 
error message (when I saved the document as UTF-8 and updated the XML 
declaration).  In addition, three other XML parsers also reported that the 
tag name in the DTD on line 4 is not a valid XML name.
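
A similar check can be sketched in Python with the standard library's 
xml.etree.ElementTree, which wraps the expat parser. This is only a 
stand-in: it is not Xerces-C, its behaviour may vary between versions, and 
U+20000 is an assumed example of a supplementary character rather than the 
one in the original file. The helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_bytes):
    """Return True if the bundled parser accepts the document."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False


DECL = "<?xml version='1.0' encoding='UTF-8'?>"
SUPPLEMENTARY = "\U00020000"                    # assumed stand-in character
DEVANAGARI = "\u091f\u093e\u0908\u091f\u0932"   # the BMP-only tag name

# Supplementary character only in character data, BMP-only element names.
in_content = (DECL + "<book><" + DEVANAGARI + ">" + SUPPLEMENTARY
              + " Title</" + DEVANAGARI + "></book>").encode("utf-8")

# Supplementary character used as an element name.
in_name = (DECL + "<book><" + SUPPLEMENTARY + ">Press</"
           + SUPPLEMENTARY + "></book>").encode("utf-8")

print("in content:", is_well_formed(in_content))
print("in name:   ", is_well_formed(in_name))
```

Keeping the two documents identical apart from where the supplementary 
character appears makes it clear that the character itself is legal in 
content and that only its use in a name is at issue.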

Perhaps you should contact Altova and ask them why their XML parser does 
not reject documents that are not well-formed.

Dave

---------------------------------------------------------------------
To unsubscribe, e-mail: c-dev-unsubscribe@xerces.apache.org
For additional commands, e-mail: c-dev-help@xerces.apache.org


Re: are surrogate chars allowed in node names?

Posted by David Bertoni <db...@apache.org>.
sushil kumar wrote:
> I am not able to parse following xml using xerces 2.7. Can anybody tell 
> me that surrogate chars are allowed in XML tag names or not?

The grammar for XML 1.0 names is here:

http://www.w3.org/TR/REC-xml/#NT-Name

XML 1.1 is here:

http://www.w3.org/TR/2004/REC-xml11-20040204/#NT-Name

You didn't specify the error message you got from Xerces-C, but I suspect 
the parser is correct, and your document is not well-formed.
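
As a rough illustration of how the two grammars differ, the sketch below 
transcribes the NameStartChar ranges as I read them in the XML 1.1 
recommendation linked above. The names XML11_NAME_START_RANGES and 
is_xml11_name_start_char are hypothetical, and the table should be checked 
against the recommendation before being relied on. Note that it applies 
only to documents that declare version="1.1".

```python
# NameStartChar ranges transcribed from the XML 1.1 recommendation
# (an assumption to verify against the linked text).
XML11_NAME_START_RANGES = [
    (0x3A, 0x3A),        # ":"
    (0x41, 0x5A),        # A-Z
    (0x5F, 0x5F),        # "_"
    (0x61, 0x7A),        # a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF),
    (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),  # supplementary planes
]


def is_xml11_name_start_char(ch):
    """Return True if ch may start a name under the table above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in XML11_NAME_START_RANGES)


print(is_xml11_name_start_char("\U00020000"))  # assumed stand-in character
print(is_xml11_name_start_char("\u091f"))      # Devanagari letter
print(is_xml11_name_start_char("1"))           # digits cannot start a name
```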

Dave
