Posted to j-users@xerces.apache.org by Jacob Kjome <ho...@visi.com> on 2006/12/08 06:20:43 UTC

DTD and namespaces question

I'm having Xerces-2.9.0 parse a file with the XHTML Basic 1.0 doctype...

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
     PUBLIC "-//W3C//DTD XHTML Basic 1.0//EN"
     "http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US">
<head>
     <title>XHTML 1.1 Template</title>
     <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" />
     <meta http-equiv="Content-Script-Type" content="application/x-javascript" />
     <meta http-equiv="Content-Style-Type" content="text/css" />
   </head>
   <body>
     <h1 id="headlineOne">Page 01</h1>
     <h1 class="RemoveMe">Remove Me</h1>
     <h1 class="RemoveMe">Remove Me</h1>
     <h1 class="RemoveMe">Remove Me</h1>
     <div>
     <span id="hello">Hello World</span>
     <ol id="table">
       <li>Hello</li>
     </ol>
     <a href="http://foo.com">change to bar.com</a>
     </div>
   </body>
</html>

But Xerces gives me the following error in parsing...

      [xmlc] D:\myclasses\Repository\Enhydra\tomcatXHTML\res\page\xhtmlbasic.xhtml:26: Error: A colon is not allowed in the name 'IS10744:arch' when namespaces are enabled.
      [xmlc] Error: Parse of "D:\myclasses\Repository\Enhydra\tomcatXHTML\res\page\xhtmlbasic.xhtml" failed: org.xml.sax.SAXParseException: A colon is not allowed in the name 'IS10744:arch' when namespaces are enabled.

Here's the part of the DTD that it appears to be bombing on (part of 
the flat version of the DTD [1], referenced using a catalog)...

<?IS10744:arch xhtml
     public-id       =  "-//W3C//NOTATION AFDR ARCBASE XHTML 1.1//EN"
     dtd-public-id   =  "-//W3C//DTD XHTML 1.1//EN"
     dtd-system-id   =  "xhtml11.dtd"
     doc-elem-form   =  "html"
     form-att        =  "html"
     renamer-att     =  "htnames"
     suppressor-att  =  "htsupp"
     data-ignore-att =  "htign"
     auto            =  "ArcAuto"
     options         =  "HtModReq HtModOpt"
     HtModReq        =  "Framework Text Hypertext Lists Structure"
     HtModOpt        =  "Standard"
?>

The W3C wrote this, not me.  Is Xerces correct in telling me that the 
W3C made a mistake in the DTD, or is Xerces getting something wrong?  
Or do I need to change my parser settings for this to work?  All the 
XHTML 1.0 DTDs work fine, and so does the XHTML 1.1 DTD.  Why does it 
fail with this one?  I have a couple of other problems with other DTDs, 
such as xhtml+voice12.dtd [2], and they seem to be namespace related as 
well, though that DTD itself is parsed just fine.  There, it's the 
namespaces in the document that the parser has a problem with, but I 
won't go into that one before hearing some opinions on this XHTML 
Basic 1.0 DTD problem first.

thanks,

Jake


[1] 
http://validator.w3.org/sgml-lib/REC-xhtml-basic-20001219/xhtml-basic10-f.dtd
[2] http://www.voicexml.org/specs/multimodal/x+v/12/dtd/xhtml+voice12.dtd


---------------------------------------------------------------------
To unsubscribe, e-mail: j-users-unsubscribe@xerces.apache.org
For additional commands, e-mail: j-users-help@xerces.apache.org


Re: DTD and namespaces question

Posted by Jacob Kjome <ho...@visi.com>.
Thanks Michael.  I updated my local copy to ignore the Architecture 
module, just like the XHTML 1.1 DTD does.  Seems like they should 
update their errata, as well as fix this thing, after all these years!

Jake

At 06:59 PM 12/9/2006, you wrote:
 >Hi Jake,
 >
 >The issue with this processing instruction was reported to the W3C
 >several years ago. Perhaps one of these threads [1][2] contains something
 >you'll find helpful.
 >
 >[1]
 >http://lists.w3.org/Archives/Public/www-html-editor/2001OctDec/1240.html
 >[2] http://lists.w3.org/Archives/Public/www-html/2002Feb/0086.html
 >
 >On Sat, 9 Dec 2006, Jacob Kjome wrote:
 >
 >> At 03:59 AM 12/8/2006, you wrote:
 >>> /Jacob Kjome/:
 >>>
 >>>> But Xerces gives me the following error in parsing...
 >>>>
 >>>>      [xmlc]
 >>>> D:\myclasses\Repository\Enhydra\tomcatXHTML\res\page\xhtmlbasic.xhtml:26:
 >>>> Error: A colon is not allowed in the name 'IS10744:arch' when namespaces
 >>>> are enabled.
 >>>>      [xmlc] Error: Parse of
 >>>> "D:\myclasses\Repository\Enhydra\tomcatXHTML\res\page\xhtmlbasic.xhtml"
 >>>> failed: org.xml.sax.SAXParseException: A colon is not allowed in the
 >>>> name 'IS10744:arch' when namespaces are enabled.
 >>>>
 >>>> Here's the part of the DTD that it appears to be bombing on (part of the
 >>>> flat version of the DTD [1], referenced using a catalog)...
 >>>>
 >>>> <?IS10744:arch xhtml
 >>> [...]
 >>>> ?>
 >>>>
 >>>> The w3c wrote this, not me.  Is Xerces correct in telling me that the
 >>>> W3C made a mistake in the DTD or is Xerces getting something wrong?
 >>>
 >>> As far as I know <http://www.w3.org/TR/xml-names/#dt-nwf>:
 >>>
 >>>> in a namespace-well-formed document:
 >>>>
 >>>>     * No entity names, processing instruction targets, or notation
 >>>> names contain any colons.
 >>>
 >>
 >> Hmm... thanks for the pointer.  But what has me puzzled is how they can
 >> possibly have a recommendation out there with an invalid DTD?  There's no
 >> known errors according to their errata statement [1], and they've had
 >> about 6 years to find them.  BTW, http://validator.w3.org/ says my document
 >> validates just fine against the XHTML Basic 1.0 DTD.  Are they not checking
 >> against namespaces?
 >>
 >> It's a bit worrisome because the xhtml-arch-1.mod [2] is a standard module
 >> for XHTML modularization.  So, any DTD extending the various XHTML modules
 >> that include the arch module will fail under Xerces2.  It's hard to believe
 >> that the W3C members were boneheaded enough to write an invalid module that
 >> every other extension of XHTML modularization depends on.  The XHTML 1.1 DTD
 >> [3] ignores the arch module...
 >>
 >> <!ENTITY % xhtml-arch.module "IGNORE" >
 >> <![%xhtml-arch.module;[
 >> <!ENTITY % xhtml-arch.mod
 >>     PUBLIC "-//W3C//ELEMENTS XHTML Base Architecture 1.0//EN"
 >>            "xhtml-arch-1.mod" >
 >> %xhtml-arch.mod;]]>
 >>
 >> That's, apparently, the only reason why it succeeds under Xerces2.
 >>
 >>> So Xerces is correct.  You could use no namespace processing if you
 >>> don't necessarily need it.
 >>>
 >>
 >> Based upon your reference, it appears that Xerces2 is correct, but it's got
 >> to be more nuanced than Xerces2 is right and the XHTML modularization spec
 >> leaders are wrong.  And turning off namespaces would entirely defeat the
 >> purpose of XHTML modularization, no?  There's got to be a better answer.
 >> Anyone?
 >>
 >> Jake
 >>
 >>
 >> [1] http://www.w3.org/2000/12/REC-xhtml-basic-20001219-errata
 >> [2]
 >> http://validator.w3.org/sgml-lib/REC-xhtml-basic-20001219/xhtml-arch-1.mod
 >> [3] http://validator.w3.org/sgml-lib/REC-xhtml11-20010531/xhtml11-flat.dtd
 >>
 >>> --
 >>> Stanimir
 >>
 >>
 >
 >---------------------------
 >Michael Glavassevich
 >XML Parser Development
 >IBM Toronto Lab
 >E-mail: mrglavas@ca.ibm.com
 >E-mail: mrglavas@apache.org
 >


Re: DTD and namespaces question

Posted by Michael Glavassevich <mr...@apache.org>.
Hi Jake,

The issue with this processing instruction was reported to the W3C 
several years ago. Perhaps one of these threads [1][2] contains something 
you'll find helpful.

[1] 
http://lists.w3.org/Archives/Public/www-html-editor/2001OctDec/1240.html
[2] http://lists.w3.org/Archives/Public/www-html/2002Feb/0086.html

On Sat, 9 Dec 2006, Jacob Kjome wrote:

> At 03:59 AM 12/8/2006, you wrote:
>> /Jacob Kjome/:
>>
>>> But Xerces gives me the following error in parsing...
>>>
>>>      [xmlc]
>>> D:\myclasses\Repository\Enhydra\tomcatXHTML\res\page\xhtmlbasic.xhtml:26:
>>> Error: A colon is not allowed in the name 'IS10744:arch' when namespaces
>>> are enabled.
>>>      [xmlc] Error: Parse of
>>> "D:\myclasses\Repository\Enhydra\tomcatXHTML\res\page\xhtmlbasic.xhtml"
>>> failed: org.xml.sax.SAXParseException: A colon is not allowed in the
>>> name 'IS10744:arch' when namespaces are enabled.
>>>
>>> Here's the part of the DTD that it appears to be bombing on (part of the
>>> flat version of the DTD [1], referenced using a catalog)...
>>>
>>> <?IS10744:arch xhtml
>> [...]
>>> ?>
>>>
>>> The w3c wrote this, not me.  Is Xerces correct in telling me that the
>>> W3C made a mistake in the DTD or is Xerces getting something wrong?
>>
>> As far as I know <http://www.w3.org/TR/xml-names/#dt-nwf>:
>>
>>> in a namespace-well-formed document:
>>>
>>>     * No entity names, processing instruction targets, or notation
>>> names contain any colons.
>>
>
> Hmm... thanks for the pointer.  But what has me puzzled is how they can 
> possibly have a recommendation out there with an invalid DTD?  There's no 
> known errors according to their errata statement [1], and they've had about 6 
> years to find them.  BTW, http://validator.w3.org/ says my document validates 
> just fine against the XHTML Basic 1.0 DTD.  Are they not checking against 
> namespaces?
>
> It's a bit worrisome because the xhtml-arch-1.mod [2] is a standard module 
> for XHTML modularization.  So, any DTD extending the various XHTML modules 
> that include the arch module will fail under Xerces2.  It's hard to believe 
> that the W3C members were boneheaded enough to write an invalid module that 
> every other extension of XHTML modularization depends on.  The XHTML 1.1 DTD 
> [3] ignores the arch module...
>
> <!ENTITY % xhtml-arch.module "IGNORE" >
> <![%xhtml-arch.module;[
> <!ENTITY % xhtml-arch.mod
>     PUBLIC "-//W3C//ELEMENTS XHTML Base Architecture 1.0//EN"
>            "xhtml-arch-1.mod" >
> %xhtml-arch.mod;]]>
>
> That's, apparently, the only reason why it succeeds under Xerces2.
>
>> So Xerces is correct.  You could use no namespace processing if you
>> don't necessarily need it.
>>
>
> Based upon your reference, it appears that Xerces2 is correct, but it's got 
> to be more nuanced than Xerces2 is right and the XHTML modularization spec 
> leaders are wrong.  And turning off namespaces would entirely defeat the 
> purpose of XHTML modularization, no?  There's got to be a better answer. 
> Anyone?
>
> Jake
>
>
> [1] http://www.w3.org/2000/12/REC-xhtml-basic-20001219-errata
> [2] 
> http://validator.w3.org/sgml-lib/REC-xhtml-basic-20001219/xhtml-arch-1.mod
> [3] http://validator.w3.org/sgml-lib/REC-xhtml11-20010531/xhtml11-flat.dtd
>
>> --
>> Stanimir
>
>

---------------------------
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org



Re: DTD and namespaces question

Posted by Jacob Kjome <ho...@visi.com>.
At 12:40 PM 12/10/2006, you wrote:
 >Jacob Kjome wrote:
 >
 >> So, ISO architectural forms were an SGML thing brought over to XML?  But
 >> why bother if it's incompatible with XML namespaces?  And how is
 >> "Namespace malformed" not "invalid"?  If I have an XML document that's
 >> malformed, it's most certainly not going to be valid.
 >
 >It's not that simple. Well-formedness is a prerequisite for validity but
 >namespace well-formedness is not. Remember, namespaces postdate XML 1.0
 >by about a year. I suspect but don't know that the ISO architectural
 >form syntax may have been decided before namespaces were finalized.
 >

Ok, well, that's a reasonable answer.  Then again, the XHTML Basic 1.0 
[1] spec came out after the Namespaces spec [2], and they must have seen 
drafts of the Namespaces spec before that.  I guess it is what it is 
now, but it's a pretty lame oversight, IMO.  I suppose parsers might not 
have been updated to conform to the Namespaces spec at that point, so 
I'll give them that much.  Their tests may have appeared to pass when 
they should have failed had their parser enforced the Namespaces spec.

Jake


[1] http://www.w3.org/TR/2000/REC-xhtml-basic-20001219/
[2] http://www.w3.org/TR/1999/REC-xml-names-19990114/#Conformance

 >--
 >Elliotte Rusty Harold  elharo@metalab.unc.edu
 >Java I/O 2nd Edition Just Published!
 >http://www.cafeaulait.org/books/javaio2/
 >http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/
 >


Re: DTD and namespaces question

Posted by Elliotte Harold <el...@metalab.unc.edu>.
Jacob Kjome wrote:

> So, ISO architectural forms were an SGML thing brought over to XML?  But 
> why bother if it's incompatible with XML namespaces?  And how is 
> "Namespace malformed" not "invalid"?  If I have an XML document that's 
> malformed, it's most certainly not going to be valid. 

It's not that simple. Well-formedness is a prerequisite for validity but 
namespace well-formedness is not. Remember, namespaces postdate XML 1.0 
by about a year. I suspect but don't know that the ISO architectural 
form syntax may have been decided before namespaces were finalized.
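
To make that distinction concrete, here is a rough sketch using the JAXP 
parser bundled with the JDK (an assumption: the thread itself is about 
standalone Xerces 2.9.0). The class name, the tiny one-element DTD, and the 
"demo" PI data are all made up for illustration. With DTD validation on and 
namespace processing off, the document should come back with no problems 
reported, even though it carries a colon in a PI target. On a Xerces-derived 
parser, the namespace-aware run is expected to report a fatal error instead.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class ValidButNotNsWellFormed {
    // Hypothetical stand-in document: a one-element DTD plus a PI whose
    // target contains a colon, as in the XHTML architecture module.
    static final String DOC =
          "<!DOCTYPE html [\n"
        + "  <?IS10744:arch demo?>\n"
        + "  <!ELEMENT html EMPTY>\n"
        + "]>\n"
        + "<html/>";

    // Parses DOC with DTD validation enabled and returns every validity
    // error or fatal error the parser reported.
    static List<String> problems(boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        factory.setNamespaceAware(namespaceAware);
        List<String> problems = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                // Validity errors are recoverable, so just record them.
                problems.add("validity: " + e.getMessage());
            }
        };
        try {
            factory.newSAXParser()
                   .parse(new InputSource(new StringReader(DOC)), handler);
        } catch (SAXException e) {
            // DefaultHandler rethrows fatal (well-formedness) errors.
            problems.add("fatal: " + e.getMessage());
        }
        return problems;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("namespaces off: " + problems(false));
        System.out.println("namespaces on:  " + problems(true));
    }
}
```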

-- 
Elliotte Rusty Harold  elharo@metalab.unc.edu
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/



Re: DTD and namespaces question

Posted by Jacob Kjome <ho...@visi.com>.
At 04:43 AM 12/10/2006, you wrote:
 >Jacob Kjome wrote:
 >
 >> Hmm... thanks for the pointer.  But what has me puzzled is how they can
 >> possibly have a recommendation out there with an invalid DTD?
 >
 >It's not invalid. It's namespace malformed. This is a known and
 >longstanding problem with ISO architectural forms and namespaces. They
 >really aren't compatible.
 >

So, ISO architectural forms were an SGML thing brought over to XML?  But 
why bother if it's incompatible with XML namespaces?  And how is 
"Namespace malformed" not "invalid"?  If I have an XML document that's 
malformed, it's most certainly not going to be valid.  I suppose an XML 
parser would never get to the point of "validation" if it can't parse it 
in the first place, but the distinction seems a little pedantic in this 
case.  Then again, this is coming from an XML guru, so what should I 
expect (that's a compliment, BTW)?  I just fail to understand how they 
spent all this time coming up with the XHTML Basic 1.0 DTD and then 
didn't think to test the thing!  Xerces fails to parse it.  They either 
didn't test it or their test parser is lenient to a fault, no?

Jake

 >--
 >Elliotte Rusty Harold  elharo@metalab.unc.edu
 >Java I/O 2nd Edition Just Published!
 >http://www.cafeaulait.org/books/javaio2/
 >http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/
 >


Re: DTD and namespaces question

Posted by Elliotte Harold <el...@metalab.unc.edu>.
Jacob Kjome wrote:

> Hmm... thanks for the pointer.  But what has me puzzled is how they can 
> possibly have a recommendation out there with an invalid DTD? 

It's not invalid. It's namespace malformed. This is a known and 
longstanding problem with ISO architectural forms and namespaces. They 
really aren't compatible.

-- 
Elliotte Rusty Harold  elharo@metalab.unc.edu
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/



Re: DTD and namespaces question

Posted by Jacob Kjome <ho...@visi.com>.
At 03:59 AM 12/8/2006, you wrote:
 >/Jacob Kjome/:
 >
 >> But Xerces gives me the following error in parsing...
 >>
 >>      [xmlc]
 >> D:\myclasses\Repository\Enhydra\tomcatXHTML\res\page\xhtmlbasic.xhtml:26:
 >> Error: A colon is not allowed in the name 'IS10744:arch' when namespaces
 >> are enabled.
 >>      [xmlc] Error: Parse of
 >> "D:\myclasses\Repository\Enhydra\tomcatXHTML\res\page\xhtmlbasic.xhtml"
 >> failed: org.xml.sax.SAXParseException: A colon is not allowed in the
 >> name 'IS10744:arch' when namespaces are enabled.
 >>
 >> Here's the part of the DTD that it appears to be bombing on (part of the
 >> flat version of the DTD [1], referenced using a catalog)...
 >>
 >> <?IS10744:arch xhtml
 >[...]
 >> ?>
 >>
 >> The w3c wrote this, not me.  Is Xerces correct in telling me that the
 >> W3C made a mistake in the DTD or is Xerces getting something wrong?
 >
 >As far as I know <http://www.w3.org/TR/xml-names/#dt-nwf>:
 >
 >> in a namespace-well-formed document:
 >>
 >>     * No entity names, processing instruction targets, or notation
 >> names contain any colons.
 >

Hmm... thanks for the pointer.  But what has me puzzled is how they 
can possibly have a recommendation out there with an invalid 
DTD?  There's no known errors according to their errata statement 
[1], and they've had about 6 years to find them.  BTW, 
http://validator.w3.org/ says my document validates just fine against 
the XHTML Basic 1.0 DTD.  Are they not checking against namespaces?

It's a bit worrisome because the xhtml-arch-1.mod [2] is a standard 
module for XHTML modularization.  So, any DTD extending the various 
XHTML modules that include the arch module will fail under 
Xerces2.  It's hard to believe that the W3C members were boneheaded 
enough to write an invalid module that every other extension of XHTML 
modularization depends on.  The XHTML 1.1 DTD [3] ignores the 
arch module...

<!ENTITY % xhtml-arch.module "IGNORE" >
<![%xhtml-arch.module;[
<!ENTITY % xhtml-arch.mod
      PUBLIC "-//W3C//ELEMENTS XHTML Base Architecture 1.0//EN"
             "xhtml-arch-1.mod" >
%xhtml-arch.mod;]]>

That's, apparently, the only reason why it succeeds under Xerces2.
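
A rough way to check that reading, sketched with the JAXP parser bundled 
with the JDK rather than standalone Xerces 2.9.0 (an assumption). The 
system identifier, the entity name demo-arch.module, and the one-element 
DTD are hypothetical stand-ins, not the real XHTML modules. The external 
DTD is served from a string by an entity resolver, and the colon PI sits 
inside a conditional section switched by a parameter entity, the same 
pattern as in the XHTML 1.1 snippet above. With "IGNORE" the PI is skipped 
and the namespace-aware parse should succeed; with "INCLUDE" a 
Xerces-derived parser is expected to reject it.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class IgnoredArchModuleDemo {
    // Hypothetical system identifier; the resolver below intercepts it,
    // so nothing is fetched from the network.
    static final String DOC =
        "<!DOCTYPE html SYSTEM \"http://example.invalid/demo.dtd\"><html/>";

    // Stand-in external DTD: the colon PI lives in a conditional section
    // whose keyword (INCLUDE or IGNORE) comes from a parameter entity.
    static String dtd(String keyword) {
        return "<!ENTITY % demo-arch.module \"" + keyword + "\" >\n"
             + "<![%demo-arch.module;[\n"
             + "<?IS10744:arch demo?>\n"
             + "]]>\n"
             + "<!ELEMENT html EMPTY>\n";
    }

    // Returns true if DOC parses in namespace-aware mode with the given keyword.
    static boolean parses(String keyword) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // Serve the in-memory DTD instead of resolving the URL.
                InputSource source = new InputSource(new StringReader(dtd(keyword)));
                source.setSystemId(systemId);
                return source;
            }
        };
        try {
            parser.parse(new InputSource(new StringReader(DOC)), handler);
            return true;
        } catch (SAXException e) {
            System.out.println(keyword + " -> " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("IGNORE parses:  " + parses("IGNORE"));
        System.out.println("INCLUDE parses: " + parses("INCLUDE"));
    }
}
```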

 >So Xerces is correct.  You could use no namespace processing if you
 >don't necessarily need it.
 >

Based upon your reference, it appears that Xerces2 is correct, but 
it's got to be more nuanced than Xerces2 is right and the XHTML 
modularization spec leaders are wrong.  And turning off namespaces 
would entirely defeat the purpose of XHTML modularization, 
no?  There's got to be a better answer.  Anyone?

Jake


[1] http://www.w3.org/2000/12/REC-xhtml-basic-20001219-errata
[2] http://validator.w3.org/sgml-lib/REC-xhtml-basic-20001219/xhtml-arch-1.mod
[3] http://validator.w3.org/sgml-lib/REC-xhtml11-20010531/xhtml11-flat.dtd

 >--
 >Stanimir




Re: DTD and namespaces question

Posted by Stanimir Stamenkov <st...@myrealbox.com>.
/Jacob Kjome/:

> But Xerces gives me the following error in parsing...
> 
>      [xmlc] 
> D:\myclasses\Repository\Enhydra\tomcatXHTML\res\page\xhtmlbasic.xhtml:26: 
> Error: A colon is not allowed in the name 'IS10744:arch' when namespaces 
> are enabled.
>      [xmlc] Error: Parse of 
> "D:\myclasses\Repository\Enhydra\tomcatXHTML\res\page\xhtmlbasic.xhtml" 
> failed: org.xml.sax.SAXParseException: A colon is not allowed in the 
> name 'IS10744:arch' when namespaces are enabled.
> 
> Here's the part of the DTD that it appears to be bombing on (part of the 
> flat version of the DTD [1], referenced using a catalog)...
> 
> <?IS10744:arch xhtml
[...]
> ?>
> 
> The w3c wrote this, not me.  Is Xerces correct in telling me that the 
> W3C made a mistake in the DTD or is Xerces getting something wrong?

As far as I know <http://www.w3.org/TR/xml-names/#dt-nwf>:

> in a namespace-well-formed document:
> 
>     * No entity names, processing instruction targets, or notation 
> names contain any colons.

So Xerces is correct.  You could use no namespace processing if you 
don't necessarily need it.
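
For what it's worth, a minimal sketch of what switching namespace 
processing off looks like through JAXP. It uses the parser bundled with 
the JDK, which is an assumption (the original report is about Xerces 2.9.0 
driven by xmlc), and the class name and the one-line test document are 
made up for illustration. The PI target is the one from the DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ColonPiDemo {
    // Hypothetical test document: a PI with a colon in its target,
    // followed by an empty XHTML root element.
    static final String DOC =
        "<?IS10744:arch xhtml?><html xmlns=\"http://www.w3.org/1999/xhtml\"/>";

    // Returns true if DOC parses with the given namespace setting.
    static boolean parses(boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        SAXParser parser = factory.newSAXParser();
        try {
            parser.parse(new InputSource(new StringReader(DOC)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // DefaultHandler rethrows fatal errors as SAXParseException.
            System.out.println("namespaceAware=" + namespaceAware
                + " -> " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("namespaces off: " + parses(false));
        System.out.println("namespaces on:  " + parses(true));
    }
}
```

Plain XML 1.0 allows the colon in a PI target, so the first call should 
succeed on any conforming parser. The second call is where a 
Xerces-derived parser is expected to raise the "colon is not allowed" 
error quoted above.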

-- 
Stanimir
