Posted to j-users@xerces.apache.org by Jacob Kjome <ho...@visi.com> on 2006/12/08 06:20:43 UTC
DTD and namespaces question
I'm having Xerces-2.9.0 parse a file with the XHTML Basic 1.0 doctype...
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML Basic 1.0//EN"
"http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US">
<head>
<title>XHTML 1.1 Template</title>
<meta http-equiv="Content-Type" content="application/xhtml+xml;
charset=UTF-8" />
<meta http-equiv="Content-Script-Type"
content="application/x-javascript" />
<meta http-equiv="Content-Style-Type" content="text/css" />
</head>
<body>
<h1 id="headlineOne">Page 01</h1>
<h1 class="RemoveMe">Remove Me</h1>
<h1 class="RemoveMe">Remove Me</h1>
<h1 class="RemoveMe">Remove Me</h1>
<div>
<span id="hello">Hello World</span>
<ol id="table">
<li>Hello</li>
</ol>
<a href="http://foo.com">change to bar.com</a>
</div>
</body>
</html>
But Xerces gives me the following error in parsing...
[xmlc]
D:\myclasses\Repository\Enhydra\tomcatXHTML\res\page\xhtmlbasic.xhtml:26:
Error: A colon is not allowed in the name 'IS10744:arch' when
namespaces are enabled.
[xmlc] Error: Parse of
"D:\myclasses\Repository\Enhydra\tomcatXHTML\res\page\xhtmlbasic.xhtml"
failed: org.xml.sax.SAXParseException: A colon is not allowed in the
name 'IS10744:arch' when namespaces are enabled.
Here's the part of the DTD that it appears to be bombing on (part of
the flat version of the DTD [1], referenced using a catalog)...
<?IS10744:arch xhtml
public-id = "-//W3C//NOTATION AFDR ARCBASE XHTML 1.1//EN"
dtd-public-id = "-//W3C//DTD XHTML 1.1//EN"
dtd-system-id = "xhtml11.dtd"
doc-elem-form = "html"
form-att = "html"
renamer-att = "htnames"
suppressor-att = "htsupp"
data-ignore-att = "htign"
auto = "ArcAuto"
options = "HtModReq HtModOpt"
HtModReq = "Framework Text Hypertext Lists Structure"
HtModOpt = "Standard"
?>
The W3C wrote this, not me. Is Xerces correct in telling me that the
W3C made a mistake in the DTD, or is Xerces getting something
wrong? Or do I need to change my parser settings for this to
work? All the XHTML 1.0 DTDs work fine, and so does the XHTML 1.1
DTD. Why does it fail with this one? I have a couple of other
problems with other DTDs, such as xhtml+voice12.dtd [2], which
seem to be namespace-related as well, though that DTD itself parses just
fine. There, it's the namespaces in the document that Xerces has a
problem with, but I won't go into that before hearing some opinions on
this XHTML Basic 1.0 DTD problem.
thanks,
Jake
[1]
http://validator.w3.org/sgml-lib/REC-xhtml-basic-20001219/xhtml-basic10-f.dtd
[2] http://www.voicexml.org/specs/multimodal/x+v/12/dtd/xhtml+voice12.dtd
---------------------------------------------------------------------
To unsubscribe, e-mail: j-users-unsubscribe@xerces.apache.org
For additional commands, e-mail: j-users-help@xerces.apache.org
Re: DTD and namespaces question
Posted by Jacob Kjome <ho...@visi.com>.
Thanks Michael. I updated my local copy to ignore the Architecture
module, just like the XHTML 1.1 DTD does. Seems like they should
update their errata, as well as fix this thing, after all these years!
Jake
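[Editor's sketch] Jake's fix was to edit his local DTD copy. A different way to get a similar effect without touching the DTD is a SAX EntityResolver that hands the parser an empty replacement for the architecture module. This is only a sketch under assumptions: the class name SkipArchModuleResolver is hypothetical, the public identifier is the one shown in the XHTML 1.1 excerpt quoted in this thread, and the approach can only help with a modular DTD that fetches the module as a separate external entity. It does nothing for the flat DTD, where the processing instruction is inlined.

```java
import java.io.StringReader;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical helper; not part of Xerces or XMLC.
class SkipArchModuleResolver implements EntityResolver {

    // Public identifier of the architecture module, as quoted in the thread.
    static final String ARCH_PUBLIC_ID =
            "-//W3C//ELEMENTS XHTML Base Architecture 1.0//EN";

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (ARCH_PUBLIC_ID.equals(publicId)) {
            // Substitute an empty entity so the colon-bearing PI is never read.
            InputSource empty = new InputSource(new StringReader(""));
            empty.setPublicId(publicId);
            empty.setSystemId(systemId);
            return empty;
        }
        // Returning null asks the parser to resolve the entity as usual.
        return null;
    }
}
```

It would be installed with XMLReader.setEntityResolver(new SkipArchModuleResolver()) before parsing. If a catalog resolver is already in use, as in Jake's setup, the two would need to be chained, which this sketch does not attempt.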
At 06:59 PM 12/9/2006, you wrote:
>Hi Jake,
>
>The issue with this processing instruction was reported to the W3C
>several years ago. Perhaps one of these threads [1][2] contains something
>you'll find helpful.
>
>[1]
>http://lists.w3.org/Archives/Public/www-html-editor/2001OctDec/1240.html
>[2] http://lists.w3.org/Archives/Public/www-html/2002Feb/0086.html
>
>---------------------------
>Michael Glavassevich
>XML Parser Development
>IBM Toronto Lab
>E-mail: mrglavas@ca.ibm.com
>E-mail: mrglavas@apache.org
Re: DTD and namespaces question
Posted by Michael Glavassevich <mr...@apache.org>.
Hi Jake,
The issue with this processing instruction was reported to the W3C
several years ago. Perhaps one of these threads [1][2] contains something
you'll find helpful.
[1]
http://lists.w3.org/Archives/Public/www-html-editor/2001OctDec/1240.html
[2] http://lists.w3.org/Archives/Public/www-html/2002Feb/0086.html
---------------------------
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org
Re: DTD and namespaces question
Posted by Jacob Kjome <ho...@visi.com>.
At 12:40 PM 12/10/2006, you wrote:
>Jacob Kjome wrote:
>
>> So, ISO architectural forms were an SGML thing brought over to XML? But
>> why bother if it's incompatible with XML namespaces? And how is
>> "Namespace malformed" not "invalid"? If I have an XML document that's
>> malformed, it's most certainly not going to be valid.
>
>It's not that simple. Well-formedness is a prerequisite for validity but
>namespace well-formedness is not. Remember, namespaces postdate XML 1.0
>by about a year. I suspect but don't know that the ISO architectural
>form syntax may have been decided before namespaces were finalized.
>
OK, well, that's a reasonable answer. Then again, the XHTML Basic 1.0
spec [1] came out after the Namespaces spec [2], and they must have seen
drafts of the Namespaces spec before that. I guess it is what it is now,
but it's a pretty lame oversight, IMO. I suppose parsers might not have
been updated to conform to the Namespaces spec at that point, so I'll
give them that much. Their tests may have appeared to pass when they
would have failed had their parser enforced the Namespaces spec.
Jake
[1] http://www.w3.org/TR/2000/REC-xhtml-basic-20001219/
[2] http://www.w3.org/TR/1999/REC-xml-names-19990114/#Conformance
>--
>Elliotte Rusty Harold elharo@metalab.unc.edu
>Java I/O 2nd Edition Just Published!
>http://www.cafeaulait.org/books/javaio2/
>http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/
Re: DTD and namespaces question
Posted by Elliotte Harold <el...@metalab.unc.edu>.
Jacob Kjome wrote:
> So, ISO architectural forms were an SGML thing brought over to XML? But
> why bother if it's incompatible with XML namespaces? And how is
> "Namespace malformed" not "invalid"? If I have an XML document that's
> malformed, it's most certainly not going to be valid.
It's not that simple. Well-formedness is a prerequisite for validity but
namespace well-formedness is not. Remember, namespaces postdate XML 1.0
by about a year. I suspect but don't know that the ISO architectural
form syntax may have been decided before namespaces were finalized.
--
Elliotte Rusty Harold elharo@metalab.unc.edu
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/
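[Editor's sketch] The distinction above can be illustrated with plain JAXP: a document whose processing instruction target contains a colon is still well-formed XML 1.0, so it parses when namespace processing is off, yet it breaks the Namespaces rule that such targets contain no colons. The class name ColonTargetFinder, the method colonTargets, and the idea of scanning for offending targets are illustrative assumptions, not something proposed in the thread.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper; not part of Xerces or XMLC.
class ColonTargetFinder {

    // Parses with namespace processing off and returns every processing
    // instruction target that contains a colon.
    static List<String> colonTargets(String xml) {
        final List<String> found = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void processingInstruction(String target, String data) {
                if (target.indexOf(':') >= 0) {
                    found.add(target);
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(false); // plain XML 1.0 rules only
            factory.newSAXParser()
                   .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }
}
```

Any target this returns is one that a namespace-enforcing parser is entitled to reject, even though the document is well-formed in the XML 1.0 sense.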
Re: DTD and namespaces question
Posted by Jacob Kjome <ho...@visi.com>.
At 04:43 AM 12/10/2006, you wrote:
>Jacob Kjome wrote:
>
>> Hmm... thanks for the pointer. But what has me puzzled is how they can
>> possibly have a recommendation out there with an invalid DTD?
>
>It's not invalid. It's namespace malformed. This is a known and
>longstanding problem with ISO architectural forms and namespaces. They
>really aren't compatible.
>
So, ISO architectural forms were an SGML thing brought over to XML? But
why bother if it's incompatible with XML namespaces? And how is
"Namespace malformed" not "invalid"? If I have an XML document that's
malformed, it's most certainly not going to be valid. I suppose an XML
parser would never get to the point of "validation" if it can't parse
the document in the first place, but the distinction seems a little
pedantic in this case. Then again, this is coming from an XML guru, so
what should I expect (that's a compliment, BTW)? I just fail to
understand how they spent all this time coming up with the XHTML Basic
1.0 DTD and then didn't think to test the thing! Xerces fails to parse
it. They either didn't test it or their test parser is lenient to a
fault, no?
Jake
>--
>Elliotte Rusty Harold elharo@metalab.unc.edu
>Java I/O 2nd Edition Just Published!
>http://www.cafeaulait.org/books/javaio2/
>http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/
Re: DTD and namespaces question
Posted by Elliotte Harold <el...@metalab.unc.edu>.
Jacob Kjome wrote:
> Hmm... thanks for the pointer. But what has me puzzled is how they can
> possibly have a recommendation out there with an invalid DTD?
It's not invalid. It's namespace malformed. This is a known and
longstanding problem with ISO architectural forms and namespaces. They
really aren't compatible.
--
Elliotte Rusty Harold elharo@metalab.unc.edu
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/
Re: DTD and namespaces question
Posted by Jacob Kjome <ho...@visi.com>.
At 03:59 AM 12/8/2006, you wrote:
>/Jacob Kjome/:
>
>> But Xerces gives me the following error in parsing...
>>
>> [xmlc]
>> D:\myclasses\Repository\Enhydra\tomcatXHTML\res\page\xhtmlbasic.xhtml:26:
>> Error: A colon is not allowed in the name 'IS10744:arch' when namespaces
>> are enabled.
>> [xmlc] Error: Parse of
>> "D:\myclasses\Repository\Enhydra\tomcatXHTML\res\page\xhtmlbasic.xhtml"
>> failed: org.xml.sax.SAXParseException: A colon is not allowed in the
>> name 'IS10744:arch' when namespaces are enabled.
>>
>> Here's the part of the DTD that it appears to be bombing on (part of the
>> flat version of the DTD [1], referenced using a catalog)...
>>
>> <?IS10744:arch xhtml
>[...]
>> ?>
>>
>> The w3c wrote this, not me. Is Xerces correct in telling me that the
>> W3C made a mistake in the DTD or is Xerces getting something wrong?
>
>As far as I know <http://www.w3.org/TR/xml-names/#dt-nwf>:
>
>> in a namespace-well-formed document:
>>
>> * No entity names, processing instruction targets, or notation
>> names contain any colons.
>
Hmm... thanks for the pointer. But what has me puzzled is how they
can possibly have a recommendation out there with an invalid
DTD? There are no known errors according to their errata statement
[1], and they've had about 6 years to find them. BTW,
http://validator.w3.org/ says my document validates just fine against
the XHTML Basic 1.0 DTD. Are they not checking against namespaces?
It's a bit worrisome because xhtml-arch-1.mod [2] is a standard
module for XHTML modularization. So, any DTD extending the various
XHTML modules that includes the arch module will fail under
Xerces2. It's hard to believe that the W3C members were boneheaded
enough to write an invalid module that every other extension of XHTML
modularization depends on. The XHTML 1.1 DTD [3] ignores the
arch module...
<!ENTITY % xhtml-arch.module "IGNORE" >
<![%xhtml-arch.module;[
<!ENTITY % xhtml-arch.mod
PUBLIC "-//W3C//ELEMENTS XHTML Base Architecture 1.0//EN"
"xhtml-arch-1.mod" >
%xhtml-arch.mod;]]>
That's, apparently, the only reason why it succeeds under Xerces2.
>So Xerces is correct. You could use no namespace processing if you
>don't necessary need it.
>
Based upon your reference, it appears that Xerces2 is correct, but
it's got to be more nuanced than Xerces2 is right and the XHTML
modularization spec leaders are wrong. And turning off namespaces
would entirely defeat the purpose of XHTML modularization,
no? There's got to be a better answer. Anyone?
Jake
[1] http://www.w3.org/2000/12/REC-xhtml-basic-20001219-errata
[2] http://validator.w3.org/sgml-lib/REC-xhtml-basic-20001219/xhtml-arch-1.mod
[3] http://validator.w3.org/sgml-lib/REC-xhtml11-20010531/xhtml11-flat.dtd
>--
>Stanimir
Re: DTD and namespaces question
Posted by Stanimir Stamenkov <st...@myrealbox.com>.
/Jacob Kjome/:
> But Xerces gives me the following error in parsing...
>
> [xmlc]
> D:\myclasses\Repository\Enhydra\tomcatXHTML\res\page\xhtmlbasic.xhtml:26:
> Error: A colon is not allowed in the name 'IS10744:arch' when namespaces
> are enabled.
> [xmlc] Error: Parse of
> "D:\myclasses\Repository\Enhydra\tomcatXHTML\res\page\xhtmlbasic.xhtml"
> failed: org.xml.sax.SAXParseException: A colon is not allowed in the
> name 'IS10744:arch' when namespaces are enabled.
>
> Here's the part of the DTD that it appears to be bombing on (part of the
> flat version of the DTD [1], referenced using a catalog)...
>
> <?IS10744:arch xhtml
[...]
> ?>
>
> The w3c wrote this, not me. Is Xerces correct in telling me that the
> W3C made a mistake in the DTD or is Xerces getting something wrong?
As far as I know <http://www.w3.org/TR/xml-names/#dt-nwf>:
> in a namespace-well-formed document:
>
> * No entity names, processing instruction targets, or notation
> names contain any colons.
So Xerces is correct. You could turn off namespace processing if you
don't necessarily need it.
--
Stanimir
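[Editor's sketch] Turning namespace processing off, as suggested above, can be tried with the standard JAXP API. This is a minimal sketch and not XMLC's or Xerces' own configuration: the class name NamespaceToggleDemo, the helper parses, and the inline sample document are assumptions for illustration, and the thread's real case has the processing instruction inside the DTD rather than in the document. With Xerces-J as described in this thread, the namespace-aware parse is expected to report the colon error. Other parsers may be more lenient, so the sketch makes no promise about that case.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of Xerces or XMLC.
class NamespaceToggleDemo {

    // Illustrative document: the PI target contains a colon.
    static final String SAMPLE = "<?IS10744:arch xhtml?><root/>";

    // Returns true if the document parses without a SAX error.
    static boolean parses(String xml, boolean namespaceAware) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            factory.newSAXParser()
                   .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // Includes SAXParseException, the type reported in the thread.
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("namespaces off: " + parses(SAMPLE, false));
        // Result depends on how strictly the parser enforces Namespaces in XML.
        System.out.println("namespaces on:  " + parses(SAMPLE, true));
    }
}
```

The trade-off is the one Jake raises in his reply: with namespace processing off, element and attribute names come through as raw qualified names, which may not suit code that relies on namespace URIs.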