
Posted to j-dev@xerces.apache.org by Brett McLaughlin <br...@lutris.com> on 2000/07/20 18:12:32 UTC

Problems with latest Xerces/Xalan Xerces

Hey guys-

  I'm getting some weird errors on the included XML document and XSL
stylesheet - when parsing occurs, it locates the DTD and then barfs on
all the elements. It "feels" like a namespace issue, but I know this
worked fine on earlier versions of Xerces. Any ideas? I'm about to leave
town, so I apologize for the lack of troubleshooting....

Here's the output:

Parser error: Document root element "JavaXML:Book", must match DOCTYPE root "Book".
http://www.oreilly.com/catalog/javaxml/ grammar not found
Parser error: Element type "JavaXML:Book" must be declared.
Parser error: Element type "JavaXML:Title" must be declared.
Parser error: Element type "JavaXML:Contents" must be declared.
Parser error: Element type "JavaXML:Chapter" must be declared.
[Error] attribute focus not found in element type JavaXML:Chapter
Parser error: Attribute "focus" must be declared for element type "JavaXML:Chapter".
Parser error: Element type "JavaXML:Heading" must be declared.
Parser error: Element type "JavaXML:Topic" must be declared.
[Error] attribute subSections not found in element type JavaXML:Topic
Parser error: Attribute "subSections" must be declared for element type "JavaXML:Topic".
Parser error: Element type "JavaXML:Topic" must be declared.
[Error] attribute subSections not found in element type JavaXML:Topic
Parser error: Attribute "subSections" must be declared for element type "JavaXML:Topic".
Parser error: Element type "JavaXML:Topic" must be declared.
[Error] attribute subSections not found in element type JavaXML:Topic
Parser error: Attribute "subSections" must be declared for element type "JavaXML:Topic".
Parser error: Element type "JavaXML:Topic" must be declared.
[Error] attribute subSections not found in element type JavaXML:Topic
Parser error: Attribute "subSections" must be declared for element type "JavaXML:Topic".
Parser error: Element type "JavaXML:Chapter" must be declared.
[Error] attribute focus not found in element type JavaXML:Chapter
Parser error: Attribute "focus" must be declared for element type "JavaXML:Chapter".
Parser error: Element type "JavaXML:Heading" must be declared.
Parser error: Element type "JavaXML:Topic" must be declared.
[Error] attribute subSections not found in element type JavaXML:Topic
Parser error: Attribute "subSections" must be declared for element type "JavaXML:Topic".
Parser error: Element type "JavaXML:Topic" must be declared.
[Error] attribute subSections not found in element type JavaXML:Topic
Parser error: Attribute "subSections" must be declared for element type "JavaXML:Topic".
Parser error: Element type "JavaXML:Topic" must be declared.
[Error] attribute subSections not found in element type JavaXML:Topic
Parser error: Attribute "subSections" must be declared for element type "JavaXML:Topic".
Parser error: Element type "JavaXML:Topic" must be declared.
[Error] attribute subSections not found in element type JavaXML:Topic
Parser error: Attribute "subSections" must be declared for element type "JavaXML:Topic".
Parser error: Element type "JavaXML:Chapter" must be declared.
[Error] attribute focus not found in element type JavaXML:Chapter
Parser error: Attribute "focus" must be declared for element type "JavaXML:Chapter".
Parser error: Element type "JavaXML:Heading" must be declared.
Parser error: Element type "JavaXML:Topic" must be declared.
[Error] attribute subSections not found in element type JavaXML:Topic
Parser error: Attribute "subSections" must be declared for element type "JavaXML:Topic".
Parser error: Element type "JavaXML:Topic" must be declared.
[Error] attribute subSections not found in element type JavaXML:Topic
Parser error: Attribute "subSections" must be declared for element type "JavaXML:Topic".
Parser error: Element type "JavaXML:Topic" must be declared.
[Error] attribute subSections not found in element type JavaXML:Topic
Parser error: Attribute "subSections" must be declared for element type "JavaXML:Topic".
Parser error: Element type "JavaXML:Topic" must be declared.
[Error] attribute subSections not found in element type JavaXML:Topic
Parser error: Attribute "subSections" must be declared for element type "JavaXML:Topic".
Parser error: Element type "JavaXML:Topic" must be declared.
[Error] attribute subSections not found in element type JavaXML:Topic
Parser error: Attribute "subSections" must be declared for element type "JavaXML:Topic".
Parser error: Element type "JavaXML:Topic" must be declared.
[Error] attribute subSections not found in element type JavaXML:Topic
Parser error: Attribute "subSections" must be declared for element type "JavaXML:Topic".
Parser error: Element type "JavaXML:Topic" must be declared.
[Error] attribute subSections not found in element type JavaXML:Topic
Parser error: Attribute "subSections" must be declared for element type "JavaXML:Topic".
Parser error: Element type "JavaXML:SectionBreak" must be declared.
Parser error: Element type "JavaXML:Chapter" must be declared.
[Error] attribute focus not found in element type JavaXML:Chapter
Parser error: Attribute "focus" must be declared for element type "JavaXML:Chapter".
Parser error: Element type "JavaXML:Heading" must be declared.
Parser error: Element type "JavaXML:Topic" must be declared.
[Error] attribute subSections not found in element type JavaXML:Topic
Parser error: Attribute "subSections" must be declared for element type "JavaXML:Topic".
Parser error: Element type "JavaXML:Topic" must be declared.
[Error] attribute subSections not found in element type JavaXML:Topic
Parser error: Attribute "subSections" must be declared for element type "JavaXML:Topic".
Parser error: Element type "JavaXML:Topic" must be declared.
[Error] attribute subSections not found in element type JavaXML:Topic
Parser error: Attribute "subSections" must be declared for element type "JavaXML:Topic".
Parser error: Element type "JavaXML:Topic" must be declared.
[Error] attribute subSections not found in element type JavaXML:Topic
Parser error: Attribute "subSections" must be declared for element type "JavaXML:Topic".
Parser error: Element type "JavaXML:Topic" must be declared.
[Error] attribute subSections not found in element type JavaXML:Topic
Parser error: Attribute "subSections" must be declared for element type "JavaXML:Topic".
Parser error: Element type "JavaXML:Topic" must be declared.
[Error] attribute subSections not found in element type JavaXML:Topic
Parser error: Attribute "subSections" must be declared for element type "JavaXML:Topic".
Parser error: Element type "JavaXML:References" must be declared.
Parser error: Element type "JavaXML:Reference" must be declared.
Parser error: Element type "JavaXML:Name" must be declared.
Parser error: Element type "JavaXML:Url" must be declared.
Parser error: Element type "JavaXML:Reference" must be declared.
Parser error: Element type "JavaXML:Name" must be declared.
Parser error: Element type "JavaXML:Url" must be declared.
Parser error: Element type "JavaXML:Copyright" must be declared.
Parser error: Element type "b" must be declared.
Parse of file:C:/foo/contents.xml took 1252 milliseconds

And the XML:

<?xml version="1.0"?>

<!--
  <?xml-stylesheet href="XSL\JavaXML.html.xsl" type="text/xsl"?>
  <?xml-stylesheet href="XSL\JavaXML.wml.xsl" type="text/xsl" 
                   media="wap"?>
  <?cocoon-process type="xslt"?>
-->

<!DOCTYPE JavaXML:Book SYSTEM "DTD\JavaXML.dtd">

<!-- Java and XML -->
<JavaXML:Book xmlns:JavaXML="http://www.oreilly.com/catalog/javaxml/">
 <JavaXML:Title>Java and XML</JavaXML:Title>
 <JavaXML:Contents>

  <JavaXML:Chapter focus="XML">
   <JavaXML:Heading>Introduction</JavaXML:Heading>
   <JavaXML:Topic subSections="7">What Is It?</JavaXML:Topic>
   <JavaXML:Topic subSections="3">How Do I Use It?</JavaXML:Topic>
   <JavaXML:Topic subSections="4">Why Should I Use It?</JavaXML:Topic>
   <JavaXML:Topic subSections="0">What's Next?</JavaXML:Topic>
  </JavaXML:Chapter>

  <JavaXML:Chapter focus="XML">
   <JavaXML:Heading>Creating XML</JavaXML:Heading>
   <JavaXML:Topic subSections="0">An XML Document</JavaXML:Topic>
   <JavaXML:Topic subSections="2">The Header</JavaXML:Topic>
   <JavaXML:Topic subSections="6">The Content</JavaXML:Topic>
   <JavaXML:Topic subSections="1">What's Next?</JavaXML:Topic>
  </JavaXML:Chapter>

  <JavaXML:Chapter focus="Java">
   <JavaXML:Heading>Parsing XML</JavaXML:Heading>
   <JavaXML:Topic subSections="3">Getting Prepared</JavaXML:Topic>
   <JavaXML:Topic subSections="3">SAX Readers</JavaXML:Topic>
   <JavaXML:Topic subSections="9">Content Handlers</JavaXML:Topic>
   <JavaXML:Topic subSections="4">Error Handlers</JavaXML:Topic>
   <JavaXML:Topic subSections="0">
     A Better Way to Load a Parser
   </JavaXML:Topic>
   <JavaXML:Topic subSections="4">"Gotcha!"</JavaXML:Topic>
   <JavaXML:Topic subSections="0">What's Next?</JavaXML:Topic>
  </JavaXML:Chapter>

  <JavaXML:SectionBreak/>

  <JavaXML:Chapter focus="Java">
   <JavaXML:Heading>Web Publishing Frameworks</JavaXML:Heading>
   <JavaXML:Topic subSections="4">Selecting a Framework</JavaXML:Topic>
   <JavaXML:Topic subSections="4">Installation</JavaXML:Topic>
   <JavaXML:Topic subSections="3">
     Using a Publishing Framework
   </JavaXML:Topic>
   <JavaXML:Topic subSections="2">XSP</JavaXML:Topic>
   <JavaXML:Topic subSections="3">Cocoon 2.0 and Beyond</JavaXML:Topic>
   <JavaXML:Topic subSections="0">What's Next?</JavaXML:Topic>
  </JavaXML:Chapter>

 </JavaXML:Contents>

 <JavaXML:References>
  <JavaXML:Reference>
   <JavaXML:Name>The W3C</JavaXML:Name>
   <JavaXML:Url>http://www.w3.org/Style/XSL</JavaXML:Url>
  </JavaXML:Reference>
  <JavaXML:Reference>
   <JavaXML:Name>XSL List</JavaXML:Name>
   <JavaXML:Url>http://www.mulberrytech.com/xsl/xsl-list</JavaXML:Url>
  </JavaXML:Reference>
 </JavaXML:References>

<!--
 <JavaXML:Copyright>
   <center>
     <table cellpadding="0" cellspacing="1" border="1" bgcolor="Black">
       <tr>
         <td align="center">
           <table bgcolor="White" border="2">
             <tr>
               <td>
                 <font size="-1">
                   Copyright O'Reilly and Associates, 2000
                 </font>
               </td>
             </tr>
           </table>
         </td>
       </tr>
     </table>
   </center>
 </JavaXML:Copyright>
-->

 <JavaXML:Copyright>&OReillyCopyright;</JavaXML:Copyright>

</JavaXML:Book>

And the DTD:

<!ELEMENT JavaXML:Book (JavaXML:Title,
                        JavaXML:Contents,
                        JavaXML:Copyright)>
<!ATTLIST JavaXML:Book
      xmlns:JavaXML CDATA #REQUIRED
>
<!ELEMENT JavaXML:Title (#PCDATA)>
<!ELEMENT JavaXML:Contents ((JavaXML:Chapter+)|
                            (JavaXML:Chapter+, JavaXML:SectionBreak?)+)>
<!ELEMENT JavaXML:Chapter (JavaXML:Heading?,JavaXML:Topic+)>
<!ATTLIST JavaXML:Chapter
      focus (XML|Java) "Java"
>
<!ELEMENT JavaXML:Heading (#PCDATA)>
<!ELEMENT JavaXML:Topic (#PCDATA)>
<!ATTLIST JavaXML:Topic
      subSections CDATA #IMPLIED
>
<!ELEMENT JavaXML:SectionBreak EMPTY>
<!ELEMENT JavaXML:Copyright (#PCDATA)>
<!ENTITY OReillyCopyright SYSTEM 
         "entities/copyright.txt">
<!--
<!ENTITY OReillyCopyright SYSTEM 
         "http://www.oreilly.com/catalog/javaxml/docs/copyright.xml">
-->

Thanks - lots of folks are hitting this with my book, and I'd love to
get a fix into either the book or Xerces. I'll be checking mail over
the modem this weekend, so I appreciate any help.

-Brett

-- 
Brett McLaughlin, Enhydra Strategist
Lutris Technologies, Inc. 
1200 Pacific Avenue, Suite 300 
Santa Cruz, CA 95060 USA 
http://www.lutris.com
http://www.enhydra.org

Re: Problems with latest Xerces/Xalan Xerces

Posted by Andy Clark <an...@apache.org>.

Elliotte Rusty Harold wrote:
> Validation and namespaces are orthogonal. Namespaces do not change
> the rules for validation. Validation does not change the rules for
> namespaces. To validate a document that uses namespaces you need a
> DTD that describes this document. This DTD must correctly describe
> the elements and attributes in the document as they actually exist.
> In particular:

Yes, but it complicates implementation. We want to avoid duplication
of code by making a general purpose validation engine. However, the
distinction in behavior between validating DTDs and Schemas with
namespaces makes this harder. 

I'm working on a "hack" that will keep the general-purpose
validation engine but still work correctly for DTDs when namespaces
are turned on together with validation.

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: Problems with latest Xerces/Xalan Xerces

Posted by Andy Clark <an...@apache.org>.

Ed Staub wrote:
> _Regardless_, I think we're all agreed (right?) that the _useful_ 
> thing to do, regardless of what the specs are trying to say, is to 
> validate using QNames.

I don't completely agree. I agree that this is the only way to
do it with DTDs when you don't have a mechanism to "bind" some
namespace to the fake prefixes used. However, in Schema (and
other grammar definition languages that can handle namespace
validation) prefixes are just syntactic convenience and the
validation really occurs on the <uri,localpart> tuple.

There are only two ways to solve this problem:

  1) Provide a binding mechanism for DTDs, but this is not
     standard.
  2) Handle DTD and Schema validation slightly differently in
     this case.

I'm working on a fix for Xerces to do the second option.

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: Problems with latest Xerces/Xalan Xerces

Posted by Eric Ye <er...@locus.apache.org>.

Let's look at an example:

<!DOCTYPE abc:root [
  <!ELEMENT abc:root (elm1)>
  <!ELEMENT elm1 EMPTY>
  <!ATTLIST abc:root xmlns:abc CDATA #IMPLIED>
]>
<abc:root xmlns:abc="aURI">
    <elm1/>
</abc:root>

The goal is that a parser with validation and namespaces BOTH ON should
correctly validate this valid XML document (according to the XML 1.0 spec)
and accept it without spitting out any error messages. At the same time, if
some application downstream receives a SAX2 event, say startElement of
abc:root, it should be able to receive the resolved QName,
["aURI", "root"].
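The goal above can be tried against a present-day JAXP parser. What follows is a minimal sketch, not anything taken from Xerces itself: it assumes the example's DTD is meant to declare the prefixed names actually used in the instance (abc:root) and to give xmlns:abc a CDATA type, since XML 1.0 validity requires both. It uses the JDK's built-in SAXParserFactory with namespaces and validation both on; the class and field names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class NamespaceAwareDtdValidation {

    // Eric's example, assuming the DTD spells out the prefixed names used in the instance.
    static final String DOC =
            "<!DOCTYPE abc:root [\n"
          + "  <!ELEMENT abc:root (elm1)>\n"
          + "  <!ELEMENT elm1 EMPTY>\n"
          + "  <!ATTLIST abc:root xmlns:abc CDATA #IMPLIED>\n"
          + "]>\n"
          + "<abc:root xmlns:abc=\"aURI\"><elm1/></abc:root>";

    /** Records the names reported for the root element and counts validity errors. */
    static final class Recorder extends DefaultHandler {
        String rootUri, rootLocalName, rootQName;
        int validityErrors;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (rootQName == null) { // the first startElement call is for the root
                rootUri = uri;
                rootLocalName = localName;
                rootQName = qName;
            }
        }

        @Override
        public void error(SAXParseException e) { // recoverable (validity) errors land here
            validityErrors++;
        }
    }

    static Recorder parse(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // SAX2 "namespaces" = true
            factory.setValidating(true);     // SAX2 "validation" = true
            Recorder recorder = new Recorder();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), recorder);
            return recorder;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Recorder r = parse(DOC);
        System.out.println(r.validityErrors + " validity errors; root = [\"" + r.rootUri
                + "\", \"" + r.rootLocalName + "\"] from " + r.rootQName);
    }
}
```

On a current JDK this should report no validity errors and a root element of ["aURI", "root"], with the raw name abc:root still available as the qName argument. That is the behavior being asked for here; the Xerces build in Brett's report evidently did not behave this way.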


That said, here are two possible solutions:

Solution1:

- Always ignore namespace binding when scanning the DTD.

- If the XMLValidator finds that it is validating against a DTD grammar,
it still does namespace binding, but it uses the raw name, abc:root
(instead of the resolved QName, which is a pair [aURI, root]), for resolving
elements and attributes, validating element content, and all other
validation work. When it calls the next handler down the stream, however,
it passes along the resolved QName.
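Solution 1 can be sketched in a few lines. This is a hypothetical illustration, not Xerces' actual XMLValidator: the class RawNameDtdValidator, its startElement signature, and the map-of-declarations representation are all invented here. It only checks that elements and attributes are declared (no content-model checking), looking them up by raw name while forwarding the resolved [uri, localName] pair downstream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of "Solution 1": after namespace binding, DTD checks use the
 * raw name (as written in the DTD) while the downstream handler gets [uri, localName].
 * Illustrative only; this is not Xerces' XMLValidator.
 */
public class RawNameDtdValidator {

    /** The resolved name passed to the next handler down the stream. */
    static final class ResolvedName {
        final String uri, localName, rawName;

        ResolvedName(String uri, String localName, String rawName) {
            this.uri = uri;
            this.localName = localName;
            this.rawName = rawName;
        }

        @Override
        public String toString() {
            return "[\"" + uri + "\", \"" + localName + "\"] (" + rawName + ")";
        }
    }

    // Declarations keyed by raw name: element raw name -> declared attribute raw names.
    private final Map<String, Set<String>> declarations;
    private final List<String> errors = new ArrayList<>();
    private final List<ResolvedName> downstream = new ArrayList<>();

    RawNameDtdValidator(Map<String, Set<String>> declarations) {
        this.declarations = declarations;
    }

    /** Receives an already namespace-bound element from a (hypothetical) scanner. */
    void startElement(String uri, String localName, String rawName, List<String> rawAttributeNames) {
        Set<String> declaredAttributes = declarations.get(rawName); // raw name, not [uri, localName]
        if (declaredAttributes == null) {
            errors.add("Element type \"" + rawName + "\" must be declared.");
        } else {
            for (String attribute : rawAttributeNames) {
                if (!declaredAttributes.contains(attribute)) {
                    errors.add("Attribute \"" + attribute
                            + "\" must be declared for element type \"" + rawName + "\".");
                }
            }
        }
        downstream.add(new ResolvedName(uri, localName, rawName)); // resolved QName goes on
    }

    List<String> errors() { return errors; }

    List<ResolvedName> downstream() { return downstream; }

    public static void main(String[] args) {
        RawNameDtdValidator v = new RawNameDtdValidator(Map.of(
                "abc:root", Set.of("xmlns:abc"),
                "elm1", Set.of()));
        v.startElement("aURI", "root", "abc:root", List.of("xmlns:abc"));
        v.startElement("", "elm1", "elm1", List.of());
        System.out.println("errors: " + v.errors());
        System.out.println("downstream: " + v.downstream());
    }
}
```

Keying the declarations on the raw name is the whole trick: a DTD only ever declares abc:root, so a lookup by [aURI, root] would find nothing.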


Solution 2:

- Given that DTD validation cannot work with namespaces on, the scanner and
validator simply turn off namespace binding whenever a DTD is seen and
validation is on, no matter how the SAX2 feature was originally set.

- If namespaces were originally turned on but were later turned off because
of DTD validation, then the SAXParser (or XMLParser) does the namespace
binding itself. A side effect is that the SAXParser (or XMLParser) has to
maintain a namespace mapping stack in this case.
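Solution 2 moves namespace binding out of the scanner and into the SAX layer, which then needs its own prefix-mapping stack. A minimal sketch of that idea follows, using the standard org.xml.sax.helpers.NamespaceSupport class as the stack on top of a JDK parser with namespaces off and validation on. It is an illustration under those assumptions, not how Xerces implemented any fix; the Binder class is hypothetical, and the DTD again uses the prefixed spelling.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.NamespaceSupport;

public class ManualNamespaceBinding {

    static final String DOC =
            "<!DOCTYPE abc:root [\n"
          + "  <!ELEMENT abc:root (elm1)>\n"
          + "  <!ELEMENT elm1 EMPTY>\n"
          + "  <!ATTLIST abc:root xmlns:abc CDATA #IMPLIED>\n"
          + "]>\n"
          + "<abc:root xmlns:abc=\"aURI\"><elm1/></abc:root>";

    /** Hypothetical handler that does namespace binding on top of a non-namespace-aware parse. */
    static final class Binder extends DefaultHandler {
        private final NamespaceSupport namespaces = new NamespaceSupport(); // the mapping stack
        final StringBuilder resolved = new StringBuilder();
        int validityErrors;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            namespaces.pushContext();
            // With namespaces off, xmlns attributes arrive as ordinary attributes.
            for (int i = 0; i < atts.getLength(); i++) {
                String name = atts.getQName(i);
                if (name.equals("xmlns")) {
                    namespaces.declarePrefix("", atts.getValue(i));
                } else if (name.startsWith("xmlns:")) {
                    namespaces.declarePrefix(name.substring("xmlns:".length()), atts.getValue(i));
                }
            }
            // parts = {namespace URI, local name, raw name}, or null for an undeclared prefix.
            String[] parts = namespaces.processName(qName, new String[3], false);
            if (parts == null) {
                resolved.append("unbound prefix in ").append(qName).append('\n');
            } else {
                resolved.append('[').append(parts[0]).append(", ").append(parts[1]).append("]\n");
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            namespaces.popContext();
        }

        @Override
        public void error(SAXParseException e) {
            validityErrors++;
        }
    }

    static Binder parse(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(false); // parser-level namespace binding off...
            factory.setValidating(true);      // ...so the DTD is matched on raw names
            Binder binder = new Binder();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), binder);
            return binder;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Binder b = parse(DOC);
        System.out.print(b.validityErrors + " validity errors\n" + b.resolved);
    }
}
```

Because the parser never binds namespaces here, the DTD is matched on raw names exactly as XML 1.0 describes, and the application still ends up with [aURI, root] for the root and an empty namespace URI for the unprefixed elm1. The cost is the extra push/pop bookkeeping per element that the second bullet mentions.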

_____


Eric Ye * IBM, JTC - Silicon Valley * ericye@locus.apache.org


Re: Problems with latest Xerces/Xalan Xerces

Posted by Edwin Goei <Ed...@eng.sun.com>.

"Ed Staub" <es...@mediaone.net> wrote:
>
> Part of the confusion, I believe, arises out of a question of what is being
> validated:
> 1) the textual document
> or
> 2) the information set in the document
>
>
> If 1), then your reading seems correct.
>
> If 2), then the prefix isn't even available to validate, at least according
> to the Information Set draft spec; see e.g.
> http://www.w3.org/TR/xml-infoset#infoitem.element.

IMHO, the only spec that discusses validation is the XML 1.0 spec; even the
Infoset (draft!) spec refers to the XML spec when validation is discussed, so
it seems clear to me that case #1 is correct.  That means there is a
bug in the current Xerces parser.  See my previous email about a proposed
fix.

-Edwin

RE: Problems with latest Xerces/Xalan Xerces

Posted by Ed Staub <es...@mediaone.net>.

At July 21, 2000 10:27 AM, Elliotte Rusty Harold wrote:

>Validation and namespaces are orthogonal. Namespaces do not change
>the rules for validation.

I don't think the specs are obvious on this point.

Part of the confusion, I believe, arises out of a question of what is being
validated:
	1) the textual document
	or
	2) the information set in the document

If 1), then your reading seems correct.

If 2), then the prefix isn't even available to validate, at least according
to the Information Set draft spec; see e.g.
http://www.w3.org/TR/xml-infoset#infoitem.element.

_Regardless_, I think we're all agreed (right?) that the _useful_ thing to
do, regardless of what the specs are trying to say, is to validate using
QNames.

-Ed Staub

Re: Problems with latest Xerces/Xalan Xerces

Posted by Elliotte Rusty Harold <el...@metalab.unc.edu>.

At 4:40 PM -0700 7/20/00, Arnaud Le Hors wrote:

>Unfortunately no spec defines how to handle the mix of namespaces and
>DTDs, even though the Namespaces in XML spec addresses a few
>interactions with DTDs.

You may already have figured this out. If so my apologies for beating 
a dead horse. However, it wasn't 100% clear to me from the posts in 
this thread, but it is an important point so I wanted to try to make 
it very clear.

Validation and namespaces are orthogonal. Namespaces do not change 
the rules for validation. Validation does not change the rules for 
namespaces. To validate a document that uses namespaces you need a 
DTD that describes this document. This DTD must correctly describe 
the elements and attributes in the document as they actually exist. 
In particular:

1. All ELEMENT and ATTLIST declarations must use the names actually 
used in the document including prefixes.

2. All xmlns attributes must be declared.
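These two rules can be checked with any XML 1.0 validating parser, with namespace processing off. The sketch below is illustrative only: it uses the JDK's built-in parser and a made-up x:doc document with the placeholder namespace urn:example, and counts validity errors for a DTD that follows both rules, one that declares the unprefixed name doc, and one that never declares xmlns:x.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class PrefixedDtdRules {

    static final String INSTANCE = "<x:doc xmlns:x=\"urn:example\"/>";

    // Follows both rules: prefixed names in ELEMENT/ATTLIST, and xmlns:x declared.
    static final String FOLLOWS_RULES =
            "<!DOCTYPE x:doc [\n"
          + "  <!ELEMENT x:doc EMPTY>\n"
          + "  <!ATTLIST x:doc xmlns:x CDATA #FIXED \"urn:example\">\n"
          + "]>\n" + INSTANCE;

    // Breaks rule 1: declarations use the local name "doc" instead of "x:doc".
    static final String UNPREFIXED_DECLS =
            "<!DOCTYPE x:doc [\n"
          + "  <!ELEMENT doc EMPTY>\n"
          + "  <!ATTLIST doc xmlns:x CDATA #FIXED \"urn:example\">\n"
          + "]>\n" + INSTANCE;

    // Breaks rule 2: the xmlns:x attribute is never declared.
    static final String UNDECLARED_XMLNS =
            "<!DOCTYPE x:doc [\n"
          + "  <!ELEMENT x:doc EMPTY>\n"
          + "]>\n" + INSTANCE;

    /** Counts validity errors from a plain XML 1.0 validating parse (namespaces off). */
    static int validityErrors(String xml) {
        int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                count[0]++;
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(false);
            factory.setValidating(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println("follows rules:    " + validityErrors(FOLLOWS_RULES));
        System.out.println("unprefixed decls: " + validityErrors(UNPREFIXED_DECLS));
        System.out.println("undeclared xmlns: " + validityErrors(UNDECLARED_XMLNS));
    }
}
```

The expectation is zero errors for the first DTD and at least one for each of the other two; the exact messages vary by parser.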

No new spec is needed to describe how to handle validation with 
namespaces. The XML 1.0 spec fully and completely describes how to 
validate a document. Namespaces don't change any of that.

+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|                  The XML Bible (IDG Books, 1999)                   |
|              http://metalab.unc.edu/xml/books/bible/               |
|   http://www.amazon.com/exec/obidos/ISBN=0764532367/cafeaulaitA/   |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://metalab.unc.edu/javafaq/ |
|  Read Cafe con Leche for XML News: http://metalab.unc.edu/xml/     |
+----------------------------------+---------------------------------+

Re: Problems with latest Xerces/Xalan Xerces

Posted by Edwin Goei <Ed...@eng.sun.com>.

----- Original Message -----
From: "Arnaud Le Hors" <le...@us.ibm.com>
To: <xe...@xml.apache.org>
Sent: Thursday, July 20, 2000 4:40 PM
Subject: Re: Problems with latest Xerces/Xalan Xerces

> If we can make this work, great! I think that will make many people's
> life easier.
> But, I'm not sure I agree with Edwin's statement that not doing so
> breaks backward compatibility with XML 1.0. Backward compatibility can
> only be meaningful when namespaces are off, since XML 1.0 doesn't know
> what they are.
> Unfortunately no spec defines how to handle the mix of namespaces and
> DTDs, even though the Namespaces in XML spec addresses a few
> interactions with DTDs. BTW no spec defines how to handle the mix of XML
> Schemas and DTDs either...

I see it as two specs: XML and Namespaces.  The XML spec specifies what a
valid document is (http://www.w3.org/TR/REC-xml#sec-conformance).  The
Namespaces spec only refers to validation with respect to the XML spec.
However, the Namespaces spec does specify what a Namespace conformant
document is (http://www.w3.org/TR/REC-xml-names/#Conformance).

The SAX2 "validation" feature corresponds to the XML spec validation.
However, the way I read it, neither the "namespaces" nor the
"namespace-prefixes" feature corresponds to Namespace spec conformance;
rather, both concern SAX2 reporting of events.  I agree that it is confusing.

-Edwin

Re: Problems with latest Xerces/Xalan Xerces

Posted by Arnaud Le Hors <le...@us.ibm.com>.

If we can make this work, great! I think that will make many people's
life easier.
But, I'm not sure I agree with Edwin's statement that not doing so
breaks backward compatibility with XML 1.0. Backward compatibility can
only be meaningful when namespaces are off, since XML 1.0 doesn't know
what they are.
Unfortunately no spec defines how to handle the mix of namespaces and
DTDs, even though the Namespaces in XML spec addresses a few
interactions with DTDs. BTW no spec defines how to handle the mix of XML
Schemas and DTDs either...
-- 
Arnaud  Le Hors - IBM Cupertino, XML Technology Group

Re: Problems with latest Xerces/Xalan Xerces

Posted by Edwin Goei <Ed...@eng.sun.com>.

Perhaps I was a bit hasty in sending out that last email... I guess it all
depends on whether in SAX2 "Perform Namespace processing" means "check for
Namespaces spec conformance" or not.

So assume you have a document that uses Namespaces and DTDs and you want to
validate.

Case 1: the answer is "No":
    + Advantage: apps are guaranteed to get URI and localName to use for
      processing
    + Advantage: easier for users since they do not have to turn off the
      "Namespaces" feature
    - Disadvantage: there is no standard way to check for namespace
      conformance

Case 2: the answer is "Yes":
    + Advantage: standard way to check for Namespaces spec conformance
    - Disadvantage: apps may not get URI and localName
    - Disadvantage: users must turn off "Namespaces" feature

Note that since the document uses Namespaces, you probably want URI and
localName info.  In the "Yes" #2 case, it is possible to compatibly provide
the URI and localName info along with the qName, but that means the parser
must do namespace processing to do so.  This would make apps that do not use
namespaces slower so I do not think this is what was intended.

OK, so after this more careful analysis, I would still lean towards my
original opinion.  We could always add a separate feature to control
checking for Namespace spec conformance.

-Edwin

Re: Problems with latest Xerces/Xalan Xerces

Posted by Edwin Goei <Ed...@eng.sun.com>.

> Just for the sake of argument.
>
> As I read the SAX2 namespaces page, here is the URL:
> http://www.megginson.com/SAX/Java/namespaces.html
>
> It looks like SAX2 says that setting the feature
> "http://xml.org/sax/features/namespaces" to true means "Perform
> Namespace Processing", which in turn means "with Namespace processing, each
> element and attribute has a two-part name, consisting of an optional URI
> (equivalent to a Java or Perl package name) followed by a local name which
> may not contain a colon." So <foo:bar:baz/> could not be processed.

Yes, but does "Perform Namespace processing" mean the same thing as "check
for Namespaces spec conformance"?  My opinion would be no, it only affects
the events that get reported through the API.  There would be a few ways to
fix this with respect to SAX2.  This is definitely an area of SAX2 that is
confusing.

I would say that the XML 1.0 and Namespaces 1.0 specs should be followed first
and that SAX2 was designed to allow apps to follow these two specs.  If
there are conflicts then SAX2 needs to be changed.  In the one-colon case,
this can be done compatibly.  In the two-or-more-colon case there are
multiple possible fixes, but that case is rare.

-Edwin

Re: Problems with latest Xerces/Xalan Xerces

Posted by Eric Ye <er...@locus.apache.org>.

> After thinking about it some more and looking at the SAX2 spec, the
> "namespace" feature does not talk about conformance to the Namespaces spec,
> it only talks about what the SAX2 event interfaces should do.  It is
> discussed also with the "namespace-prefixes" feature.  As I read it, both
> features have to do with SAX2 event reporting and do not specify checking
> for Namespace spec conformance.  So even if a document had an element with
> multiple colons "<foo:bar:baz/>", it should still be valid even though it
> would not be Namespace spec conformant.

Just for the sake of argument.

As I read the SAX2 namespaces page, here is the URL:
http://www.megginson.com/SAX/Java/namespaces.html

It looks like SAX2 says that setting the feature
"http://xml.org/sax/features/namespaces" to true means "Perform
Namespace Processing", which in turn means "with Namespace processing, each
element and attribute has a two-part name, consisting of an optional URI
(equivalent to a Java or Perl package name) followed by a local name which
may not contain a colon." So <foo:bar:baz/> could not be processed.
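The difference between the two readings shows up with a name such as foo:bar:baz, which is a legal XML 1.0 name but not a legal namespace-qualified name. The sketch below toggles the two SAX2 core features through XMLReader.setFeature on the JDK's built-in parser and reports whether the document parses; the class name and the urn:foo URI are placeholders. It reflects one current parser's behavior, not something SAX2 itself mandates.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class MultiColonNames {

    // A legal XML 1.0 name with two colons; the namespace URI is a placeholder.
    static final String DOC = "<foo:bar:baz xmlns:foo=\"urn:foo\"/>";

    /** Returns true if DOC parses with the given SAX2 "namespaces" feature value. */
    static boolean parses(boolean namespaces) {
        XMLReader reader;
        try {
            reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // The two SAX2 core features discussed in this thread.
            reader.setFeature("http://xml.org/sax/features/namespaces", namespaces);
            reader.setFeature("http://xml.org/sax/features/namespace-prefixes", !namespaces);
        } catch (Exception e) {
            throw new IllegalStateException("parser setup failed", e);
        }
        reader.setErrorHandler(new DefaultHandler()); // its fatalError() rethrows
        try {
            reader.parse(new InputSource(new StringReader(DOC)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the document
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("namespaces=false parses: " + parses(false));
        System.out.println("namespaces=true  parses: " + parses(true));
    }
}
```

With namespaces off the document should parse; with namespaces on, the JDK parser reports a fatal error at the second colon. Whether that check ought to be tied to the "namespaces" feature is exactly the point under debate.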

_____


Eric Ye * IBM, JTC - Silicon Valley * ericye@locus.apache.org

Re: Problems with latest Xerces/Xalan Xerces

Posted by Edwin Goei <Ed...@eng.sun.com>.

Edwin Goei wrote:
> 
> So I would propose the following:
> 
>     + Move the SAX2 "namespaces" feature to "SAX features" from
>       "General"
> 
>     + Change the behavior of the parser so that it validates docs like
>       #1 above regardless of the SAX2 "namespaces" feature value.
> 
>     + Optionally add a new general feature to "check for Namespaces spec
>       conformance".

OK, after giving it even more thought, I would like to withdraw most of
this proposal.  The most important thing is to enable apps to process
"valid" (XML REC kind) and "Namespace conformant" XML docs that use DTDs
with SAX2.  This is not currently possible with Xerces.  Whether the SAX
"namespaces" feature also causes the parser to check for "Namespace REC
conformance" is a separate issue; furthermore, it is desirable for this
feature to control "Namespace conformance" as well (or else SAX2 does not
make as much sense).

So to summarize, given an XML doc that uses namespaces and a DTD, and which
is both "valid" and "Namespace conformant", it should be possible to
process and validate this document using SAX2.  This implies that the
SAX "namespaces" feature may be set to "true", the default value.

-Edwin

Re: Problems with latest Xerces/Xalan Xerces

Posted by Edwin Goei <Ed...@eng.sun.com>.

Andy Clark wrote:
> 
> Edwin Goei wrote:
> > This is where the confusion lies.  Namespaces spec conformance is a
> > separate issue.  So if a document has an element such as
> > "<foo:bar:baz/>", it should still be valid.  SAX2 should be compatibly
> > modified to accommodate it even if "namespaces" is true.
> 
> But an error is an error, whether it is a document validation error
> or a namespace validation error. And your example violates the NS
> validation rules when namespaces is true.

I would disagree with you here.  My claim is that they are separate
errors and are orthogonal (as others have stated).  "Document
validation" is covered in the XML REC while "namespace validation" as
you refer to it, which is equivalent to "Namespace REC conformance" is
covered in the Namespaces REC.  Note that the Namespace REC does not use
the term "valid" except in the XML REC "document validation" sense,
probably on purpose.

Furthermore, I claim the SAX spec states that the
"http://xml.org/sax/features/validation" feature (which I abbreviate as
SAX2 "validation") refers to the XML 1.0 REC sense of "document
validation" and that there is nothing currently in SAX2 that controls
"Namespace REC conformance".  See my earlier email for why the SAX2
"namespaces" feature does not work for this.

Therefore, if SAX2 "validation" is "true" and a parser parses a valid
(in the XML REC sense) XML document with a DTD which is "Namespace REC
conformant", it should be possible to validate that document and to
process it using the SAX2 namespace interfaces without any error handlers
being called or exceptions thrown.  This behavior is not currently
possible with Xerces.

Also since there is not a standard way to control "Namespace REC
conformance", a new Xerces specific feature could be added to control
this separately, but the details need to be worked out, such as how
non-conformance is reported to the app.  See my earlier email on a
proposed fix.

-Edwin

Re: Problems with latest Xerces/Xalan Xerces

Posted by Andy Clark <an...@apache.org>.

Edwin Goei wrote:
> This is where the confusion lies.  Namespaces spec conformance is a
> separate issue.  So if a document has an element such as
> "<foo:bar:baz/>", it should still be valid.  SAX2 should be compatibly
> modified to accommodate it even if "namespaces" is true.

But an error is an error, whether it is a document validation error
or a namespace validation error. And your example violates the NS
validation rules when namespaces is true.

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: Problems with latest Xerces/Xalan Xerces

Posted by Edwin Goei <Ed...@eng.sun.com>.

After more thought, I would like to refine my opinion some more and make
a proposal.

Edwin Goei wrote:
> 
> Actually, I don't think it should be necessary to turn SAX2 "namespaces"
> support off.  It should work either way as I've stated in an earlier email.

Not only is it not necessary to turn it off, but the "namespaces"
feature should be on (i.e. true) for the most common case.  Here's why.
First, there are two cases: SAX2 and DOM.  To start with, we have the very
common case of an XML document that uses namespaces, where the app wants
to validate it with a DTD (1).

Take the SAX2 case first.  Suppose that the SAX2 "namespaces" feature
is set to "false".  Then according to SAX2, the
ContentHandler.startElement(...) method would only guarantee that the
application receives the "qName" argument.  But the XML document uses
namespaces, so the app would normally like to get the "namespaceURI" and
"localName" arguments as well, or else it would have to do namespace
processing itself, which is nonsensical.  The alternative would be for
the parser to supply those arguments anyway, which means it would have
to perform namespace processing with the feature turned off.  This
clearly does not make sense.  Therefore, unless the app is only
interested in validation, the "namespaces" feature should normally be
"true".

SAX2 came out after the XML and Namespaces specs specifically to support
both so I think that validating docs like #1 above should be possible. 
(BTW, the crimson parser handles this case.)

For the DOM case, the SAX2 "namespaces" feature is not part of the DOM
spec; it just happens to be used only in the Xerces DOMParser.  The web page
http://xml.apache.org/xerces-j/features.html says that it is a "General
Feature" and not a "SAX feature".  I haven't looked at the code, but
this feature was probably overloaded to also mean "check for
Namespaces spec conformance".

This is where the confusion lies.  Namespaces spec conformance is a
separate issue.  So if a document has an element such as
"<foo:bar:baz/>", it should still be valid.  SAX2 should be compatibly
modified to accommodate it even if "namespaces" is true.

So I would propose the following:

    + Move the SAX2 "namespaces" feature to "SAX features" from
"General"

    + Change the behavior of the parser so that it validates docs like
#1 above regardless of the SAX2 "namespaces" feature value.

    + Optionally add a new general feature to "check for Namespaces spec
conformance".
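The optional conformance check in the last item could be as small as the following sketch. It is hypothetical (not a Xerces API) and deliberately simplified: it only checks colon placement and does not verify the NCName character rules that a real Namespaces check would also apply.

```java
public class QNameCheck {
    // Simplified Namespaces-spec check: looks at colon placement only,
    // not at the full NCName character rules.
    public static boolean isNamespaceConformant(String name) {
        int first = name.indexOf(':');
        if (first < 0) {
            return !name.isEmpty(); // unprefixed name
        }
        int last = name.lastIndexOf(':');
        // Exactly one colon, with a non-empty prefix and a non-empty local part.
        return first == last && first > 0 && first < name.length() - 1;
    }

    public static void main(String[] args) {
        for (String n : new String[] {"baz", "foo:baz", "foo:bar:baz", ":baz"}) {
            System.out.println(n + " -> " + isNamespaceConformant(n));
        }
    }
}
```

All of "baz", "foo:baz", "foo:bar:baz" and ":baz" are legal XML 1.0 Names, so DTD validation alone has no reason to reject them; only the first two pass this check.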

-Edwin

Re: Problems with latest Xerces/Xalan Xerces

Posted by Edwin Goei <Ed...@eng.sun.com>.

Brett McLaughlin wrote:
>
> Andy Clark wrote:
> >
> > Brett McLaughlin wrote:
> > > I'm getting some weird errors on the included XML document and XSL
> > > stylesheet - when parsing occurs, it locates the DTD and then barfs
> > > on all the elements. It "feels" like a namespace issue, but I know
> > > this worked fine on earlier versions of Xerces. Any ideas? I'm about
> > > to leave town, so I apologize for the lack of troubleshooting....
> >
> > Turn namespace support off. You may be using namespaces in your
> > document but if you try to use them with DTDs, you're asking for
> > trouble. And here's the reason...
>
> Thanks for the reply Andy-
>
>   Yeah, I knew about the namespace thing (had to document it in the book
> ;-) ). This actually occurs running Xalan from the command line:

Actually, I don't think it should be necessary to turn SAX2 "namespaces"
support off.  It should work either way as I've stated in an earlier email.

After thinking about it some more and looking at the SAX2 spec, the
"namespaces" feature does not talk about conformance to the Namespaces spec;
it only talks about what the SAX2 event interfaces should do.  It is also
discussed together with the "namespace-prefixes" feature.  As I read it, both
features have to do with SAX2 event reporting and do not specify checking
for Namespaces spec conformance.  So even if a document had an element with
multiple colons, such as "<foo:bar:baz/>", it should still be valid even
though it would not be Namespaces spec conformant.

-Edwin

Re: Problems with latest Xerces/Xalan Xerces

Posted by Brett McLaughlin <br...@lutris.com>.


Andy Clark wrote:
> 
> Brett McLaughlin wrote:
> > I'm getting some weird errors on the included XML document and XSL
> > stylesheet - when parsing occurs, it locates the DTD and then barfs
> > on all the elements. It "feels" like a namespace issue, but I know
> > this worked fine on earlier versions of Xerces. Any ideas? I'm about
> > to leave town, so I apologize for the lack of troubleshooting....
> 
> Turn namespace support off. You may be using namespaces in your
> document but if you try to use them with DTDs, you're asking for
> trouble. And here's the reason...

Thanks for the reply Andy-

  Yeah, I knew about the namespace thing (had to document it in the book
;-) ). This actually occurs running Xalan from the command line:

java org.apache.xalan.Process -IN contents.xml -XSL XSL/JavaXML.html.xsl -OUT contents.html

I know I can report this to Xalan, I just thought I would see if it
looked familiar to you guys first. It may be that they have namespaces
turned on by default, which is a problem.

Thanks,
Brett

> 
> When namespace processing is on, the parser separates all elements
> and attributes found in the document into a <uri, localpart> tuple.
> For validation, both the uri and localpart must match in order for
> the content to be valid. However, when the grammar is loaded from
> the DTD, there's no way for the parser to know what namespace the
> "fake" prefixes specified in the DTD should be bound to. Does
> this make sense?
> 
> With namespace support turned off, all element and attribute
> names will be considered a single name and not a tuple. And since
> namespace processing is on by default, you need to turn it off if
> you plan on using namespaces in a document whose grammar is
> specified in a DTD.
> 
> There might be some kind of custom solution we could do in the
> future when we can cache grammars so that we can pre-load DTD
> grammars and "bind" them to a namespace. This would allow you
> to do namespace validation with documents that have a DTD. But
> we don't have this yet, of course.
> 
> Also, don't use DOS paths in your document. Only use URIs.
> 
> --
> Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
> For additional commands, e-mail: xerces-j-dev-help@xml.apache.org

-- 
Brett McLaughlin, Enhydra Strategist
Lutris Technologies, Inc. 
1200 Pacific Avenue, Suite 300 
Santa Cruz, CA 95060 USA 
http://www.lutris.com
http://www.enhydra.org

Re: Problems with latest Xerces/Xalan Xerces -- FIXED!!!

Posted by Eric Ye <er...@locus.apache.org>.

I just found a bug in XMLValidator that makes it barf on prefixed attributes
other than "xmlns:..." and "xml:...", such as "foo:attr1", even when the
attribute is already declared in the DTD.
The fix is already in CVS.
_____


Eric Ye * IBM, JTC - Silicon Valley * ericye@locus.apache.org

----- Original Message -----
From: "Andy Clark" <an...@apache.org>
To: <xe...@xml.apache.org>
Sent: Wednesday, July 26, 2000 2:01 PM
Subject: Re: Problems with latest Xerces/Xalan Xerces -- FIXED!!!


> Edwin Goei wrote:
> > Thanks for working on this.  I checked out the latest sources and got a
> > different error.  This is with SAX "validation" true and "namespaces"
> > the default value of true.  The error I got was:
>
> Yeah, my previous fix only worked on DTD documents that didn't use
> namespaces at all. I just checked in some code before posting this
> message that looks like it fixes all cases of using DTD validation
> with namespaces.
>
> I tried your sample file and it seems to validate fine. Let me
> know if you find any more problems.
>
> --
> Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: Problems with latest Xerces/Xalan Xerces -- FIXED!!!

Posted by Andy Clark <an...@apache.org>.

Edwin Goei wrote:
> Thanks for working on this.  I checked out the latest sources and got a
> different error.  This is with SAX "validation" true and "namespaces"
> the default value of true.  The error I got was:

Yeah, my previous fix only worked on DTD documents that didn't use
namespaces at all. I just checked in some code before posting this
message that looks like it fixes all cases of using DTD validation
with namespaces.

I tried your sample file and it seems to validate fine. Let me
know if you find any more problems.

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: Problems with latest Xerces/Xalan Xerces -- FIXED!!!

Posted by Edwin Goei <Ed...@eng.sun.com>.

Andy Clark wrote:
> 
> Okay, I finally fixed the problem involving validating DTDs when
> namespaces are turned on. PLEASE check out the latest from CVS
> and pound on it to make sure that I didn't hose something else
> in the process!

Thanks for working on this.  I checked out the latest sources and got a
different error.  This is with SAX "validation" true and "namespaces"
the default value of true.  The error I got was:

"Element type "foo:root" must be declared."

It seems that the element is already declared in the DTD.  Below is my
simple test document.

-Edwin

<?xml version="1.0"?>

<!DOCTYPE foo:root [

<!ELEMENT foo:root (foo:element1, element2)>
<!ATTLIST foo:root
    xmlns:foo CDATA #REQUIRED
>

<!ELEMENT foo:element1 (#PCDATA)>

<!ELEMENT element2 (#PCDATA)>
]>

<foo:root xmlns:foo="http://www.foo.com/foo">
    <foo:element1>test</foo:element1>
    <element2>test2</element2>
</foo:root>
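For anyone wanting to rerun Edwin's test today, a harness along these lines parses the same document with DTD validation on and collects validation errors instead of stopping at the first one. It assumes the JAXP parser bundled with a current JDK, whose behavior may differ from the Xerces CVS build discussed here; `ValidateTestDoc` is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class ValidateTestDoc {
    // Edwin's test document, inlined.
    public static final String DOC =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE foo:root [\n"
        + "<!ELEMENT foo:root (foo:element1, element2)>\n"
        + "<!ATTLIST foo:root xmlns:foo CDATA #REQUIRED>\n"
        + "<!ELEMENT foo:element1 (#PCDATA)>\n"
        + "<!ELEMENT element2 (#PCDATA)>\n"
        + "]>\n"
        + "<foo:root xmlns:foo=\"http://www.foo.com/foo\">"
        + "<foo:element1>test</foo:element1><element2>test2</element2>"
        + "</foo:root>";

    // Parses xml with DTD validation on and returns the validation error messages.
    public static List<String> validate(String xml, boolean namespaces) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);           // SAX2 "validation" feature
        factory.setNamespaceAware(namespaces); // SAX2 "namespaces" feature
        List<String> errors = new ArrayList<>();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                errors.add(e.getMessage()); // collect instead of failing on the first error
            }
        });
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("namespaces=true:  " + validate(DOC, true));
        System.out.println("namespaces=false: " + validate(DOC, false));
    }
}
```

Comparing the two lists shows directly whether the parser validates by raw name in both modes.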

Re: Problems with latest Xerces/Xalan Xerces -- NOT QUITE :(

Posted by Andy Clark <an...@apache.org>.

Andy Clark wrote:
> Okay, I finally fixed the problem involving validating DTDs when
> namespaces are turned on. PLEASE check out the latest from CVS
> and pound on it to make sure that I didn't hose something else
> in the process!

Okay, a little more work needs to be done. The namespace issue
with DTDs works fine as long as you aren't adding the namespace
prefixes to the element/attribute names in the DTD. So now I 
have to get that working, too. 

Also, I think that there might be some work needed to verify
that xml:lang and xml:space still work in these situations.

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: Problems with latest Xerces/Xalan Xerces -- FIXED!!!

Posted by Andy Clark <an...@apache.org>.

Okay, I finally fixed the problem involving validating DTDs when
namespaces are turned on. PLEASE check out the latest from CVS
and pound on it to make sure that I didn't hose something else
in the process!

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: Problems with latest Xerces/Xalan Xerces

Posted by Andy Clark <an...@apache.org>.

Andy Clark wrote:
> Okay, this can be done. Not a problem. I just have to check out
> the validation code and make sure that it's as easy as I think.

Well, it's not as easy as I thought. But I'm working on it.
I'll keep you posted.

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: Problems with latest Xerces/Xalan Xerces

Posted by Edwin Goei <Ed...@eng.sun.com>.

Andy Clark wrote:
> 
> Edwin Goei wrote:
> > The XML spec does not mention namespaces at all, only validating and
> > non-validating processors.  So the way I read it, DTDs are not namespace
> > aware and thus DTD validation should also ignore namespaces.  Therefore,
> > the following XML file should be valid with a conformant validating
> > processor regardless of whether namespace features in SAX2 are turned on or
> > off.  When I tried this with a version of Xerces 1.x, I got a validation
> > error when namespaces are set to true.
> 
> Let me rephrase what you've said so that we can agree that we're
> talking about the same thing. When the document specifies a DTD
> and validation is turned on, then the parser is supposed to ignore
> the namespace bindings and validate based on the QName (or rawname).
> Right?

Yes, although as Eric Ye points out, it should be treated as an XML 1.0
"Name" for the purposes of validation.  So what happens to an element
type such as "foo:bar:baz" with two colons?  I would say that the
containing document would still be a valid document because of the XML
1.0 spec, but it would not be namespace conformant according to the
Namespace spec.  So the document would still be valid, regardless of the
SAX2 "namespaces" feature.  But, this might conflict with SAX2
interfaces -- hmmm, interesting.

In any case, the one colon case occurs commonly and can be fixed
compatibly with SAX2.  The two or more colon case is very infrequent.

> 
> Okay, this can be done. Not a problem. I just have to check out
> the validation code and make sure that it's as easy as I think.
> 
> Basically, this problem is caused by the fact that the content
> model validation code is generalized to be used for both DTDs
> and Schemas. So a special flag must be set when the content
> model validators are constructed from DTDs. Otherwise, the
> validators are asked to validate content from the namespace
> uri and localpart tuple.
> 
> This is just a personal opinion but, to me, DTDs and Schemas are
> just different syntaxes for representing document grammars. It
> would be great if there was some mechanism for binding element
> and attribute declarations in a DTD to a namespace so that I can
> define my grammars using the DTD syntax and still take advantage
> of namespace validation. Ahhh... if only it weren't just a
> dream.

Yes, ideally, but unfortunately Namespaces came after DTDs, which are
part of XML 1.0.

-Edwin

Re: Problems with latest Xerces/Xalan Xerces

Posted by James Duncan Davidson <ja...@eng.sun.com>.

on 7/20/00 3:47 PM, Andy Clark at andyc@apache.org wrote:

> This is just a personal opinion but, to me, DTDs and Schemas are
> just different syntaxes for representing document grammars. It
> would be great if there was some mechanism for binding element
> and attribute declarations in a DTD to a namespace so that I can
> define my grammars using the DTD syntax and still take advantage
> of namespace validation. Ahhh... if only it weren't just a
> dream.

Check out Relax: it has a way to add data typing to DTDs, and IMO it would
be an excellent candidate for a subclass of DTDValidator in the ng parser.

.duncan

Re: Problems with latest Xerces/Xalan Xerces

Posted by Andy Clark <an...@apache.org>.

Edwin Goei wrote:
> The XML spec does not mention namespaces at all, only validating and
> non-validating processors.  So the way I read it, DTDs are not namespace
> aware and thus DTD validation should also ignore namespaces.  Therefore,
> the following XML file should be valid with a conformant validating
> processor regardless of whether namespace features in SAX2 are turned on or
> off.  When I tried this with a version of Xerces 1.x, I got a validation
> error when namespaces are set to true.

Let me rephrase what you've said so that we can agree that we're
talking about the same thing. When the document specifies a DTD
and validation is turned on, then the parser is supposed to ignore
the namespace bindings and validate based on the QName (or rawname).
Right?

Okay, this can be done. Not a problem. I just have to check out
the validation code and make sure that it's as easy as I think.

Basically, this problem is caused by the fact that the content
model validation code is generalized to be used for both DTDs
and Schemas. So a special flag must be set when the content
model validators are constructed from DTDs. Otherwise, the
validators are asked to validate content from the namespace
uri and localpart tuple.
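The flag Andy describes can be pictured with the following purely illustrative sketch; `ContentModelKey` and its nested `QName` are hypothetical, not Xerces classes. One matcher serves both grammar types, and the flag decides whether names are compared by raw name (DTD) or by the <uri, localpart> tuple (Schema):

```java
import java.util.Objects;

public class ContentModelKey {
    // Hypothetical element name as seen after namespace processing.
    public static final class QName {
        final String uri;       // null when no namespace binding is known
        final String localPart;
        final String rawName;   // "prefix:local" exactly as written

        public QName(String uri, String localPart, String rawName) {
            this.uri = uri;
            this.localPart = localPart;
            this.rawName = rawName;
        }
    }

    private final boolean fromDTD; // set when the content model is built from a DTD

    public ContentModelKey(boolean fromDTD) {
        this.fromDTD = fromDTD;
    }

    // Does an element found in the document match a name declared in the grammar?
    public boolean matches(QName declared, QName actual) {
        if (fromDTD) {
            // A DTD knows nothing about namespace bindings, so compare raw names.
            return declared.rawName.equals(actual.rawName);
        }
        // Schema-style matching compares the <uri, localpart> tuple.
        return Objects.equals(declared.uri, actual.uri)
            && declared.localPart.equals(actual.localPart);
    }

    public static void main(String[] args) {
        QName declared = new QName(null, "root", "foo:root"); // from <!ELEMENT foo:root ...>
        QName actual = new QName("http://www.foo.com/foo", "root", "foo:root");
        System.out.println("DTD flag set:   " + new ContentModelKey(true).matches(declared, actual));
        System.out.println("DTD flag unset: " + new ContentModelKey(false).matches(declared, actual));
    }
}
```

With the flag unset, a declaration loaded from a DTD carries no namespace URI and so never matches a namespace-bound element, which is consistent with Andy's explanation of the errors reported at the start of the thread.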

This is just a personal opinion but, to me, DTDs and Schemas are 
just different syntaxes for representing document grammars. It 
would be great if there was some mechanism for binding element 
and attribute declarations in a DTD to a namespace so that I can 
define my grammars using the DTD syntax and still take advantage 
of namespace validation. Ahhh... if only it weren't just a 
dream.

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: Problems with latest Xerces/Xalan Xerces

Posted by Edwin Goei <Ed...@eng.sun.com>.

Andy Clark wrote:
> 
> Brett McLaughlin wrote:
> > I'm getting some weird errors on the included XML document and XSL
> > stylesheet - when parsing occurs, it locates the DTD and then barfs
> > on all the elements. It "feels" like a namespace issue, but I know
> > this worked fine on earlier versions of Xerces. Any ideas? I'm about
> > to leave town, so I apologize for the lack of troubleshooting....
> 
> Turn namespace support off. You may be using namespaces in your
> document but if you try to use them with DTDs, you're asking for
> trouble. And here's the reason...
> 
> When namespace processing is on, the parser separates all elements
> and attributes found in the document into a <uri, localpart> tuple.
> For validation, both the uri and localpart must match in order for
> the content to be valid. However, when the grammar is loaded from
> the DTD, there's no way for the parser to know what namespace the
> "fake" prefixes specified in the DTD should be bound to. Does
> this make sense?

Yes, however I believe this breaks compatibility with the XML 1.0 spec. 
See below.

> 
> With namespace support turned off, all element and attribute
> names will be considered a single name and not a tuple. And since
> namespace processing is on by default, you need to turn it off if
> you plan on using namespaces in a document whose grammar is
> specified in a DTD.

The XML spec does not mention namespaces at all, only validating and
non-validating processors.  So the way I read it, DTDs are not namespace
aware and thus DTD validation should also ignore namespaces.  Therefore,
the following XML file should be valid with a conformant validating
processor regardless of whether namespace features in SAX2 are turned on or
off.  When I tried this with a version of Xerces 1.x, I got a validation
error when namespaces are set to true.

----- Beginning of included file -----
<?xml version="1.0"?>

<!DOCTYPE foo:root [

<!ELEMENT foo:root (foo:element1, foo:element2)>
<!ATTLIST foo:root
    xmlns:foo CDATA #FIXED "http://www.foo.com/foo"
>

<!ELEMENT foo:element1 (#PCDATA)>

<!ELEMENT foo:element2 (#PCDATA)>

]>

<foo:root>
    <foo:element1>test</foo:element1>
    <foo:element2>test2</foo:element2>
</foo:root>
----- End of included file -----

-Edwin

Re: Problems with latest Xerces/Xalan Xerces

Posted by Andy Clark <an...@apache.org>.

Brett McLaughlin wrote:
> I'm getting some weird errors on the included XML document and XSL
> stylesheet - when parsing occurs, it locates the DTD and then barfs 
> on all the elements. It "feels" like a namespace issue, but I know 
> this worked fine on earlier versions of Xerces. Any ideas? I'm about 
> to leave town, so I apologize for the lack of troubleshooting....

Turn namespace support off. You may be using namespaces in your
document but if you try to use them with DTDs, you're asking for 
trouble. And here's the reason...

When namespace processing is on, the parser separates all elements
and attributes found in the document into a <uri, localpart> tuple.
For validation, both the uri and localpart must match in order for
the content to be valid. However, when the grammar is loaded from
the DTD, there's no way for the parser to know what namespace the
"fake" prefixes specified in the DTD should be bound to. Does 
this make sense?

With namespace support turned off, all element and attribute
names will be considered a single name and not a tuple. And since
namespace processing is on by default, you need to turn it off if
you plan on using namespaces in a document whose grammar is
specified in a DTD.
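Andy's workaround can be written with the SAX2 feature URIs themselves. A minimal sketch, assuming the `XMLReader` obtained through JAXP on a current JDK (not the Xerces 1.x parser Brett was using); `NoNamespaceValidation` is a hypothetical name. JAXP readers already default to namespace-unaware, so the first `setFeature` call only makes the intent explicit:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class NoNamespaceValidation {
    // Validates xml against its DTD with namespace processing off.
    // Element names go into `names`, validation error messages into `errors`.
    public static void parse(String xml, List<String> names, List<String> errors) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setFeature("http://xml.org/sax/features/namespaces", false);
        reader.setFeature("http://xml.org/sax/features/validation", true);
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                names.add(qName); // the whole "prefix:local" string is the element name
            }

            @Override
            public void error(SAXParseException e) {
                errors.add(e.getMessage()); // record and keep going
            }
        };
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE foo:root [\n"
            + "<!ELEMENT foo:root (foo:element1)>\n"
            + "<!ELEMENT foo:element1 (#PCDATA)>\n"
            + "]>\n"
            + "<foo:root><foo:element1>test</foo:element1></foo:root>";
        List<String> names = new ArrayList<>();
        List<String> errors = new ArrayList<>();
        parse(xml, names, errors);
        System.out.println("names=" + names + " errors=" + errors);
    }
}
```

Element names then come through as single strings such as "foo:root", which is exactly what the DTD declares.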

There might be some kind of custom solution we could do in the
future when we can cache grammars so that we can pre-load DTD
grammars and "bind" them to a namespace. This would allow you
to do namespace validation with documents that have a DTD. But
we don't have this yet, of course.

Also, don't use DOS paths in your document. Only use URIs.

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

RE: XERCES Windows-1252 encoding problem!!! Pls Help

Posted by Yoga Balaji <y....@1internet.com>.

Andy. Actually I tried with a lower-case 'p' too, but it didn't work at
first. Now my problem is solved, since I added the following line:

parser.setFeature("http://apache.org/xml/features/allow-java-encodings", true);

and mapped Windows-1252 to Cp1252 in the MIME2Java.java source. I'm using
the DOMParserWrapper parser from XERCES 1.1.2, and it's working fine now
with Windows-1252.
Thank you very much for your support, Andy.

Jerzy. I tried your DOMParser (in all the versions of XERCES) and it works
fine for Cp1252, but it doesn't resolve entity references. The main reason I
replaced Java's JAXP with XERCES is to have the parser resolve entity
references automatically.
Thanks a lot for your response, Jerzy.

Yoga Balaji wrote:
>         s_enchash.put("WINDOWS-1252",   "CP1252");
>         s_revhash.put("CP1252", "WINDOWS-1252");

The mapping to "Cp1252" is CASE SENSITIVE because it is used
to dynamically locate an appropriate decoder class. So use
a lower-case 'p' and let me know if that works for you. If
not, let me know what version of Java you are using and on
what platform because Java JVMs are not required to include
decoders besides the ones for Unicode.

--
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org


Re: XERCES Windows-1252 encoding problem!!! Pls Help

Posted by Andy Clark <an...@apache.org>.

Yoga Balaji wrote:
>         s_enchash.put("WINDOWS-1252",   "CP1252");
>         s_revhash.put("CP1252", "WINDOWS-1252");

The mapping to "Cp1252" is CASE SENSITIVE because it is used
to dynamically locate an appropriate decoder class. So use
a lower-case 'p' and let me know if that works for you. If
not, let me know what version of Java you are using and on
what platform because Java JVMs are not required to include
decoders besides the ones for Unicode.
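The case sensitivity comes from the two directions of the lookup. A minimal sketch of such a table, assuming a structure loosely modeled on the `s_enchash`/`s_revhash` lines quoted above (the class and method names are hypothetical, not the actual MIME2Java code): the name from the document is normalized before lookup, but the Java name it maps to is handed on verbatim, so its case has to be exact.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class EncodingMap {
    // IANA name (upper-cased) -> Java encoding name (case preserved exactly).
    private static final Map<String, String> IANA_TO_JAVA = new HashMap<>();
    // Java encoding name (exact case) -> IANA name, for writing documents back out.
    private static final Map<String, String> JAVA_TO_IANA = new HashMap<>();

    static {
        IANA_TO_JAVA.put("UTF-8", "UTF8");
        IANA_TO_JAVA.put("WINDOWS-1252", "Cp1252"); // lower-case 'p'
        JAVA_TO_IANA.put("UTF8", "UTF-8");
        JAVA_TO_IANA.put("Cp1252", "WINDOWS-1252");
    }

    // In this sketch, names from the document are matched case-insensitively...
    public static String toJava(String ianaName) {
        return IANA_TO_JAVA.get(ianaName.toUpperCase(Locale.ROOT));
    }

    // ...but Java names are looked up exactly as given.
    public static String toIana(String javaName) {
        return JAVA_TO_IANA.get(javaName);
    }

    public static void main(String[] args) {
        System.out.println(toJava("windows-1252"));
        System.out.println(toIana("Cp1252"));
        System.out.println(toIana("CP1252"));
    }
}
```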

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

RE: XERCES Windows-1252 encoding problem!!! Pls Help

Posted by Yoga Balaji <y....@1internet.com>.

Andy. Thanks for your response.

I prefer solution #3.
I followed your instructions and added the following code in MIME2Java.java:

        s_enchash.put("WINDOWS-1252",   "CP1252");
        s_revhash.put("CP1252", "WINDOWS-1252");

But it still gives the same error. I tried both upper and lower case.
The URL in your solution #2 doesn't exist.
Thanks. Please help!

Yoga Balaji wrote:
> I'm trying to parse an XML file from MSNBC (daily news) and store
> the data in the DB. I'm using the Xerces parser to parse the XML file
> I receive. I'm getting the following error when I run my program.
> Without "encoding=Windows-1252" it works fine. I don't know how to
> resolve this problem.

The problem is that "Windows-1252" is *not* a valid encoding name.
The XML specification states that all encoding names must be IANA
names. However, "Windows-1252" is not. Unfortunately, when the
Microsoft XML parser writes an XML document, it automatically
includes this encoding.

There are several solutions:

1) Convert all of the incoming documents from "Windows-1252"
   encoding to a proper encoding. Make sure to modify the
   encoding line at the top of the file to reflect the change.

2) Modify the encoding line in all of your files to be
   "Cp1252" (case is important this time). Then turn on the
   following feature in the parser:

     http://apache.org/xml/features/allow-java-encodings

   Please note that your documents won't be portable in much
   the same way that using the "Windows-1252" encoding name
   doesn't work everywhere.

3) Modify the MIME2Java.java source file to include a mapping
   for "Windows-1252" to "Cp1252". Recompile and rebuild the
   Jar file. Use the new Jar file and you're done.
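Solution 1 can be scripted. A minimal sketch, assuming the JDK provides a `windows-1252` charset (mainstream JDKs do, although it is not one of the charsets every Java platform must support) and that the encoding declaration uses the usual `encoding="..."` form at the very start of the file; `ToUtf8` is a hypothetical name and the regular expression is a simplification:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ToUtf8 {
    // Matches encoding="..." or encoding='...' inside a leading XML declaration.
    private static final Pattern DECL_ENCODING =
        Pattern.compile("(<\\?xml[^>]*?encoding\\s*=\\s*)([\"'])[^\"']*\\2");

    public static byte[] convert(byte[] windows1252Bytes) {
        // Decode using the encoding the document actually uses...
        String text = new String(windows1252Bytes, Charset.forName("windows-1252"));
        // ...rewrite the declaration so it describes the new bytes...
        Matcher m = DECL_ENCODING.matcher(text);
        if (m.lookingAt()) {
            text = m.replaceFirst("$1$2UTF-8$2");
        }
        // ...and re-encode as UTF-8.
        return text.getBytes(StandardCharsets.UTF_8);
    }
}
```

A document without an XML declaration needs no change to the declaration, since UTF-8 is the default.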

--
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org


Re: XERCES Windows-1252 encoding problem!!! Pls Help

Posted by Mike Pogue <mp...@apache.org>.

What makes you think that MS doesn't want that?

"Use MS Windows, and all of your problems are solved!  Oh, the XML documents you create
can't be used on other platforms?  Ooops!  Sorry!"  :-)

Right now, Xerces counts on the underlying JDK for all of its conversion support
(allow-java-encodings turns that on, so that anything the underlying JDK supports is
allowed, even though not necessarily portable).  

If the JDK supports the Windows encoding, then "allow-java-encodings" really implies
"allow-windows-encodings".  If the JDK does NOT support the Windows encoding, then
implementing "allow-windows-encodings" would require adding a converter, I believe.

Right now, all the converters are in sun.io, if I remember correctly, so JDK implementors
are not required to implement the ones that Sun has.  (I think this was a mistake in the
long run, but I can certainly understand why it was done!)

So, for TRULY PORTABLE XML, you must use UTF-8.  I encourage everybody to use UTF-8,
because everybody is required to implement it in their parser, and it's also supported
nicely by all JDKs.
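Whether "allow-java-encodings" helps therefore depends on the JDK at hand. A small sketch for checking that, assuming a current JDK where `java.nio.charset.Charset` does the lookup rather than the `sun.io` converters mentioned above; `EncodingSupport` is a hypothetical name. Only US-ASCII, ISO-8859-1, UTF-8 and the three UTF-16 variants are guaranteed on every Java platform, which backs up the advice to stick with UTF-8.

```java
import java.nio.charset.Charset;

public class EncodingSupport {
    // Reports whether this JVM can decode the named encoding, and its canonical name.
    public static String describe(String name) {
        boolean supported = Charset.isSupported(name);
        // Charset lookups are case-insensitive and follow aliases, so "Cp1252"
        // and "windows-1252" resolve to the same charset where it is available.
        String canonical = supported ? Charset.forName(name).name() : "-";
        return name + " supported=" + supported + " canonical=" + canonical;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"UTF-8", "windows-1252", "Cp1252", "Windows-1252"}) {
            System.out.println(describe(name));
        }
    }
}
```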

Using a Windows-specific encoding is only asking for your XML to be non-portable!  "Just
say no!"

Mike

Ed Staub wrote:
> 
> We have the feature "http://apache.org/xml/features/allow-java-encodings".
> Might we want to also have
> "http://apache.org/xml/features/allow-Windows-encodings"?
> 
> Has anyone approached the Windows XML team about this?
> It seems like a really stupid thing for them to do, strategically; I suspect
> it's just a decision that was made by default - "we just pass on what the
> operating system (Windows) tells us" or similar.
> 
> I don't think they'd want to make Windows an "XML Roach Motel": "Documents
> check in... they _don't check out!".
> 
> -Ed
> 
> -----Original Message-----
> From: Mike Pogue [mailto:mpogue@apache.org]
> Sent: Friday, July 21, 2000 8:48 PM
> To: xerces-j-dev@xml.apache.org
> Subject: Re: XERCES Windows-1252 encoding problem!!! Pls Help
> 
> I agree with Andy on this one.  In the XML world, strict is best, because it
> maximizes portability.
>
> You wouldn't believe how many times people ask for a "near-XML parser",
> though, just so their particular platform works nicer (e.g. I've had people
> ask to allow mal-formed XML, because it's easier for them to create in their
> text editor).
> 
> "Just say no", is my opinion...
> 
> Mike
> 
> Andy Clark wrote:
> >
> > James Duncan Davidson wrote:
> > > Any thought that the parser should be lazy in what it accepts? Kind of like
> > > HTTP clients are supposed to be strict in what they send and servers are
> > > supposed to be loose in what they accept? Or is that just opening the door
> > > for more slop?
> >
> > Opens the door. ;) I like stricter grammars. Schema is too sloppy
> > for my tastes.
> >
> > --
> > Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

RE: XERCES Windows-1252 encoding problem!!! Pls Help

Posted by Ed Staub <es...@mediaone.net>.

We have the feature "http://apache.org/xml/features/allow-java-encodings".
Might we want to also have
"http://apache.org/xml/features/allow-Windows-encodings"?

Has anyone approached the Windows XML team about this?
It seems like a really stupid thing for them to do, strategically; I suspect
it's just a decision that was made by default - "we just pass on what the
operating system (Windows) tells us" or similar.

I don't think they'd want to make Windows an "XML Roach Motel": "Documents
check in... they _don't check out!".

-Ed

-----Original Message-----
From: Mike Pogue [mailto:mpogue@apache.org]
Sent: Friday, July 21, 2000 8:48 PM
To: xerces-j-dev@xml.apache.org
Subject: Re: XERCES Windows-1252 encoding problem!!! Pls Help

I agree with Andy on this one.  In the XML world, strict is best, because it
maximizes portability.

You wouldn't believe how many times people ask for a "near-XML parser",
though, just so their particular platform works nicer (e.g. I've had people
ask to allow mal-formed XML, because it's easier for them to create in their
text editor).

"Just say no", is my opinion...

Mike

Andy Clark wrote:
>
> James Duncan Davidson wrote:
> > Any thought that the parser should be lazy in what it accepts? Kind of like
> > HTTP clients are supposed to be strict in what they send and servers are
> > supposed to be loose in what they accept? Or is that just opening the door
> > for more slop?
>
> Opens the door. ;) I like stricter grammars. Schema is too sloppy
> for my tastes.
>
> --
> Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: XERCES Windows-1252 encoding problem!!! Pls Help

Posted by Mike Pogue <mp...@apache.org>.

I agree with Andy on this one.  In the XML world, strict is best, because it maximizes
portability.

You wouldn't believe how many times people ask for a "near-XML parser", though, just so
their particular platform works nicer (e.g. I've had people ask to allow mal-formed XML,
because it's easier for them to create in their text editor).

"Just say no", is my opinion...

Mike

Andy Clark wrote:
> 
> James Duncan Davidson wrote:
> > Any thought that the parser should be lazy in what it accepts? Kind of like
> > HTTP clients are supposed to be strict in what they send and servers are
> > supposed to be loose in what they accept? Or is that just opening the door
> > for more slop?
> 
> Opens the door. ;) I like stricter grammars. Schema is too sloppy
> for my tastes.
> 
> --
> Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: XERCES Windows-1252 encoding problem!!! Pls Help

Posted by Edwin Goei <Ed...@eng.sun.com>.

"Andy Clark" <an...@apache.org> wrote:
> James Duncan Davidson wrote:
> > Any thought that the parser should be lazy in what it accepts? Kind of like
> > HTTP clients are supposed to be strict in what they send and servers are
> > supposed to be loose in what they accept? Or is that just opening the door
> > for more slop?
>
> Opens the door. ;) I like stricter grammars. Schema is too sloppy
> for my tastes.

Ideally it would be a user configurable option, because sometimes you want
one or the other.  Sort of like the "strict" and the "transitional" XHTML
DTDs.  This makes the implementation more difficult though.

Re: XERCES Windows-1252 encoding problem!!! Pls Help

Posted by Andy Clark <an...@apache.org>.

James Duncan Davidson wrote:
> Any thought that the parser should be lazy in what it accepts? Kind of like
> HTTP clients are supposed to be strict in what they send and servers are
> supposed to be loose in what they accept? Or is that just opening the door
> for more slop?

Opens the door. ;) I like stricter grammars. Schema is too sloppy
for my tastes.

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: XERCES Windows-1252 encoding problem!!! Pls Help

Posted by James Duncan Davidson <ja...@eng.sun.com>.

on 7/20/00 10:25 AM, Andy Clark at andyc@apache.org wrote:

> However, "Windows-1252" is not. Unfortunately, when the
> Microsoft XML parser writes an XML document, it automatically
> includes this encoding.

You just gotta hate that kind of thing. :(

Any thought that the parser should be lazy in what it accepts? Kind of like
HTTP clients are supposed to be strict in what they send and servers are
supposed to be loose in what they accept? Or is that just opening the door
for more slop?

.duncan

RE: Errata to my last (today) post

Posted by Yoga Balaji <y....@1internet.com>.

Hi Jerzy. I tried with Xerces111 and it still gives me the same problem. I
tried both encoding = Cp1252 and encoding = windows-1252. Please help me
by giving more details on that. Thanks.

> I just checked my program with Xerces 1.0.3 and Xerces 1.1.1 and it works
> fine. To me this means that either something is wrong in version 1.1.2 or I
> am missing something important.
>
> Jerzy
> --
> +--------------------------------+
> |         Jerzy Puchala          |
> +--------------------------------+
> |       jerzypuc@scdi.com        |
> +--------------------------------+


---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-dev-help@xml.apache.org

How I did validation (was: Errata...) [long]

Posted by Jerzy Puchala <je...@scdi.com>.

First, I have only just started playing with the Xerces parser.
As I wrote in my previous message, this code does NOT work with version
1.1.2 of Xerces.
Below is the part of the code that works on my machine.

<MAJOR PROGRAM CLASS>
import java.io.IOException;

import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

import org.apache.xerces.parsers.DOMParser;

import org.xml.sax.SAXException;

public class MyClass2 {

  public static void main(String arg[]) throws Exception {
    MyClass2 myClass = new MyClass2(arg[0]);
  }

  public MyClass2(String a) {
    getWork(a);
  }

  private void getWork(String DDuri) {
    DOMParser parser = new DOMParser();
    try {
      parser.setErrorHandler(new MyErrorHandler());
      // Accept Java encoding names such as "Cp1252" in the XML declaration
      parser.setFeature("http://apache.org/xml/features/allow-java-encodings", true);
      // Turn on validation
      parser.setFeature("http://xml.org/sax/features/validation", true);

      parser.parse(DDuri);
      // parse() returns nothing; the DOM tree is fetched afterwards
      Document doc = parser.getDocument();

      // Here, do something with the document (not the subject of this post)
    }
    catch (SAXException e) {
      // Also covers the SAXNotRecognizedException and
      // SAXNotSupportedException that setFeature() may throw
      e.printStackTrace();
    }
    catch (IOException e) {
      e.printStackTrace();
    }
  }

}
</MAJOR PROGRAM CLASS> 

<MyErrorHandler class>
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;


public class MyErrorHandler implements ErrorHandler {

  /** Warning. */
  public void warning(SAXParseException ex) {
    System.err.println("[Warning] " +
                       getLocationString(ex) + ": " +
                       ex.getMessage());
  }

  /** Error. */
  public void error(SAXParseException ex) {
    System.err.println("[Error] " +
                       getLocationString(ex) + ": " +
                       ex.getMessage());
  }

  /** Fatal error. */
  public void fatalError(SAXParseException ex) throws SAXException {
    System.err.println("[Fatal Error] " +
                       getLocationString(ex) + ": " +
                       ex.getMessage());
    throw ex;
  }

  /** Returns a string of the location. */
  private String getLocationString(SAXParseException ex) {
    StringBuffer str = new StringBuffer();

    String systemId = ex.getSystemId();
    if (systemId != null) {
      // Keep only the file name, not the full path
      int index = systemId.lastIndexOf('/');
      if (index != -1) {
        systemId = systemId.substring(index + 1);
      }
      str.append(systemId);
    }
    str.append(':');
    str.append(ex.getLineNumber());
    str.append(':');
    str.append(ex.getColumnNumber());

    return str.toString();
  } // getLocationString(SAXParseException):String
}
</MyErrorHandler class>

In addition, the exact encoding line in my XML file is:

<?xml version="1.0" encoding="Cp1252"?>

This line is generated by the Deployment Descriptor Editor of Inprise
Application Server, and I cannot send the whole file, but this was the
line responsible for the error.

I did not add any mappings or change any files. I work on Windows NT
4.0 Workstation with jdk1.2.2_6.

I had to cut a big part of my program from this message, but I believe
I did not cut out too much.

Once again, I have not been working with Xerces for long, and maybe
this problem can be solved in a better way. Also, I apologize if there
are any unnecessary lines in the code (especially among the imports).

I hope this will help somebody. Sorry for my poor English.

Jerzy Puchala

-- 
+--------------------------------+
|         Jerzy Puchala          |
+--------------------------------+
|       jerzypuc@scdi.com        |
+--------------------------------+

RE: Errata to my last (today) post

Posted by Yoga Balaji <y....@1internet.com>.

Jerzy, as I mentioned in my earlier mail, I wasn't successful with Xerces 1.1.1
and Xerces 1.0.3. This is the code I'm actually using:

    DOMParserWrapper parser =
        (DOMParserWrapper) Class.forName(parserName).newInstance();
    DOMCount counter = new DOMCount();
    long before = System.currentTimeMillis();
    doc = parser.parse(argv[0]);

Then I tried your code, as you mentioned in your email, but I don't know how
to get the document after this line:
parser.parse("myxml.xml");

Can you explain? Thanks. YoGA
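With the Xerces DOMParser used in this thread, parse() returns nothing. The tree is fetched afterwards with getDocument(), which is what Jerzy's program does with "Document doc = parser.getDocument();". For comparison only, here is a minimal sketch using the standard JAXP API (javax.xml.parsers), in which parse() returns the Document directly. JAXP is a different entry point from the DOMParser class discussed here. The class name JaxpParseSketch and the sample XML are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class JaxpParseSketch {
    /** Parses XML text and returns the resulting DOM tree directly. */
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Unlike DOMParser.parse(), which returns void,
        // DocumentBuilder.parse() hands back the Document itself.
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<channel><item>news</item></channel>");
        System.out.println(doc.getDocumentElement().getTagName()); // prints "channel"
    }
}
```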

> I just checked my program with Xerces 1.0.3 and Xerces 1.1.1 and it works
> fine. To me this means that either something is wrong in version 1.1.2 or I
> am missing something important.
>
> Jerzy



Errata to my last (today) post

Posted by Jerzy Puchala <je...@scdi.com>.

I just checked my program with Xerces 1.0.3 and Xerces 1.1.1 and it works
fine. To me this means that either something is wrong in version 1.1.2 or I
am missing something important.

Jerzy
-- 
+--------------------------------+
|         Jerzy Puchala          |
+--------------------------------+
|       jerzypuc@scdi.com        |
+--------------------------------+

Re: XERCES Windows-1252 encoding problem!!! Pls Help

Posted by Jerzy Puchala <je...@scdi.com>.

On Thu, 20 Jul 2000, Andy Clark wrote:

> 2) Modify the encoding line in all of your files to be
>    "Cp1252" (case is important this time). Then turn on the
>    following feature in the parser:
> 
>      http://apache.org/xml/features/allow-java-encodings
> 
>    Please note that your documents won't be portable in much
>    the same way that using the "Windows-1252" encoding name
>    doesn't work everywhere.
> 
I have a similar problem.
The first line of my XML file is:
<?xml version="1.0" encoding="Cp1252"?>

and in my program I have:	
DOMParser parser = new DOMParser();
try {
  parser.setErrorHandler(new MyErrorHandler());
  parser.setFeature("http://apache.org/xml/features/allow-java-encodings", true);
  parser.setFeature("http://xml.org/sax/features/validation", true);
  parser.parse("myxml.xml");
}
// ... here I catch the exceptions.

Results are not nice ;)

[Fatal Error] :0:0: The encoding "Cp1252" is not supported.
//-------------------------------

I use Xerces 1.1.2 and jdk1.2.2_6.
MyErrorHandler is my own error handler, based on
dom.wrappers.DOMParser from the samples.

Thank you for any help.

Jerzy Puchala

-- 
+--------------------------------+
|         Jerzy Puchala          |
+--------------------------------+
|       jerzypuc@scdi.com        |
+--------------------------------+

Re: XERCES Windows-1252 encoding problem!!! Pls Help

Posted by Andy Clark <an...@apache.org>.

Yoga Balaji wrote:
> I'm trying to parse an XML file from MSNBC (daily news) and store
> the data in the DB. I'm using the Xerces parser to parse the XML file
> I receive. I get the following error when I run my program.
> Without "encoding=Windows-1252" it works fine. I don't know how to
> resolve this problem.

The problem is that "Windows-1252" is *not* a valid encoding name.
The XML specification states that all encoding names must be IANA
names. However, "Windows-1252" is not. Unfortunately, when the
Microsoft XML parser writes an XML document, it automatically
includes this encoding.

There are several solutions:

1) Convert all of the incoming documents from "Windows-1252"
   encoding to a proper encoding. Make sure to modify the
   encoding line at the top of the file to reflect the change.

2) Modify the encoding line in all of your files to be
   "Cp1252" (case is important this time). Then turn on the
   following feature in the parser:

     http://apache.org/xml/features/allow-java-encodings

   Please note that your documents won't be portable in much
   the same way that using the "Windows-1252" encoding name
   doesn't work everywhere.

3) Modify the MIME2Java.java source file to include a mapping
   for "Windows-1252" to "Cp1252". Recompile and rebuild the
   Jar file. Use the new Jar file and you're done.
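The first option, converting incoming documents, can be sketched with the modern java.nio.charset API. The sketch makes these assumptions:

- the whole document fits in memory;
- the Java runtime provides a charset under the name "windows-1252" (current JDKs do);
- UTF-8 is the target encoding.

The class name Recode1252 is hypothetical. The bytes are decoded with the real charset first, and then the encoding declaration is rewritten to match the new bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

public class Recode1252 {
    // Matches the encoding pseudo-attribute of the XML declaration at the
    // very start of the document, capturing the prefix and the quote character.
    private static final Pattern ENCODING_DECL = Pattern.compile(
        "^(<\\?xml[^>]*?\\sencoding\\s*=\\s*)([\"'])[^\"']*\\2");

    /**
     * Decodes a document stored as windows-1252, re-encodes it as UTF-8,
     * and rewrites the encoding declaration so it matches the new bytes.
     */
    public static byte[] toUtf8(byte[] windows1252Bytes) {
        String text = new String(windows1252Bytes, Charset.forName("windows-1252"));
        String fixed = ENCODING_DECL.matcher(text).replaceFirst("$1$2UTF-8$2");
        return fixed.getBytes(StandardCharsets.UTF_8);
    }
}
```

Decoding before rewriting matters. Characters such as curly quotes are stored as bytes 0x93 and 0x94 in windows-1252, and those bytes are not valid on their own in UTF-8. Editing only the declaration line would therefore leave the file unreadable.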

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

XERCES Windows-1252 encoding problem!!! Pls Help

Posted by Yoga Balaji <y....@1internet.com>.

I'm trying to parse an XML file from MSNBC (daily news) and store the data in
the DB. I'm using the Xerces parser to parse the XML file I receive. I get the
following error when I run my program. Without "encoding=Windows-1252" it
works fine. I don't know how to resolve this problem.

[Fatal Error] :0:0: The encoding "Windows-1252" is not supported.
Exception in thread  "main" java.lang.NullPointerException
            at msnbcmain.main(msnbcmain.java, Compiled Code)

Initially I used JAXP (from www.java.sun.com.xml) and this problem wasn't
there, but I had an entity reference problem with it: it wasn't resolving
entity references in the XML file. Xerces resolves them automatically, but
then the encoding problem appears.

Please help me! YoGA