Posted to c-dev@xerces.apache.org by Jon Smirl <jo...@mediaone.net> on 2000/06/15 21:53:22 UTC

SAXParser::docPI()

Why is SAXParser::docPI() ignoring PIs before the content starts? None of
the Java-based SAX implementations I have used do this. The Java code
I'm porting needs these PIs to be reported.

from parsers/SAXParser.cpp
----------------------------------------------------------------------

void SAXParser::docPI(  const   XMLCh* const    target
                        , const XMLCh* const    data)
{
    // SAX don't want to know about PIs before content
    if (!fElemDepth)
        return;

    // Just map to the SAX document handler
    if (fDocHandler)
        fDocHandler->processingInstruction(target, data);

    //
    //  If there are any installed advanced handlers, then lets call them
    //  with this info.
    //
    for (unsigned int index = 0; index < fAdvDHCount; index++)
        fAdvDHList[index]->docPI(target, data);
}

Jon Smirl
jonsmirl@mediaone.net

Re: SAXParser::docPI()

Posted by Jon Smirl <jo...@mediaone.net>.

The W3C DOM model has a Document node as the top node; you then call
getDocumentElement() to get the top element node. You don't do a getChild()
because it is legal in XML to have PIs and comments occurring before the
top element. Also note that the W3C did not choose to make the top level
node an element node for this reason.

Without SAX reporting these top level PI and comment events I can't
construct a W3C compatible DOM tree. Since one of the primary purposes of
SAX is to construct DOM trees, these events should be reported. I see how to
fix the code, I was just hoping that I wouldn't need to make my own private
version of Xerces.
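
To make the tree shape concrete, here is a minimal sketch in C++ of a
document node whose children include a PI and a comment ahead of the root
element. ToyNode, ToyDocument, documentElement() and buildSampleDocument()
are hypothetical names made up for illustration; they are not the W3C DOM
or Xerces-C API.

```cpp
#include <string>
#include <vector>

// Toy node model for illustration only; not the W3C DOM or Xerces-C API.
enum class NodeKind { ProcessingInstruction, Comment, Element };

struct ToyNode {
    NodeKind    kind;
    std::string name;   // PI target, comment text, or element name
};

// The document node holds every top-level node in document order.
struct ToyDocument {
    std::vector<ToyNode> children;

    // Similar in spirit to getDocumentElement(): return the first
    // top-level element, skipping any PIs or comments before it.
    const ToyNode* documentElement() const {
        for (const ToyNode& child : children)
            if (child.kind == NodeKind::Element)
                return &child;
        return nullptr;
    }
};

// Builds the tree for:  <?xml-stylesheet ...?> <!-- note --> <root/>
inline ToyDocument buildSampleDocument() {
    ToyDocument doc;
    doc.children.push_back({NodeKind::ProcessingInstruction, "xml-stylesheet"});
    doc.children.push_back({NodeKind::Comment, "note"});
    doc.children.push_back({NodeKind::Element, "root"});
    return doc;
}
```

If the parser never reports the first two events, the builder has nothing
to append ahead of "root", and the tree comes out missing two children.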

Jon Smirl
jonsmirl@mediaone.net

RE: SAXParser::docPI()

Posted by Rich Taylor <rt...@webmd.net>.

Ahhh - I see that I haven't completely convinced you...

The XML spec says that

	[1]   document ::=  prolog element Misc*

	[22]  prolog ::=  XMLDecl? Misc* (doctypedecl Misc*)?

	[27]  Misc ::=  Comment | PI |  S

This indicates that [Misc] can legally occur after the
root element and in the prolog (but after the XMLDecl).

Since [Misc] can be Comments or PIs or whitespace (S), then
these things are legal everywhere outside the root element
except before the XML declaration.
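
As a rough illustration of those three productions, the C++ sketch below
checks a sequence of top-level items against them. It only looks at the
order of top-level constructs and does not parse XML text. The Item enum
and the function names are hypothetical, invented for this example.

```cpp
#include <cstddef>
#include <vector>

// Top-level constructs only; content inside the root element is not modeled.
enum class Item { XMLDecl, Comment, PI, Space, Doctype, Element };

// [27] Misc ::= Comment | PI | S
inline bool isMisc(Item item) {
    return item == Item::Comment || item == Item::PI || item == Item::Space;
}

// Advance past a run of Misc items (Misc*).
inline std::size_t skipMisc(const std::vector<Item>& items, std::size_t pos) {
    while (pos < items.size() && isMisc(items[pos]))
        ++pos;
    return pos;
}

// [1]  document ::= prolog element Misc*
// [22] prolog   ::= XMLDecl? Misc* (doctypedecl Misc*)?
inline bool isWellFormedTopLevel(const std::vector<Item>& items) {
    std::size_t pos = 0;
    if (pos < items.size() && items[pos] == Item::XMLDecl)
        ++pos;                                  // XMLDecl?
    pos = skipMisc(items, pos);                 // Misc*
    if (pos < items.size() && items[pos] == Item::Doctype)
        pos = skipMisc(items, pos + 1);         // (doctypedecl Misc*)?
    if (pos >= items.size() || items[pos] != Item::Element)
        return false;                           // exactly one root element
    pos = skipMisc(items, pos + 1);             // trailing Misc*
    return pos == items.size();
}
```

A PI ahead of the root element passes this check, while a PI ahead of the
XML declaration does not, which matches the reading of the grammar above.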

The SAX docs also state - in the documentation for
DocumentHandler::processingInstruction() that:

   The Parser will invoke this method once for
   each processing instruction found: note that
   processing instructions may occur before or
   after the main document element

- Rich Taylor, Healtheon/WebMD



Re: SAXParser::docPI()

Posted by Dean Roddey <dr...@charmedquark.com>.

It was my understanding that PIs before the root element are not part of the
SAX document handler. At least it was when I wrote that code. If someone can
prove that these should be included (and the fact that other parsers do it
isn't proof :-), then certainly we can remove the two lines of code that
would be required to have them show up. Is there any 'infoset' defined for
SAX that says what should come out of its document handler? I took it to be
stuff that is part of the document content, which comments and whitespace
and PIs before and after the root are at least arguably not (if you assume
that the root element and its contents are the document itself.)

Currently the SAXParser class suppresses:

1. Comments
2. PIs
3. Characters (whitespace in this case)

before and after the root element. To enable them to come through, the
SAXParser methods that map the internal events to the SAX methods would
require this simple change:


void SAXParser::docCharacters(  const   XMLCh* const    chars
                                , const unsigned int    length
                                , const bool            cdataSection)
{
-    // SAX don't want to know about chars before content
-    if (!fElemDepth)
-        return;

    // Just map to the SAX document handler
    if (fDocHandler)
        fDocHandler->characters(chars, length);

    //
    //  If there are any installed advanced handlers, then lets call them
    //  with this info.
    //
    for (unsigned int index = 0; index < fAdvDHCount; index++)
        fAdvDHList[index]->docCharacters(chars, length, cdataSection);
}


Andy, Rahul, feel free to make this change if you can find any documentation
that says they should be visible. Does the Java Xerces parser do that? These
changes are certainly very safe from our standpoint. However, existing
programs that don't expect them now because they've never been reported
could get confused.
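
For anyone wanting to see the effect of the guard in isolation, here is a
small self-contained C++ sketch. It is a toy model, not Xerces-C code:
ToyParser, replaySample() and the suppressOutsideRoot flag are hypothetical
names, and only the fElemDepth check mirrors the snippet quoted in this
thread.

```cpp
#include <string>
#include <vector>

// Toy stand-in for the parser-to-handler mapping; not Xerces-C code.
class ToyParser {
public:
    explicit ToyParser(bool suppressOutsideRoot)
        : fSuppressOutsideRoot(suppressOutsideRoot) {}

    void startElement() { ++fElemDepth; }
    void endElement()   { --fElemDepth; }

    void docPI(const std::string& target) {
        // The guard under discussion: drop PIs seen at element depth zero.
        if (fSuppressOutsideRoot && fElemDepth == 0)
            return;
        fReported.push_back(target);
    }

    const std::vector<std::string>& reported() const { return fReported; }

private:
    bool                     fSuppressOutsideRoot;
    unsigned int             fElemDepth = 0;
    std::vector<std::string> fReported;
};

// Replays the events for: <?before?><root><?inside?></root><?after?>
inline std::vector<std::string> replaySample(bool suppressOutsideRoot) {
    ToyParser parser(suppressOutsideRoot);
    parser.docPI("before");
    parser.startElement();
    parser.docPI("inside");
    parser.endElement();
    parser.docPI("after");
    return parser.reported();
}
```

With the guard on, only the PI inside the root is reported. With it off, all
three come through in document order, which is the behavior change existing
handlers would need to tolerate.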

--------------------------
Dean Roddey
The CIDLib C++ Frameworks
Charmed Quark Software
droddey@charmedquark.com
http://www.charmedquark.com

"You young, and you gotcha health. Whatchoo wanna job fer?"



RE: SAXParser::docPI()

Posted by Rich Taylor <rt...@webmd.net>.

Jon,

I discovered this a while ago - and convinced Mr. Roddey
that this was a bug.  I hope to see a fix in the upcoming
release.

(Perhaps someone could check to ensure that this has
 actually been fixed - I fixed it in my local copy and
 I'm not monitoring the current source.)

In the meantime, you can comment out the first if() clause and its
return; in docPI() to work around the problem.

- Rich Taylor, Healtheon/WebMD

