Posted to j-users@xerces.apache.org by "Huynh, Lynn T." <ly...@unisys.com> on 2008/08/08 00:56:46 UTC

RE: Accessing xml and doctype declaration via SAX

Michael,
I am seeing what Daniel was seeing as well.  When my XML is received
through an InputSource and the XML has an encoding declaration, Xerces
still returns null for the encoding.  It works fine when my program
receives the XML through a URL.

In my program, I call getXMLVersion() and getEncoding() at the
beginning of startElement().

Here is part of my code:

	public void lookUpXmlDeclarationInfo()
	{
		if (locator instanceof Locator2) {
			Locator2 loc2 = (Locator2) locator;
			encoding = loc2.getEncoding();
			xmlVersion = loc2.getXMLVersion();
		}
	}

	public void startElement(String namespaceURI, String localName,
			String qName, Attributes atts)
			throws SAXException
	{
		lookUpXmlDeclarationInfo();
		....
		....	
	}
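
For anyone who wants to reproduce this, below is a minimal, self-contained
sketch that parses the same document once from a byte stream and once from
a character stream and records what the Locator2 reports at the root
element. The class name LocatorProbe and the helper probe() are made up
for illustration, and the sketch assumes the JAXP parser bundled with the
JDK rather than a standalone Xerces jar, so results may differ from the
setup described above.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.Locator2;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class: records what Locator2 reports at the root element.
public class LocatorProbe extends DefaultHandler {
    private Locator locator;
    private int elementDepth;
    private String encoding;
    private String xmlVersion;

    @Override
    public void setDocumentLocator(Locator locator) {
        // Only store the locator here; query it later, inside a callback.
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        // Query the locator once, when the root element starts.
        if (elementDepth++ == 0 && locator instanceof Locator2) {
            Locator2 loc2 = (Locator2) locator;
            encoding = loc2.getEncoding();
            xmlVersion = loc2.getXMLVersion();
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        elementDepth--;
    }

    public String getEncoding() { return encoding; }

    public String getXmlVersion() { return xmlVersion; }

    // Parses the given source with the default JAXP SAX parser (an assumption).
    public static LocatorProbe probe(InputSource source) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        LocatorProbe handler = new LocatorProbe();
        reader.setContentHandler(handler);
        reader.parse(source);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root><child/></root>";

        // Byte stream: the parser itself has to work out the encoding.
        LocatorProbe fromBytes = probe(new InputSource(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))));
        // Character stream: the bytes were already decoded by the caller.
        LocatorProbe fromChars = probe(new InputSource(new StringReader(xml)));

        System.out.println("InputStream: version=" + fromBytes.getXmlVersion()
                + ", encoding=" + fromBytes.getEncoding());
        System.out.println("Reader:      version=" + fromChars.getXmlVersion()
                + ", encoding=" + fromChars.getEncoding());
    }
}
```

If I read the Locator2 javadoc correctly, for a character stream
getEncoding() reports the encoding set on the InputSource, which would
explain the null when nothing was set there. Calling
InputSource.setEncoding() before parsing may therefore be worth trying,
though I have not confirmed that against Xerces itself.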

-----Original Message-----
From: Michael Glavassevich [mailto:mrglavas@ca.ibm.com] 
Sent: Thursday, March 13, 2008 9:28 PM
To: j-users@xerces.apache.org
Subject: Re: Accessing xml and doctype declaration via SAX

Hi Daniel,

This is not a bug. The documentation for setDocumentLocator() [1] says:
"Note that the locator will return correct information only during the
invocation SAX event callbacks after startDocument returns and before
endDocument is called. The application should not attempt to use it at
any other time." You should never call methods on the Locator within
startDocument() or endDocument(). Try calling them later, for instance in
the startElement() call for the root element:

    public void startElement(String uri, String local, String raw,
                             Attributes attrs) throws SAXException {
        // Root Element
        if (elementDepth++ == 0) {
            if (locator != null) {
                if (locator instanceof Locator2) {
                    Locator2 loc = (Locator2) locator;
                    loc.getXMLVersion();
                    loc.getEncoding();
                }
            }
            ...
        }
        ...
    }

Thanks.

[1]
http://xerces.apache.org/xerces2-j/javadocs/api/org/xml/sax/ContentHandler.html#setDocumentLocator(org.xml.sax.Locator)

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

"Daniel Yokomizo" <da...@gmail.com> wrote on 03/13/2008 08:54:24 PM:

> On Thu, Mar 13, 2008 at 6:51 PM, Stanimir Stamenkov 
> <s7...@netscape.net> wrote:
> > Wed, 12 Mar 2008 16:25:59 -0300, /Daniel Yokomizo/:
> >
> >
> >  > The only issue I still have
> >  > is getting the xml declaration info (e.g. version, encoding) but right
> >  > now I can just ignore it.
> >
> >  That info you should be able to obtain through the Locator2 [1]  
> > interface.  For example, in your ContentHandler implementation:
> >
> >      Locator locator;
> >
> >      public void setDocumentLocator(Locator locator) {
> >          this.locator = locator;
> >      }
> >
> >      public void startDocument() {
> >          if (locator instanceof Locator2) {
> >               Locator2 loc = (Locator2) locator;
> >               loc.getXMLVersion();
> >               loc.getEncoding();
> >          }
> >      }
> >
> >  [1]
> >  http://xerces.apache.org/xerces2-j/javadocs/api/org/xml/sax/ext/Locator2.html
> >
> >  --
> >  Stanimir
>
> Thank you, that solved my problem. I ran into some weird behavior,
> which I think is a bug, but I'm not certain. I created the
> InputSource using a Reader, didn't set the encoding property of the
> InputSource and tried to parse. Even if the document has an XML
> declaration with an explicit encoding, locator.getEncoding() returned
> null. Creating the InputSource with an InputStream worked, because the
> parser tried to discover the encoding based on the first bytes of the
> stream. I think this is a bug because the document has the encoding
> information and there are no other places with this information
> (either explicit, like in the InputSource, or implicit like in the
> InputStream case) that could possibly conflict, so the locator should
> have this info. Should I open a bug report (assuming that this isn't a
> known bug; I searched JIRA but couldn't find anything)? Either way
> I changed my code to use InputStream and everything worked OK.
>
> Best regards,
> Daniel Yokomizo.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: j-users-unsubscribe@xerces.apache.org
> For additional commands, e-mail: j-users-help@xerces.apache.org

