Posted to j-users@xerces.apache.org by Ittay Dror <it...@qlusters.com> on 2006/04/09 15:27:15 UTC
getElementById doesn't work with SAX2DOM (and tagsoup)
I've turned on the schema and validation features and set a schema for my HTML, but getElementById still doesn't work. (I'm using the latest Xerces, Xalan and TagSoup; xhtml1-transitional.dtd and its entities are saved locally.)
this is my html:
<html>
<body>
<div id="foo">hello</div>
</body>
</html>
this is my code:
import javax.xml.parsers.ParserConfigurationException;

import org.apache.xalan.xsltc.trax.SAX2DOM;
import org.apache.xerces.jaxp.DocumentBuilderFactoryImpl;
import org.ccil.cowan.tagsoup.Parser;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class Test {
    // Registered in main() as the JAXP DocumentBuilderFactory via a system property.
    public static class HTMLDocumentBuilderFactory extends DocumentBuilderFactoryImpl {
        public HTMLDocumentBuilderFactory() throws SAXException, ParserConfigurationException {
            setValidating(true);
            setFeature("http://apache.org/xml/features/validation/schema", true);
            setFeature("http://xml.org/sax/features/validation", true);
        }
    }

    public final static void main(String[] args) throws Exception {
        Parser p = new Parser();
        System.setProperty("javax.xml.parsers.DocumentBuilderFactory", HTMLDocumentBuilderFactory.class.getName());
        SAX2DOM sax2dom = new SAX2DOM();
        Document doc = (Document)sax2dom.getDOM();
        // Point the document's DOMConfiguration at the local XHTML DTD.
        DOMConfiguration config = doc.getDomConfig();
        config.setParameter("schema-type","http://www.w3.org/TR/REC-xml");
        config.setParameter("schema-location", "/tmp/xhtml1-transitional.dtd");
        // TagSoup feeds SAX events into SAX2DOM, which builds the DOM.
        p.setContentHandler(sax2dom);
        InputSource docsrc = new InputSource("/tmp/test.html");
        p.parse(docsrc);
        // Reported problem: this lookup does not find the element.
        System.out.println(doc.getElementById("foo"));
    }
}
thanx,
ittay
--
===================================
Ittay Dror
openQRM Team Leader,
R&D, Qlusters Inc.
ittayd@qlusters.com
+972-3-6081994 Fax: +972-3-6081841
http://www.openQRM.org
- Keeps your Data-Center Up and Running
---------------------------------------------------------------------
To unsubscribe, e-mail: j-users-unsubscribe@xerces.apache.org
For additional commands, e-mail: j-users-help@xerces.apache.org
Re: getElementById doesn't work with SAX2DOM (and tagsoup)
Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Ittay Dror <it...@qlusters.com> wrote on 04/16/2006 10:24:49 AM:
>
> Michael Glavassevich wrote:
> > Hello Ittay,
> >
> > A Document's DOMConfiguration [1] is used when
> > Document.normalizeDocument() is invoked. I wouldn't assume that Xalan
> > calls that method so you likely need to call it yourself. Probably worth
> > noting that in-memory DTD validation using normalizeDocument() was
> > completely broken prior to Xerces 2.8.0. I spent a couple weeks last year
> > fixing many of the major bugs but didn't get around to all of them before
> > the release (though I hope to stamp the rest out before Xerces 2.9). I've
> > never checked whether getElementById() works after calling
> > normalizeDocument() with DTD validation enabled but glancing over the
> > current code I suspect it doesn't.
>
> ouch.
>
> is there a way to make it work?
Aside from waiting for the bugs to be fixed, if you know which attributes
should be treated as IDs you could traverse the DOM and mark them as IDs
by calling setIdAttributeNode():
http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#ID-ElSetIdAttrNode
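
The traversal suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name IdMarker and its helper methods are hypothetical, the document is built with the stock JAXP DocumentBuilder rather than SAX2DOM and TagSoup, and it assumes that every attribute named "id" should be treated as an ID. Only Element.setIdAttributeNode() and Document.getElementById() come from DOM Level 3 Core itself.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of Xerces, Xalan or TagSoup.
public class IdMarker {

    // Walks the subtree rooted at node and flags every attribute
    // called attrName as an ID attribute of its owner element.
    public static void markIds(Node node, String attrName) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element element = (Element) node;
            Attr attr = element.getAttributeNode(attrName);
            if (attr != null) {
                element.setIdAttributeNode(attr, true);
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            markIds(child, attrName);
        }
    }

    // Stand-in for the SAX2DOM pipeline: parses a string into a DOM.
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<html><body><div id=\"foo\">hello</div></body></html>");
        // No DTD was applied, so "id" has no ID type at this point.
        System.out.println("before: " + doc.getElementById("foo"));
        markIds(doc.getDocumentElement(), "id");
        Element found = doc.getElementById("foo");
        System.out.println("after: " + (found == null ? null : found.getTextContent()));
    }
}
```

With a DOM produced by SAX2DOM the same walk would be run once after p.parse(docsrc) returns, before any call to getElementById(). Whether the attribute lookup by plain name behaves identically there has not been checked here.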
> >
> > [1]
> > http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#Document3-domConfig
> >
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org
Re: getElementById doesn't work with SAX2DOM (and tagsoup)
Posted by Ittay Dror <it...@qlusters.com>.
Michael Glavassevich wrote:
> Hello Ittay,
>
> A Document's DOMConfiguration [1] is used when
> Document.normalizeDocument() is invoked. I wouldn't assume that Xalan
> calls that method so you likely need to call it yourself. Probably worth
> noting that in-memory DTD validation using normalizeDocument() was
> completely broken prior to Xerces 2.8.0. I spent a couple weeks last year
> fixing many of the major bugs but didn't get around to all of them before
> the release (though I hope to stamp the rest out before Xerces 2.9). I've
> never checked whether getElementById() works after calling
> normalizeDocument() with DTD validation enabled but glancing over the
> current code I suspect it doesn't.
ouch.
is there a way to make it work?
>
> [1]
> http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#Document3-domConfig
Re: getElementById doesn't work with SAX2DOM (and tagsoup)
Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Hello Ittay,
A Document's DOMConfiguration [1] is used when
Document.normalizeDocument() is invoked. I wouldn't assume that Xalan
calls that method so you likely need to call it yourself. Probably worth
noting that in-memory DTD validation using normalizeDocument() was
completely broken prior to Xerces 2.8.0. I spent a couple weeks last year
fixing many of the major bugs but didn't get around to all of them before
the release (though I hope to stamp the rest out before Xerces 2.9). I've
never checked whether getElementById() works after calling
normalizeDocument() with DTD validation enabled but glancing over the
current code I suspect it doesn't.
[1]
http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#Document3-domConfig
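
To illustrate the point that a Document's DOMConfiguration only takes effect when normalizeDocument() is invoked, here is a small sketch. It deliberately uses the standard "comments" parameter instead of DTD validation, since the validation path is described above as unreliable. The class name NormalizeDemo and its helpers are hypothetical, and the document is built with the stock JAXP DocumentBuilder rather than SAX2DOM.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of Xerces or Xalan.
public class NormalizeDemo {

    // Parses a string into a DOM with the default JAXP factory.
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Counts comment nodes in the subtree rooted at node.
    public static int countComments(Node node) {
        int count = node.getNodeType() == Node.COMMENT_NODE ? 1 : 0;
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            count += countComments(child);
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<root><!-- note --><a/></root>");

        // Ask for comments to be discarded. Setting the parameter
        // alone does not touch the tree.
        DOMConfiguration config = doc.getDomConfig();
        config.setParameter("comments", Boolean.FALSE);
        System.out.println("after setParameter: " + countComments(doc));

        // The configuration is applied only here.
        doc.normalizeDocument();
        System.out.println("after normalizeDocument: " + countComments(doc));
    }
}
```

The same pattern would apply to the "schema-type" and "schema-location" parameters in the original post: setting them has no visible effect until normalizeDocument() is called explicitly on the document that SAX2DOM produced.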