Posted to j-users@xerces.apache.org by Ittay Dror <it...@qlusters.com> on 2006/04/09 15:27:15 UTC

getElementById doesn't work with SAX2DOM (and tagsoup)

i've turned on the schema and validation features, and set a schema for my html, but getElementById still doesn't work. (i'm using the latest xerces, xalan and tagsoup; xhtml1-transitional.dtd and its entities are saved locally.)

this is my html:
<html>
<body>
        <div id="foo">hello</div>
</body>
</html>

this is my code:
import javax.xml.parsers.ParserConfigurationException;

import org.apache.xalan.xsltc.trax.SAX2DOM;
import org.apache.xerces.jaxp.DocumentBuilderFactoryImpl;
import org.ccil.cowan.tagsoup.Parser;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class Test {
    // Xerces factory subclass with validation switched on.
    public static class HTMLDocumentBuilderFactory extends DocumentBuilderFactoryImpl {
        public HTMLDocumentBuilderFactory() throws SAXException, ParserConfigurationException {
            setValidating(true);
            setFeature("http://apache.org/xml/features/validation/schema", true);
            setFeature("http://xml.org/sax/features/validation", true);
        }
    }

    public static void main(String[] args) throws Exception {
        Parser p = new Parser();
        // Make JAXP hand out the factory defined above.
        System.setProperty("javax.xml.parsers.DocumentBuilderFactory", HTMLDocumentBuilderFactory.class.getName());
        SAX2DOM sax2dom = new SAX2DOM();
        Document doc = (Document) sax2dom.getDOM();

        // Point the document's DOMConfiguration at the local XHTML DTD.
        DOMConfiguration config = doc.getDomConfig();
        config.setParameter("schema-type", "http://www.w3.org/TR/REC-xml");
        config.setParameter("schema-location", "/tmp/xhtml1-transitional.dtd");
        p.setContentHandler(sax2dom);

        InputSource docsrc = new InputSource("/tmp/test.html");
        p.parse(docsrc);

        System.out.println(doc.getElementById("foo"));
    }
}

thanx, 
ittay

-- 
===================================
Ittay Dror 
openQRM Team Leader, 
R&D, Qlusters Inc.
ittayd@qlusters.com
+972-3-6081994 Fax: +972-3-6081841

http://www.openQRM.org
- Keeps your Data-Center Up and Running

---------------------------------------------------------------------
To unsubscribe, e-mail: j-users-unsubscribe@xerces.apache.org
For additional commands, e-mail: j-users-help@xerces.apache.org


Re: getElementById doesn't work with SAX2DOM (and tagsoup)

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Ittay Dror <it...@qlusters.com> wrote on 04/16/2006 10:24:49 AM:

> 
> Michael Glavassevich wrote:
> > Hello Ittay,
> > 
> > A Document's DOMConfiguration [1] is used when 
> > Document.normalizeDocument() is invoked. I wouldn't assume that Xalan 
> > calls that method so you likely need to call it yourself. Probably worth 
> > noting that in-memory DTD validation using normalizeDocument() was 
> > completely broken prior to Xerces 2.8.0. I spent a couple weeks last year 
> > fixing many of the major bugs but didn't get around to all of them before 
> > the release (though I hope to stamp the rest out before Xerces 2.9). I've 
> > never checked whether getElementById() works after calling 
> > normalizeDocument() with DTD validation enabled but glancing over the 
> > current code I suspect it doesn't.
> 
> ouch.
> 
> is there a way to make it work?

Aside from waiting for the bugs to be fixed, if you know which attributes 
should be treated as IDs, you could traverse the DOM and mark them as IDs 
by calling Element.setIdAttributeNode(): 
http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#ID-ElSetIdAttrNode
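
A minimal sketch of that traversal, for illustration only. It assumes the ID attribute is always called "id" and has no namespace; the class name MarkIds and the helper markIds() are made up for this example, and the document is parsed from a string with the default JAXP parser rather than built through TagSoup and SAX2DOM. The same helper should apply to any DOM Level 3 Document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class MarkIds {
    /** Walks the subtree under node and flags every attribute named attrName as an ID. */
    public static void markIds(Node node, String attrName) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element element = (Element) node;
            Attr attr = element.getAttributeNode(attrName);
            if (attr != null) {
                // DOM Level 3: declare this attribute to be a user-determined ID.
                element.setIdAttributeNode(attr, true);
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            markIds(child, attrName);
        }
    }

    public static void main(String[] args) throws Exception {
        // Same markup as in the original post; no DTD, so no attribute is an ID yet.
        String html = "<html><body><div id=\"foo\">hello</div></body></html>";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(html)));

        System.out.println("before: " + doc.getElementById("foo"));
        markIds(doc.getDocumentElement(), "id");
        System.out.println("after:  " + doc.getElementById("foo"));
    }
}
```

In the program from the original post, the call would go after p.parse(docsrc) and before doc.getElementById("foo").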

> > 
> > [1] 
> > http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#Document3-domConfig

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org



Re: getElementById doesn't work with SAX2DOM (and tagsoup)

Posted by Ittay Dror <it...@qlusters.com>.

Michael Glavassevich wrote:
> Hello Ittay,
> 
> A Document's DOMConfiguration [1] is used when 
> Document.normalizeDocument() is invoked. I wouldn't assume that Xalan 
> calls that method so you likely need to call it yourself. Probably worth 
> noting that in-memory DTD validation using normalizeDocument() was 
> completely broken prior to Xerces 2.8.0. I spent a couple weeks last year 
> fixing many of the major bugs but didn't get around to all of them before 
> the release (though I hope to stamp the rest out before Xerces 2.9). I've 
> never checked whether getElementById() works after calling 
> normalizeDocument() with DTD validation enabled but glancing over the 
> current code I suspect it doesn't.

ouch.

is there a way to make it work?

> 
> [1] 
> http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#Document3-domConfig




Re: getElementById doesn't work with SAX2DOM (and tagsoup)

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Hello Ittay,

A Document's DOMConfiguration [1] is used when 
Document.normalizeDocument() is invoked. I wouldn't assume that Xalan 
calls that method so you likely need to call it yourself. Probably worth 
noting that in-memory DTD validation using normalizeDocument() was 
completely broken prior to Xerces 2.8.0. I spent a couple weeks last year 
fixing many of the major bugs but didn't get around to all of them before 
the release (though I hope to stamp the rest out before Xerces 2.9). I've 
never checked whether getElementById() works after calling 
normalizeDocument() with DTD validation enabled but glancing over the 
current code I suspect it doesn't.

[1] 
http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#Document3-domConfig
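
To illustrate the first point (the DOMConfiguration does nothing until normalizeDocument() is invoked), here is a small sketch. It deliberately uses the standard "comments" parameter instead of DTD validation, since validation through normalizeDocument() is exactly the part described above as unreliable. The class name NormalizeDemo and its helpers are invented for the example, and it uses the default JAXP parser rather than SAX2DOM.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NormalizeDemo {
    /** Counts the comment nodes that are direct children of the root element. */
    public static int countComments(Document doc) {
        int count = 0;
        for (Node child = doc.getDocumentElement().getFirstChild();
                child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.COMMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    /** Parses an XML string with the default JAXP DocumentBuilder. */
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<root><!-- note --><item/></root>");

        // Ask for comments to be discarded. This only records the setting.
        DOMConfiguration config = doc.getDomConfig();
        config.setParameter("comments", Boolean.FALSE);
        System.out.println("comments before normalizeDocument(): " + countComments(doc));

        // The configuration is applied here, not when setParameter() was called.
        doc.normalizeDocument();
        System.out.println("comments after normalizeDocument():  " + countComments(doc));
    }
}
```

The "schema-type" and "schema-location" parameters in the original post are consulted at the same point, so without an explicit doc.normalizeDocument() call after parsing they are never looked at.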

