Posted to j-users@xalan.apache.org by Ittay Dror <it...@qlusters.com> on 2006/04/08 15:54:52 UTC
trying to use id() function in xpath
hi,
i'm new to xalan and xsl. i'm trying to get elements from an html document using the xpath id() function. i'm not working with an xsl stylesheet, just trying to select elements.
this is my code (basically, slightly modified ApplyXPathJaxp sample):
    InputSource xml = new InputSource("/tmp/test.html");
    xml.setEncoding("US-ASCII");
    String expr = "id('foo')";
    // Create a new XPath
    XPathFactory factory = XPathFactory.newInstance();
    XPath xpath = factory.newXPath();
    Object result = null;
    try {
        // compile the XPath expression
        XPathExpression xpathExpr = xpath.compile(expr);
        // Evaluate the XPath expression against the input document
        Node node = (Node) xpathExpr.evaluate(xml, XPathConstants.NODE);
        System.out.println(node);
    }
    catch (Exception e) {
        e.printStackTrace();
    }
and my html is:
<html>
<body>
<label id="foo">hello</label>
</body>
</html>
i've tried the following:
1. adding a doctype, <!DOCTYPE html SYSTEM "/tmp/xhtml1-transitional.dtd"> (the dtd and ent files are saved locally). this is not a good option for me, since i want to parse arbitrary html documents (i'll do that with tagsoup and SAX2DOM, but first i want to get this to work)
2. creating a DOM, setting the schema and schema-type in its config, and using XPathAPI or doc.getElementById
3. setting the system id in the InputSource
from debugging, it seems that the parser doesn't recognize the 'id' attribute as being of type ID.
i would really like the id() function to work (the xpath expressions are written by users, and id() is more natural than defining keys, though i don't know how to do that either)
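For comparison with id(), a plain attribute predicate matches on the literal attribute value and does not depend on the parser knowing that 'id' is of type ID. The following is only a sketch using the standard JAXP XPath API; the class and method names (AttributePredicateLookup, textOfElementWithId) are made up for illustration, and the id value is assumed not to contain a single quote.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class AttributePredicateLookup {

    // Returns the text of the first element whose "id" attribute equals the
    // given value, or null if no such element exists.
    public static String textOfElementWithId(String xml, String id) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Plain attribute comparison: no DTD or ID typing is involved.
            // Assumption: the id value contains no single quote.
            String expr = "//*[@id='" + id + "']";
            Node node = (Node) xpath.evaluate(
                    expr, new InputSource(new StringReader(xml)), XPathConstants.NODE);
            return node == null ? null : node.getTextContent();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><body><label id=\"foo\">hello</label></body></html>";
        System.out.println(textOfElementWithId(html, "foo"));
    }
}
```

The trade-off is that //*[@id='foo'] scans the document instead of using an ID lookup, and it is not the same expression users would write with id().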
thanks for your help,
ittay
--
===================================
Ittay Dror
openQRM Team Leader,
R&D, Qlusters Inc.
ittayd@qlusters.com
+972-3-6081994 Fax: +972-3-6081841
http://www.openQRM.org
- Keeps your Data-Center Up and Running
Re: trying to use id() function in xpath
Posted by Joanne Tong <jo...@ca.ibm.com>.
Hi
I tried the following and it works:
    try {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.parse("test.xml");
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expression = "id('foo')";
        Node node = (Node) xpath.evaluate(expression, doc, XPathConstants.NODE);
        String tmp1 = node.getFirstChild().getNodeValue();
        System.out.println("tmp1: " + tmp1);
    } catch (Exception e) {
        e.printStackTrace();
    }
input is
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<body>
<label id="foo">hello</label>
</body>
</html>
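The DOCTYPE is what makes the difference here: XPath's id() only matches attributes that a DTD declares as type ID. A minimal sketch of the same idea, assuming an internal DTD subset instead of the external XHTML DTD so that nothing has to be fetched; the class name, the method name and the one-line ATTLIST are illustrative assumptions, not part of the example above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class InternalSubsetIdLookup {

    // Same document as above, but the ID declaration lives in the internal
    // subset (assumption: only <label> needs an ID-typed "id" attribute).
    public static final String WITH_DTD =
          "<!DOCTYPE html [\n"
        + "  <!ATTLIST label id ID #IMPLIED>\n"
        + "]>\n"
        + "<html><body><label id=\"foo\">hello</label></body></html>";

    // Parses the XML with the default (non-validating) builder and
    // evaluates id('<id>') against the resulting DOM.
    public static String textById(String xml, String id) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = db.parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node node = (Node) xpath.evaluate("id('" + id + "')", doc, XPathConstants.NODE);
            return node == null ? null : node.getTextContent();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(textById(WITH_DTD, "foo"));
    }
}
```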
thanks,
Joanne Tong
Software Developer, XSLT Development
Ittay Dror <it...@qlusters.com> wrote on 04/09/2006 05:39 AM
(To: Ittay Dror <it...@qlusters.com>, cc: xalan-j-users@xml.apache.org, Subject: Re: trying to use id() function in xpath):
also tried:
    public static class HTMLDocumentBuilderFactory extends DocumentBuilderFactoryImpl {
        public HTMLDocumentBuilderFactory() throws SAXException {
            SchemaFactory sfac = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            String schemaString = ""
                + "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
                + " <xs:attribute name=\"id\" type=\"xs:ID\"/>"
                + "</xs:schema>";
            Schema schema = sfac.newSchema(new StreamSource(new StringReader(schemaString)));
            setSchema(schema);
            setValidating(true);
        }
    }
and in main:
    System.setProperty("javax.xml.parsers.DocumentBuilderFactory",
        HTMLDocumentBuilderFactory.class.getName());
Re: trying to use id() function in xpath
Posted by Ittay Dror <it...@qlusters.com>.
also tried:
    public static class HTMLDocumentBuilderFactory extends DocumentBuilderFactoryImpl {
        public HTMLDocumentBuilderFactory() throws SAXException {
            SchemaFactory sfac = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            String schemaString = ""
                + "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
                + " <xs:attribute name=\"id\" type=\"xs:ID\"/>"
                + "</xs:schema>";
            Schema schema = sfac.newSchema(new StreamSource(new StringReader(schemaString)));
            setSchema(schema);
            setValidating(true);
        }
    }
and in main:
System.setProperty("javax.xml.parsers.DocumentBuilderFactory", HTMLDocumentBuilderFactory.class.getName());
Re: trying to use id() function in xpath
Posted by Ittay Dror <it...@qlusters.com>.
i've also tried this:
    Parser p = new Parser(); // from tagsoup
    SAX2DOM sax2dom = new SAX2DOM();
    Document doc = (Document) sax2dom.getDOM();
    DOMConfiguration config = doc.getDomConfig();
    config.setParameter("schema-type", "http://www.w3.org/TR/REC-xml");
    config.setParameter("schema-location", "/tmp/xhtml1-transitional.dtd");
    // config.setParameter("datatype-normalization", Boolean.FALSE);
    // config.setParameter("psvi", Boolean.TRUE);
    config.setParameter("validate", Boolean.TRUE);
    doc.insertBefore(doc.getImplementation().createDocumentType("html",
        null, "/tmp/xhtml1-transitional.dtd"), null);
    p.setContentHandler(sax2dom);

    InputSource docsrc = new InputSource("/tmp/test.html");
    docsrc.setSystemId("/tmp/xhtml1-transitional.dtd");
    p.parse(docsrc);

    System.out.println(doc.getElementById("foo"));
i get null in the console
thanx,
ittay
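When the DOM is built without any DTD, one possible workaround is to mark the id attributes as IDs by hand with the DOM Level 3 method Element.setIdAttribute, so that getElementById (and an XPath id() that relies on it) has something to find. This is a sketch under assumptions, not a tested fix for the tagsoup/SAX2DOM setup above: the document is parsed with the default JAXP DocumentBuilder, the DOM implementation is assumed to support DOM Level 3, and MarkIdAttributes, markIds and textById are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class MarkIdAttributes {

    // Walks the subtree and flags every attribute named "id" as an
    // ID-typed attribute (DOM Level 3 Element.setIdAttribute).
    public static void markIds(Node node) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element el = (Element) node;
            if (el.hasAttribute("id")) {
                el.setIdAttribute("id", true);
            }
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            markIds(c);
        }
    }

    // Parses XML that has no DTD, marks the ids, then evaluates id('<id>').
    public static String textById(String xml, String id) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = db.parse(new InputSource(new StringReader(xml)));
            markIds(doc);
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node node = (Node) xpath.evaluate("id('" + id + "')", doc, XPathConstants.NODE);
            return node == null ? null : node.getTextContent();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><body><label id=\"foo\">hello</label></body></html>";
        System.out.println(textById(html, "foo"));
    }
}
```

The tree walk costs one pass over the document, but it keeps user-written expressions such as id('foo') unchanged.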