Posted to j-users@xerces.apache.org by Jos van den Oever <jv...@gmail.com> on 2005/02/14 11:40:46 UTC

dom build speed

Hello all,

I have a question about DOM build speed. With Xerces 2.6.2, parsing a simple
825-byte XML file takes between 300 and 400 ms. I'm building the DOM from a
java.io.Reader. The parse time is independent of the type of reader
(StringReader, FileReader).

This is the code I use for creating the builder:

    DocumentBuilderFactory factory = DocumentBuilderFactory
      .newInstance();
    System.out.println(factory);
    factory.setNamespaceAware(true);
    factory.setValidating(false);
    factory.setExpandEntityReferences(false);
    docbuilder = factory.newDocumentBuilder();
    docbuilder.setErrorHandler(errorHandler);

Is the DOM build time always so slow? I've read some benchmarks on the web 
that were notably faster.

Cheers, Jos

---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org


Re: dom build speed

Posted by Phil Weighill-Smith <ph...@volantis.com>.
For testing purposes, what you might like to do is create an
EntityResolver that will resolve the remote DTD to a locally held file
or resource. The resolver should be set on the XMLReader. This maintains
DTD validation but allows you to do it "locally" rather than by fetching
the DTD from a remote location.

Phil :n.

PS: Clearly you would have to copy the DTD, and any other supporting
files, off the web to provide them locally.

On Mon, 2005-02-14 at 12:48, Jos van den Oever wrote:

> On Monday 14 February 2005 13:44, Rick Bullotta wrote:
> > First of all, I would place those setProperty calls in a static initializer
> > so you don't call them each time.  Also, realize that it will always take
> > some time the first time you create a factory and/or builder.
> >
> > If you run your test "n" times, you should find it much faster on average
> > with the system properties set.
> 
> Hello Rick,
> 
> Yes, I'm actually running the parse 5 times and each time is slightly faster.
> 
> However, removing the <!DOCTYPE really speeds things up. Now parsing (after 
> creating the Builder) takes anywhere between 6 and 2 ms. So the DOCTYPE is 
> really slowing things down. Unfortunately, I can't remove it from the xhtml 
> file, because it would be invalid without it.
> 
> Is there a function to turn off DOCTYPE interpretation?
> 
> Cheers, Jos

-- 
Phil Weighill-Smith <ph...@volantis.com>
Volantis Systems

Re: dom build speed

Posted by Martin Vysny <vy...@host.sk>.
Jos van den Oever wrote:

>On Monday 14 February 2005 13:44, Rick Bullotta wrote:
>  
>
>>First of all, I would place those setProperty calls in a static initializer
>>so you don't call them each time.  Also, realize that it will always take
>>some time the first time you create a factory and/or builder.
>>
>>If you run your test "n" times, you should find it much faster on average
>>with the system properties set.
>>    
>>
>
>Hello Rick,
>
>Yes, I'm actually running the parse 5 times and each time is slightly faster.
>
>However, removing the <!DOCTYPE really speeds things up. Now parsing (after 
>creating the Builder) takes anywhere between 6 and 2 ms. So the DOCTYPE is 
>really slowing things down. Unfortunately, I can't remove it from the xhtml 
>file, because it would be invalid without it.
>
>Is there a function to turn off DOCTYPE interpretation?
>
>Cheers, Jos
>  
>
Maybe Xerces is trying to read that DTD for validation. Try storing it 
somewhere on your disk and then redirecting the request for

http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd

by registering a custom EntityResolver.
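
A minimal sketch of such a resolver, assuming the DTD has been copied to a
local location. The class name LocalDtdResolver and the localDtdUrl parameter
are hypothetical; EntityResolver and InputSource are the standard SAX types.

```java
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Redirects requests for the XHTML 1.0 Strict DTD to a local copy (sketch). */
final class LocalDtdResolver implements EntityResolver {
    static final String XHTML_STRICT_DTD =
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd";

    // Hypothetical: URL of the local copy, for example a file: URL.
    private final String localDtdUrl;

    LocalDtdResolver(String localDtdUrl) {
        this.localDtdUrl = localDtdUrl;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (XHTML_STRICT_DTD.equals(systemId)) {
            // Point the parser at the local copy instead of the network.
            InputSource source = new InputSource(localDtdUrl);
            source.setPublicId(publicId);
            return source;
        }
        // Returning null asks the parser to fall back to its default behaviour.
        return null;
    }
}
```

With JAXP it would be registered as
docbuilder.setEntityResolver(new LocalDtdResolver(localUrl)), where localUrl
is a placeholder for wherever the copy lives. The entity files that the XHTML
DTD pulls in need local copies as well.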





Re: xpath

Posted by Robert van Loenhout <r....@greenvalley.nl>.
Robert van Loenhout wrote:
> Phil Weighill-Smith wrote:
> 
>> In this case, why not use:
>>
>> ...(root, "child::element[@name='\"value']/@name");
> 
> 
> Because the xpath is created dynamically.
> And even though a " quote can be detected first, it still does
> not help if both a " and a ' is contained in the value.

Okay, I found the answer to my question in the W3C XPath mailing list 
archive.
It is impossible to escape a (double)quote character in XPath 1.0.
The workaround is to use concat.

http://lists.w3.org/Archives/Public/www-xpath-comments/2004JanMar/0011.html
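
A minimal sketch of the concat workaround for dynamically built expressions.
The class and method names (XPathLiterals, toXPathLiteral) are made up for
illustration; only concat() itself comes from XPath 1.0. The value is wrapped
in whichever quote type it does not contain, and split around the double
quotes only when it contains both.

```java
/** Builds XPath 1.0 string literals for arbitrary values (illustrative helper). */
final class XPathLiterals {
    private XPathLiterals() {}

    static String toXPathLiteral(String value) {
        if (!value.contains("\"")) {
            return "\"" + value + "\"";   // no double quote: "..." is safe
        }
        if (!value.contains("'")) {
            return "'" + value + "'";     // no single quote: '...' is safe
        }
        // Both quote types occur: emit concat("part", '"', "part", ...),
        // splitting the value at every double quote.
        StringBuilder sb = new StringBuilder("concat(");
        String[] parts = value.split("\"", -1);
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(", '\"', ");
            }
            sb.append('"').append(parts[i]).append('"');
        }
        return sb.append(')').toString();
    }
}
```

The query from the original question could then be assembled as
"child::element[@name=" + XPathLiterals.toXPathLiteral(name) + "]/@name".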


> 
> 
>>> How do I escape special characters in xpath to match attribute values?
>>>
>>> For example
>>>
>>>         Document doc = new DocumentImpl();
>>>         Element root = doc.createElement("root");
>>>         doc.appendChild(root);
>>>         Element child = doc.createElement("element");
>>>         child.setAttribute("name","\"value");
>>>         root.appendChild(child);
>>>         Node node = 
>>> XPathAPI.selectSingleNode(root,"child::element[@name=\"&quot;value\"]/@name"); 
>>>
>>>
>>> The node will be null, so this is not the right way. What is?
>>>


Re: xpath

Posted by Robert van Loenhout <r....@greenvalley.nl>.
Phil Weighill-Smith wrote:
> In this case, why not use:
> 
> ...(root, "child::element[@name='\"value']/@name");

Because the xpath is created dynamically.
And even though a " quote can be detected first, it still does
not help if both a " and a ' is contained in the value.


>>How do I escape special characters in xpath to match attribute values?
>>
>>For example
>>
>>         Document doc = new DocumentImpl();
>>         Element root = doc.createElement("root");
>>         doc.appendChild(root);
>>         Element child = doc.createElement("element");
>>         child.setAttribute("name","\"value");
>>         root.appendChild(child);
>>         Node node = 
>>XPathAPI.selectSingleNode(root,"child::element[@name=\"&quot;value\"]/@name");
>>
>>The node will be null, so this is not the right way. What is?
>>



Re: xpath

Posted by Phil Weighill-Smith <ph...@volantis.com>.
In this case, why not use:

...(root, "child::element[@name='\"value']/@name");

Phil :n.
On Thu, 2005-02-17 at 09:44, Robert van Loenhout wrote:

> Hi,
> 
> I just have a little xpath escape question for which I can't seem to 
> find the answer.
> 
> How do I escape special characters in xpath to match attribute values?
> 
> For example
> 
>          Document doc = new DocumentImpl();
>          Element root = doc.createElement("root");
>          doc.appendChild(root);
>          Element child = doc.createElement("element");
>          child.setAttribute("name","\"value");
>          root.appendChild(child);
>          Node node = 
> XPathAPI.selectSingleNode(root,"child::element[@name=\"&quot;value\"]/@name");
> 
> The node will be null, so this is not the right way. What is?
> 
> TIA,
> Robert

-- 
Phil Weighill-Smith <ph...@volantis.com>
Volantis Systems

xpath

Posted by Robert van Loenhout <r....@greenvalley.nl>.
Hi,

I just have a little xpath escape question for which I can't seem to 
find the answer.

How do I escape special characters in xpath to match attribute values?

For example

         Document doc = new DocumentImpl();
         Element root = doc.createElement("root");
         doc.appendChild(root);
         Element child = doc.createElement("element");
         child.setAttribute("name","\"value");
         root.appendChild(child);
         Node node = 
XPathAPI.selectSingleNode(root,"child::element[@name=\"&quot;value\"]/@name");

The node will be null, so this is not the right way. What is?

TIA,
Robert




Re: dom build speed

Posted by Jos van den Oever <jv...@gmail.com>.
Thanks for this tip Michael,

I'm working with setting the feature to false now, since I don't need to parse 
any entities. I'm simply replacing custom tags (<g:ejb/>, <g:include/>) with 
xhtml. What's actually in the html is not important:

Element e = findCustomElement(doc);
// a while loop also copes with documents that contain no custom elements
while (e != null) {
 Node n = processCustomElement(e);
 e.getParentNode().replaceChild(n, e);
 e = findCustomElement(doc);
}


My code now looks like this and works nice and quick!

   if (parser == null) {
    parser = new DOMParser();
    parser.setFeature("http://xml.org/sax/features/validation",
                       false);
    parser.setFeature(
      "http://apache.org/xml/features/nonvalidating/load-external-dtd",
      false);
   }
   InputSource is = new InputSource(rwm);
   parser.setErrorHandler(errorHandler);
   parser.parse(is);
   doc = parser.getDocument();

Thank you all for your help!
Jos



Re: dom build speed

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Try setting the 
http://apache.org/xml/features/nonvalidating/load-external-dtd feature [1] 
to false. This will turn off loading of external DTDs, if that's really 
what you want to do. DTDs are used for more than just validation.  They 
provide entity declarations, default attributes, attribute types (which 
affect attribute normalization) and also specify what whitespace in the 
document referring to the DTD is considered "ignorable". Alternatively you 
could try grammar caching [2]. The external DTD would be read once and 
then reused for subsequent parses.

[1] 
http://xml.apache.org/xerces2-j/features.html#nonvalidating.load-external-dtd
[2] http://xml.apache.org/xerces2-j/faq-grammars.html
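
For code that stays on the JAXP interfaces, the same feature can be set on
the factory. This is a sketch that assumes a JAXP 1.3 or later
DocumentBuilderFactory (which has setFeature) backed by a Xerces-based
parser; the class name NoExternalDtd is made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Builds a non-validating DOM parser that never loads the external DTD. */
final class NoExternalDtd {
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    private NoExternalDtd() {}

    static DocumentBuilder newBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false);
        // Xerces-specific feature; other parsers may reject it.
        factory.setFeature(LOAD_EXTERNAL_DTD, false);
        return factory.newDocumentBuilder();
    }

    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        return newBuilder().parse(new InputSource(new StringReader(xml)));
    }
}
```

With the external DTD skipped, entity declarations and default attributes
from that DTD are not available, so this only suits documents that do not
rely on them.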

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

Jos van den Oever <jv...@gmail.com> wrote on 02/14/2005 07:48:55 AM:

> On Monday 14 February 2005 13:44, Rick Bullotta wrote:
> > First of all, I would place those setProperty calls in a static initializer
> > so you don't call them each time.  Also, realize that it will always take
> > some time the first time you create a factory and/or builder.
> >
> > If you run your test "n" times, you should find it much faster on average
> > with the system properties set.
> 
> Hello Rick,
> 
> Yes, I'm actually running the parse 5 times and each time is slightly faster.
> 
> However, removing the <!DOCTYPE really speeds things up. Now parsing (after 
> creating the Builder) takes anywhere between 6 and 2 ms. So the DOCTYPE is 
> really slowing things down. Unfortunately, I can't remove it from the xhtml 
> file, because it would be invalid without it.
> 
> Is there a function to turn off DOCTYPE interpretation?
> 
> Cheers, Jos


Re: dom build speed

Posted by Jos van den Oever <jv...@gmail.com>.
On Monday 14 February 2005 13:44, Rick Bullotta wrote:
> First of all, I would place those setProperty calls in a static initializer
> so you don't call them each time.  Also, realize that it will always take
> some time the first time you create a factory and/or builder.
>
> If you run your test "n" times, you should find it much faster on average
> with the system properties set.

Hello Rick,

Yes, I'm actually running the parse 5 times and each time is slightly faster.

However, removing the <!DOCTYPE really speeds things up. Now parsing (after 
creating the Builder) takes anywhere between 6 and 2 ms. So the DOCTYPE is 
really slowing things down. Unfortunately, I can't remove it from the xhtml 
file, because it would be invalid without it.

Is there a function to turn off DOCTYPE interpretation?

Cheers, Jos



RE: dom build speed

Posted by Rick Bullotta <ri...@lighthammer.com>.
First of all, I would place those setProperty calls in a static initializer
so you don't call them each time.  Also, realize that it will always take
some time the first time you create a factory and/or builder.

If you run your test "n" times, you should find it much faster on average
with the system properties set.
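
A rough sketch of timing the parse over "n" runs, with a few untimed warm-up
passes so that one-time factory and class-loading costs stay out of the
figure. The class name ParseTimer and the parameters warmupRuns and
measuredRuns are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Measures the average DOM parse time after a warm-up phase (sketch). */
final class ParseTimer {
    private ParseTimer() {}

    static double averageParseMillis(String xml, int warmupRuns, int measuredRuns)
            throws ParserConfigurationException, SAXException, IOException {
        if (measuredRuns <= 0) {
            throw new IllegalArgumentException("measuredRuns must be positive");
        }
        // Created once and reused, as in the code earlier in the thread.
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();

        // Warm-up passes are parsed but not timed.
        for (int i = 0; i < warmupRuns; i++) {
            builder.parse(new InputSource(new StringReader(xml)));
        }

        long start = System.nanoTime();
        for (int i = 0; i < measuredRuns; i++) {
            builder.parse(new InputSource(new StringReader(xml)));
        }
        long elapsedNanos = System.nanoTime() - start;

        // Mean time per measured parse, in milliseconds.
        return elapsedNanos / 1_000_000.0 / measuredRuns;
    }
}
```

A call such as ParseTimer.averageParseMillis(xml, 5, 20) gives a mean per
parse; the run counts are arbitrary.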


-----Original Message-----
From: Jos van den Oever [mailto:jvdoever@gmail.com] 
Sent: Monday, February 14, 2005 7:35 AM
To: xerces-j-user@xml.apache.org
Subject: Re: dom build speed

Hallo Rick,

Thanks for your reply. I added your suggested lines to the code, but I see no
speed increase at all. Just to make sure, here's the code I use for parsing
now:

   if (docbuilder == null) {
    System.setProperty("javax.xml.transform.TransformerFactory",
      "org.apache.xalan.processor.TransformerFactoryImpl");
    System.setProperty("org.apache.xml.dtm.DTMManager",
      "org.apache.xml.dtm.ref.DTMManagerDefault");
    System.setProperty("javax.xml.parsers.DocumentBuilderFactory",
      "org.apache.xerces.jaxp.DocumentBuilderFactoryImpl");
    System.setProperty("javax.xml.parsers.SAXParserFactory",
      "org.apache.xerces.jaxp.SAXParserFactoryImpl");
    System.setProperty(
      "org.apache.xerces.xni.parser.XMLParserConfiguration",
      "org.apache.xerces.parsers.XML11Configuration");
    DocumentBuilderFactory factory = DocumentBuilderFactory
      .newInstance();
    System.out.println(factory);
    factory.setNamespaceAware(true);
    factory.setValidating(false);
    factory.setExpandEntityReferences(false);
    docbuilder = factory.newDocumentBuilder();
    docbuilder.setErrorHandler(errorHandler);
    System.out.println(docbuilder);
   }
   InputSource is = new InputSource(rwm);
   // output that measures the DOM parse time
   long t = System.currentTimeMillis();
   doc = docbuilder.parse(is);
   t = System.currentTimeMillis()-t;
   System.out.println("parseTime: "+String.valueOf(t));

Could the DOCTYPE declaration have anything to do with it?

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

Cheers, Jos

PS, parsing the file with the qt library takes ~7 ms...

On Monday 14 February 2005 13:14, Rick Bullotta wrote:
> Add these system properties to your code and you may see a DRAMATIC
> increase in performance in DOM creation as well as Xpath and Transformer
> creation. Using this technique bypasses a lot of very expensive factory
> overhead which involves searching in the JAR file and the classpath for
> configuration information.
>
> // Xalan
>
> System.setProperty("javax.xml.transform.TransformerFactory",
> "org.apache.xalan.processor.TransformerFactoryImpl");
>
> System.setProperty("org.apache.xml.dtm.DTMManager",
> "org.apache.xml.dtm.ref.DTMManagerDefault");
>
> // Xerces
>
> System.setProperty("javax.xml.parsers.DocumentBuilderFactory",
> "org.apache.xerces.jaxp.DocumentBuilderFactoryImpl");
>
> System.setProperty("javax.xml.parsers.SAXParserFactory",
> "org.apache.xerces.jaxp.SAXParserFactoryImpl");
>
> System.setProperty("org.apache.xerces.xni.parser.XMLParserConfiguration",
> "org.apache.xerces.parsers.XML11Configuration");
>
>
> Rick Bullotta
> CTO
> Lighthammer Software (http://www.lighthammer.com)
>
> -----Original Message-----
> From: Jos van den Oever [mailto:jvdoever@gmail.com]
> Sent: Monday, February 14, 2005 6:56 AM
> To: xerces-j-user@xml.apache.org
> Subject: Re: dom build speed
>
> > I've run up against this problem as well.  If I'm not mistaken, the
> > problem is the way Xerces handles setting properties on a DOM parser
> > via the JAXP interfaces.  Xerces will create a new instance (!) of the
> > DOMParser class for every single property you set (i.e., via
> > setValidating(), setNamespaceAware(), etc.).  This wouldn't be so bad
> > if it wasn't so expensive to create a DOMParser instance, but it is.
> > You should really be timing the parse and not the setup and parse.
> >
> > I was sort of shocked when I first discovered how poorly Xerces
> > handles setting JAXP DocumentBuilder properties, but from
> > conversations on this mailing list it's obviously a known problem and
> > there doesn't seem to be much that can be done about it.  Shrug.
>
> Hello Curtis,
>
> What I didn't mention: I'm reusing the DocumentBuilder and this gives me only
> a small speedup. What I'm timing is the parsing itself, not the creation of
> the DocumentBuilder. Even if creating the DocumentBuilder was really slow, it
> wouldn't really be a problem because I'm using it in a Bean and it's only
> created once per Bean.
>
> Cheers, Jos


Re: dom build speed

Posted by Jos van den Oever <jv...@gmail.com>.
Hallo Rick,

Thanks for your reply. I added your suggested lines to the code, but I see no 
speed increase at all. Just to make sure, here's the code I use for parsing 
now:

   if (docbuilder == null) {
    System.setProperty("javax.xml.transform.TransformerFactory",
      "org.apache.xalan.processor.TransformerFactoryImpl");
    System.setProperty("org.apache.xml.dtm.DTMManager",
      "org.apache.xml.dtm.ref.DTMManagerDefault");
    System.setProperty("javax.xml.parsers.DocumentBuilderFactory",
      "org.apache.xerces.jaxp.DocumentBuilderFactoryImpl");
    System.setProperty("javax.xml.parsers.SAXParserFactory",
      "org.apache.xerces.jaxp.SAXParserFactoryImpl");
    System.setProperty(
      "org.apache.xerces.xni.parser.XMLParserConfiguration",
      "org.apache.xerces.parsers.XML11Configuration");
    DocumentBuilderFactory factory = DocumentBuilderFactory
      .newInstance();
    System.out.println(factory);
    factory.setNamespaceAware(true);
    factory.setValidating(false);
    factory.setExpandEntityReferences(false);
    docbuilder = factory.newDocumentBuilder();
    docbuilder.setErrorHandler(errorHandler);
    System.out.println(docbuilder);
   }
   InputSource is = new InputSource(rwm);
   // output that measures the DOM parse time
   long t = System.currentTimeMillis();
   doc = docbuilder.parse(is);
   t = System.currentTimeMillis()-t;
   System.out.println("parseTime: "+String.valueOf(t));

Could the DOCTYPE declaration have anything to do with it?

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

Cheers, Jos

PS, parsing the file with the qt library takes ~7 ms...

On Monday 14 February 2005 13:14, Rick Bullotta wrote:
> Add these system properties to your code and you may see a DRAMATIC
> increase in performance in DOM creation as well as Xpath and Transformer
> creation. Using this technique bypasses a lot of very expensive factory
> overhead which involves searching in the JAR file and the classpath for
> configuration information.
>
> // Xalan
>
> System.setProperty("javax.xml.transform.TransformerFactory",
> "org.apache.xalan.processor.TransformerFactoryImpl");
>
> System.setProperty("org.apache.xml.dtm.DTMManager",
> "org.apache.xml.dtm.ref.DTMManagerDefault");
>
> // Xerces
>
> System.setProperty("javax.xml.parsers.DocumentBuilderFactory",
> "org.apache.xerces.jaxp.DocumentBuilderFactoryImpl");
>
> System.setProperty("javax.xml.parsers.SAXParserFactory",
> "org.apache.xerces.jaxp.SAXParserFactoryImpl");
>
> System.setProperty("org.apache.xerces.xni.parser.XMLParserConfiguration",
> "org.apache.xerces.parsers.XML11Configuration");
>
>
> Rick Bullotta
> CTO
> Lighthammer Software (http://www.lighthammer.com)
>
> -----Original Message-----
> From: Jos van den Oever [mailto:jvdoever@gmail.com]
> Sent: Monday, February 14, 2005 6:56 AM
> To: xerces-j-user@xml.apache.org
> Subject: Re: dom build speed
>
> > I've run up against this problem as well.  If I'm not mistaken, the
> > problem is the way Xerces handles setting properties on a DOM parser
> > via the JAXP interfaces.  Xerces will create a new instance (!) of the
> > DOMParser class for every single property you set (i.e., via
> > setValidating(), setNamespaceAware(), etc.).  This wouldn't be so bad
> > if it wasn't so expensive to create a DOMParser instance, but it is.
> > You should really be timing the parse and not the setup and parse.
> >
> > I was sort of shocked when I first discovered how poorly Xerces
> > handles setting JAXP DocumentBuilder properties, but from
> > conversations on this mailing list it's obviously a known problem and
> > there doesn't seem to be much that can be done about it.  Shrug.
>
> Hello Curtis,
>
> What I didn't mention: I'm reusing the DocumentBuilder and this gives me only
> a small speedup. What I'm timing is the parsing itself, not the creation of
> the DocumentBuilder. Even if creating the DocumentBuilder was really slow, it
> wouldn't really be a problem because I'm using it in a Bean and it's only
> created once per Bean.
>
> Cheers, Jos


RE: dom build speed

Posted by Rick Bullotta <ri...@lighthammer.com>.
Add these system properties to your code and you may see a DRAMATIC increase
in performance in DOM creation as well as Xpath and Transformer creation.
Using this technique bypasses a lot of very expensive factory overhead which
involves searching in the JAR file and the classpath for configuration
information.

// Xalan

System.setProperty("javax.xml.transform.TransformerFactory",
"org.apache.xalan.processor.TransformerFactoryImpl");

System.setProperty("org.apache.xml.dtm.DTMManager",
"org.apache.xml.dtm.ref.DTMManagerDefault");

// Xerces

System.setProperty("javax.xml.parsers.DocumentBuilderFactory",
"org.apache.xerces.jaxp.DocumentBuilderFactoryImpl");

System.setProperty("javax.xml.parsers.SAXParserFactory",
"org.apache.xerces.jaxp.SAXParserFactoryImpl");

System.setProperty("org.apache.xerces.xni.parser.XMLParserConfiguration",
"org.apache.xerces.parsers.XML11Configuration");


Rick Bullotta
CTO
Lighthammer Software (http://www.lighthammer.com)

-----Original Message-----
From: Jos van den Oever [mailto:jvdoever@gmail.com] 
Sent: Monday, February 14, 2005 6:56 AM
To: xerces-j-user@xml.apache.org
Subject: Re: dom build speed

> I've run up against this problem as well.  If I'm not mistaken, the
> problem is the way Xerces handles setting properties on a DOM parser
> via the JAXP interfaces.  Xerces will create a new instance (!) of the
> DOMParser class for every single property you set (i.e., via
> setValidating(), setNamespaceAware(), etc.).  This wouldn't be so bad
> if it wasn't so expensive to create a DOMParser instance, but it is.
> You should really be timing the parse and not the setup and parse.
>
> I was sort of shocked when I first discovered how poorly Xerces
> handles setting JAXP DocumentBuilder properties, but from
> conversations on this mailing list it's obviously a known problem and
> there doesn't seem to be much that can be done about it.  Shrug.

Hello Curtis,

What I didn't mention: I'm reusing the DocumentBuilder and this gives me only
a small speedup. What I'm timing is the parsing itself, not the creation of
the DocumentBuilder. Even if creating the DocumentBuilder was really slow, it
wouldn't really be a problem because I'm using it in a Bean and it's only
created once per Bean.

Cheers, Jos



Re: dom build speed

Posted by Jos van den Oever <jv...@gmail.com>.
> I've run up against this problem as well.  If I'm not mistaken, the
> problem is the way Xerces handles setting properties on a DOM parser
> via the JAXP interfaces.  Xerces will create a new instance (!) of the
> DOMParser class for every single property you set (i.e., via
> setValidating(), setNamespaceAware(), etc.).  This wouldn't be so bad
> if it wasn't so expensive to create a DOMParser instance, but it is.
> You should really be timing the parse and not the setup and parse.
>
> I was sort of shocked when I first discovered how poorly Xerces
> handles setting JAXP DocumentBuilder properties, but from
> conversations on this mailing list it's obviously a known problem and
> there doesn't seem to be much that can be done about it.  Shrug.

Hello Curtis,

What I didn't mention: I'm reusing the DocumentBuilder and this gives me only 
a small speedup. What I'm timing is the parsing itself, not the creation of 
the DocumentBuilder. Even if creating the DocumentBuilder was really slow, it 
wouldn't really be a problem because I'm using it in a Bean and it's only 
created once per Bean.

Cheers, Jos



Re: XML Declaration

Posted by Alistair Young <al...@smo.uhi.ac.uk>.
thanks all for your replies. I was getting the error:

"The processing instruction target matching "[xX][mM][lL]" is not 
allowed."

I had a comment as the first line in the file. Removing it fixed the 
problem.

many thanks,
Alistair

On 14 Feb 2005, at 22:58, Joseph Kesselman wrote:

>
>
>
>
> The XML Declaration must start at the very first character of the XML
> document being parsed -- if you have a blank line or space in front of it,
> fix that. (The one exception is that a two-byte Byte Order Mark may precede
> the XML Declaration.)
>
> ______________________________________
> Joe Kesselman, IBM Next-Generation Web Technologies: XML, XSL and more.
> "The world changed profoundly and unpredictably the day Tim Berners Lee
> got bitten by a radioactive spider." -- Rafe Culpin, in r.m.filk


Re: XML Declaration

Posted by Joseph Kesselman <ke...@us.ibm.com>.



The XML Declaration must start at the very first character of the XML
document being parsed -- if you have a blank line or space in front of it,
fix that. (The one exception is that a two-byte Byte Order Mark may precede
the XML Declaration.)
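
A small sketch for checking this rule with a JAXP parser. The class name
XmlDeclPosition and the method isWellFormed are made up for illustration.
The parser's fatal error surfaces as a SAXException, which the helper turns
into false.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Reports whether a document parses; used to probe XML declaration placement. */
final class XmlDeclPosition {
    private XmlDeclPosition() {}

    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Includes SAXParseException for a misplaced declaration.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A document that begins with the declaration parses, while the same document
with a comment, space or blank line in front of the declaration does not.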

______________________________________
Joe Kesselman, IBM Next-Generation Web Technologies: XML, XSL and more.
"The world changed profoundly and unpredictably the day Tim Berners Lee
got bitten by a radioactive spider." -- Rafe Culpin, in r.m.filk




Re: XML Declaration

Posted by Bob Foster <bo...@objfac.com>.
The xml declaration is not a processing instruction. The Oracle parser 
is wrong.
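
A short sketch of how this looks through the DOM API with a Xerces-based
JAXP parser, using the snippet from Jason's original question. The class
name XmlDeclInDom is made up for illustration; getXmlVersion() is the DOM
Level 3 accessor for the version given in the declaration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Parses a string into a DOM Document so its top-level children can be inspected. */
final class XmlDeclInDom {
    private XmlDeclInDom() {}

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Sketch only: wrap checked exceptions to keep callers simple.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
            "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n<foo TYPE=\"bar\">\n</foo>");
        // The declaration is exposed as document properties, not as a child node.
        System.out.println(doc.getChildNodes().getLength());
        System.out.println(doc.getFirstChild().getNodeName());
        System.out.println(doc.getXmlVersion());
    }
}
```

Code that has to run on both parsers could skip any leading child that is
not an element, or simply use doc.getDocumentElement().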

Bob Foster

Jason wrote:
> Hi Alistair,
> 
> I'm not sure why you can't parse the xml.  I've looked
> at this a bit more and it seems that the Oracle parser
> is reporting the xml declaration as a Processing
> Instruction.  It never occurred to me but I guess that
> the xml declaration does follow the PI format.  So, I
> guess the question is, is it a PI or isn't it?
> 
> -jason
> 
> --- Alistair Young <al...@smo.uhi.ac.uk> wrote:
> 
> 
>>><?xml version="1.0" encoding="UTF-8" ?>
>>
>>using xerces through JAXP, if this line is present,
>>I get a SAXException
>>every time when parsing the doc, along the lines of
>>xml element not
>>allowed here. I have to remove the line before the
>>doc will parse.
>>Am I missing something?
>>
>>thanks,
>>Alistair
>>
>>
>>-- 
>>Alistair Young
>>Senior Software Engineer
>>UHI@Sabhal Mòr Ostaig
>>Isle of Skye
>>Scotland
>>
>>
>>>Hello,
>>>
>>>Consider the following xml snippet:
>>>
>>><?xml version="1.0" encoding="UTF-8" ?>
>>><foo TYPE="bar">
>>></foo>
>>>
>>>I have some code which uses a jaxp DocumentBuilder to
>>>parse this xml and then walk the tree and do various
>>>things.  This code works fine with xerces (1.4.x and
>>>2.2.x) but when it's deployed in the Oracle 10g
>>>application server which uses its own jaxp
>>>implementation there are problems.  It turns out that
>>>the xerces Document resulting from the parse has only
>>>one child which is the 'foo' element while the Oracle
>>>Document has two - the first is the xml declaration
>>>and the second is the 'foo' element.
>>>
>>>Obviously I can code around this easily enough but I'm
>>>wondering which behavior is correct with respect to
>>>the standard or if the standard even applies.
>>>Possibly the implementations are free to make whichever
>>>choice they want?  Thanks in advance.
>>>
>>>-jason
>>>
>>>
>>>
>>>
>>>__________________________________
>>>Do you Yahoo!?
>>>Read only the mail you want - Yahoo! Mail
>>
>>SpamGuard.
>>
>>>http://promotions.yahoo.com/new_mail
>>>
>>>
>>
> ---------------------------------------------------------------------
> 
>>>To unsubscribe, e-mail:
>>
>>xerces-j-user-unsubscribe@xml.apache.org
>>
>>>For additional commands, e-mail:
>>
>>xerces-j-user-help@xml.apache.org
>>
>>>
>>
>>
> ---------------------------------------------------------------------
> 
>>To unsubscribe, e-mail:
>>xerces-j-user-unsubscribe@xml.apache.org
>>For additional commands, e-mail:
>>xerces-j-user-help@xml.apache.org
>>
>>
> 
> 
> 
> 
> 		
> __________________________________ 
> Do you Yahoo!? 
> Yahoo! Mail - Find what you need with new enhanced search.
> http://info.mail.yahoo.com/mail_250
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
> For additional commands, e-mail: xerces-j-user-help@xml.apache.org
> 
> 
> 



---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org


Re: XML Declaration

Posted by Jason <ja...@yahoo.com>.
Hi Alistair,

I'm not sure why you can't parse the xml.  I've looked
at this a bit more and it seems that the Oracle parser
is reporting the xml declaration as a Processing
Instruction.  It never occurred to me but I guess that
the xml declaration does follow the PI format.  So, I
guess the question is, is it a PI or isn't it?

-jason

--- Alistair Young <al...@smo.uhi.ac.uk> wrote:

> > <?xml version="1.0" encoding="UTF-8" ?>
> using xerces through JAXP, if this line is present,
> I get a SAXException
> every time when parsing the doc, along the lines of
> xml element not
> allowed here. I have to remove the line before the
> doc will parse.
> Am I missing something?
> 
> thanks,
> Alistair
> 
> 
> -- 
> Alistair Young
> Senior Software Engineer
> UHI@Sabhal Mòr Ostaig
> Isle of Skye
> Scotland
> 


Re: XML Declaration

Posted by Alistair Young <al...@smo.uhi.ac.uk>.
> <?xml version="1.0" encoding="UTF-8" ?>
using xerces through JAXP, if this line is present, I get a SAXException
every time when parsing the doc, along the lines of xml element not
allowed here. I have to remove the line before the doc will parse.
Am I missing something?
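
One possible cause, which nobody in this thread confirms: the XML declaration is only legal at the very start of the document, so a blank line or any other character in front of `<?xml` makes a conforming parser reject the input. The sketch below assumes that is what happened here. `DeclarationPosition` and `parses()` are made-up names for illustration, not part of Xerces or JAXP.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration only.
class DeclarationPosition {

    // Returns true if the text parses as well-formed XML,
    // false if the parser rejects it with a SAXException.
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problems are not what we are testing.
            throw new IllegalStateException("parser setup or I/O failed", e);
        }
    }

    public static void main(String[] args) {
        String decl = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>";
        String body = "\n<foo TYPE=\"bar\">\n</foo>";
        // Declaration at offset 0: accepted.
        System.out.println(parses(decl + body));
        // A newline before the declaration: rejected.
        System.out.println(parses("\n" + decl + body));
    }
}
```

If the second call fails in the same way as the real document, check for leading whitespace or stray characters before the declaration in the input.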

thanks,
Alistair


-- 
Alistair Young
Senior Software Engineer
UHI@Sabhal Mòr Ostaig
Isle of Skye
Scotland

> Hello,
>
> Consider the following xml snippet:
>
> <?xml version="1.0" encoding="UTF-8" ?>
> <foo TYPE="bar">
> </foo>
>
> I have some code which uses a jaxp DocumentBuilder to
> parse this xml and then walk the tree and do various
> things.  This code works fine with xerces (1.4.x and
> 2.2.x) but when it's deployed in the Oracle 10g
> application server which uses its own jaxp
> implementation there are problems.  It turns out that
> the xerces Document resulting from the parse has only
> one child which is the 'foo' element while the Oracle
> Document has two - the first is the xml declaration
> and the second is the 'foo' element.
>
> Obviously I can code around this easily enough but I'm
> wondering which behavior is correct with respect to
> the standard or if the standard even applies.
> Possibly the implementations are free to make whichever
> choice they want?  Thanks in advance.
>
> -jason


Re: XML Declaration

Posted by Jason <ja...@yahoo.com>.
Thanks Joe, for the clarification.  I've found that I
can side step the issue altogether if I call
getDocumentElement( ) rather than getChild( ).
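
A minimal sketch of that workaround. The DOM has no `getChild()` method, so the sketch assumes `getFirstChild()` (or the first entry of `getChildNodes()`) was meant. The class name `RootElementLookup` and the extra comment in the sample document are illustrative additions. The comment shows that the first child of a Document is not guaranteed to be the root element even with a conforming parser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this illustration.
class RootElementLookup {

    // Parses an XML string with whatever JAXP implementation is configured.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n"
                   + "<!-- header comment -->\n"
                   + "<foo TYPE=\"bar\">\n</foo>";
        Document doc = parse(xml);

        // Fragile: the first child may be a comment, a doctype, or
        // (with a non-conforming parser) something else entirely.
        Node first = doc.getFirstChild();
        System.out.println("first child node type: " + first.getNodeType());

        // Robust: always the root element, whatever precedes it.
        System.out.println("root element: " + doc.getDocumentElement().getTagName());
    }
}
```

`getDocumentElement()` sidesteps the question of what else the parser chose to hang off the Document node, which is why it works on both implementations.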

-jason
--- Joseph Kesselman <ke...@us.ibm.com> wrote:

> 
> 
> 
> 
> Oracle is wrong. The XML Declaration is not a child in the DOM. Complain to
> them and see if they offer a mode which handles it properly.
> 
> (The DOM had no standard API for the XML Declaration until DOM Level 3,
> which is part of why some parsers tried to cheat by turning it into a
> special node or -- erroneously -- calling it a Processing Instruction. But
> the DOM WG has said repeatedly that this kluge was, in fact, incorrect.)
> 


Re: XML Declaration

Posted by Joseph Kesselman <ke...@us.ibm.com>.



Oracle is wrong. The XML Declaration is not a child in the DOM. Complain to
them and see if they offer a mode which handles it properly.

(The DOM had no standard API for the XML Declaration until DOM Level 3,
which is part of why some parsers tried to cheat by turning it into a
special node or -- erroneously -- calling it a Processing Instruction. But
the DOM WG has said repeatedly that this kluge was, in fact, incorrect.)
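
For reference, a short sketch of the DOM Level 3 accessors mentioned above, assuming a JAXP implementation that supports DOM Level 3 (the parser bundled with current JDKs does). `XmlDeclarationInfo` is a made-up class name. The declaration's contents are read from the Document itself and do not appear among its children.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical class name, used only for this illustration.
class XmlDeclarationInfo {

    // Parses from a byte stream so the parser sees the declared encoding.
    static Document parse(String xml) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n<foo TYPE=\"bar\">\n</foo>");

        // DOM Level 3 exposes the declaration as properties of the Document.
        System.out.println("version:    " + doc.getXmlVersion());
        System.out.println("encoding:   " + doc.getXmlEncoding());
        System.out.println("standalone: " + doc.getXmlStandalone());

        // The declaration is not a child node; only the 'foo' element is.
        System.out.println("children:   " + doc.getChildNodes().getLength());
    }
}
```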

______________________________________
Joe Kesselman, IBM Next-Generation Web Technologies: XML, XSL and more.
"The world changed profoundly and unpredictably the day Tim Berners Lee
got bitten by a radioactive spider." -- Rafe Culpin, in r.m.filk




XML Declaration

Posted by Jason <ja...@yahoo.com>.
Hello,

Consider the following xml snippet:

<?xml version="1.0" encoding="UTF-8" ?> 
<foo TYPE="bar">
</foo>

I have some code which uses a jaxp DocumentBuilder to
parse this xml and then walk the tree and do various
things.  This code works fine with xerces (1.4.x and
2.2.x) but when it's deployed in the Oracle 10g
application server which uses its own jaxp
implementation there are problems.  It turns out that
the xerces Document resulting from the parse has only
one child which is the 'foo' element while the Oracle
Document has two - the first is the xml declaration
and the second is the 'foo' element.

Obviously I can code around this easily enough but I'm
wondering which behavior is correct with respect to
the standard or if the standard even applies.
Possibly the implementations are free to make whichever
choice they want?  Thanks in advance.

-jason 



		


RE: dom build speed

Posted by Rick Bullotta <ri...@lighthammer.com>.
I wouldn't say Xerces handles these "poorly".  Rather, it handles them quite
"flexibly".  As a result, there is a performance penalty.  As long as we can
be made aware of the performance implications and "high performance" usage
scenarios, I think it is the best of both worlds!

Rick Bullotta
CTO
Lighthammer Software (http://www.lighthammer.com)

-----Original Message-----

> I was sort of shocked when I first discovered how poorly Xerces
> handles setting JAXP DocumentBuilder properties, but from
> conversations on this mailing list it's obviously a known problem and
> there doesn't seem to be much that can be done about it.  Shrug.

If anyone's curious what the problem is, here's the explanation [1] I gave 
the last time this came up.

> 
> Curtiss Howard
>




Re: dom build speed

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Curtiss Howard <cu...@gmail.com> wrote on 02/14/2005 06:30:05 AM:

> On Mon, 14 Feb 2005 11:40:46 +0100, Jos van den Oever
> <jv...@gmail.com> wrote:
> > Hello all,
> > 
> > I've a question concerning DOM build speed. In version 2.6.2 of
> > Xerces, parsing a simple 825 byte xml file takes between 300 and 400 ms.
> > I'm building the DOM from a java.io.Reader. The parse time is independent of
> > the type of reader (StringReader, FileReader).
> > 
> > This is the code I use for creating the builder:
> > 
> >    DocumentBuilderFactory factory = DocumentBuilderFactory
> >      .newInstance();
> >    System.out.println(factory);
> >    factory.setNamespaceAware(true);
> >    factory.setValidating(false);
> >    factory.setExpandEntityReferences(false);
> >    docbuilder = factory.newDocumentBuilder();
> >    docbuilder.setErrorHandler(errorHandler);
> > 
> > Is the DOM build time always so slow? I've read some benchmarks on the web
> > that were notably faster.
> > 
> > Cheers, Jos
> > 
> > 
> > 
> 
> I've run up against this problem as well.  If I'm not mistaken, the
> problem is the way Xerces handles setting properties on a DOM parser
> via the JAXP interfaces.  Xerces will create a new instance (!) of the
> DOMParser class for every single property you set (i.e., via
> setValidating(), setNamespaceAware(), etc.).  This wouldn't be so bad
> if it wasn't so expensive to create a DOMParser instance, but it is. 
> You should really be timing the parse and not the setup and parse.

Actually, this is only a problem with setAttribute(). The other methods 
just set a boolean which isn't read until the application explicitly 
creates a parser from the factory.
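
A rough sketch of what that implies in practice: configure the factory once with the cheap boolean setters, keep it around, and only create builders from it. `SharedDomFactory` and its method names are made up for illustration. The synchronization is a cautious assumption, since JAXP does not promise that a factory is thread-safe.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical holder class, used only for this illustration.
final class SharedDomFactory {

    // Created and configured once, when the class is first loaded.
    private static final DocumentBuilderFactory FACTORY =
            DocumentBuilderFactory.newInstance();

    static {
        // These only record flags; no parser is created here.
        FACTORY.setNamespaceAware(true);
        FACTORY.setValidating(false);
        FACTORY.setExpandEntityReferences(false);
    }

    private SharedDomFactory() {
    }

    // Synchronized because factory thread-safety is not guaranteed by JAXP.
    static synchronized DocumentBuilder newBuilder() {
        try {
            return FACTORY.newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("cannot create DocumentBuilder", e);
        }
    }

    // Convenience wrapper: parse a string with a fresh builder.
    static Document parse(String xml) {
        try {
            return newBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<foo TYPE=\"bar\"></foo>");
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```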
 
> I was sort of shocked when I first discovered how poorly Xerces
> handles setting JAXP DocumentBuilder properties, but from
> conversations on this mailing list it's obviously a known problem and
> there doesn't seem to be much that can be done about it.  Shrug.

If anyone's curious what the problem is, here's the explanation [1] I gave 
the last time this came up.

> 
> Curtiss Howard
> 
> 

[1] http://marc.theaimsgroup.com/?l=xerces-j-user&m=110347351515748&w=2

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org




Re: dom build speed

Posted by Curtiss Howard <cu...@gmail.com>.
On Mon, 14 Feb 2005 11:40:46 +0100, Jos van den Oever
<jv...@gmail.com> wrote:
> Hello all,
> 
> I've a question concerning DOM build speed. In version 2.6.2 of
> Xerces, parsing a simple 825 byte xml file takes between 300 and 400 ms.
> I'm building the DOM from a java.io.Reader. The parse time is independent of
> the type of reader (StringReader, FileReader).
> 
> This is the code I use for creating the builder:
> 
>    DocumentBuilderFactory factory = DocumentBuilderFactory
>      .newInstance();
>    System.out.println(factory);
>    factory.setNamespaceAware(true);
>    factory.setValidating(false);
>    factory.setExpandEntityReferences(false);
>    docbuilder = factory.newDocumentBuilder();
>    docbuilder.setErrorHandler(errorHandler);
> 
> Is the DOM build time always so slow? I've read some benchmarks on the web
> that were notably faster.
> 
> Cheers, Jos
> 
> 
> 

I've run up against this problem as well.  If I'm not mistaken, the
problem is the way Xerces handles setting properties on a DOM parser
via the JAXP interfaces.  Xerces will create a new instance (!) of the
DOMParser class for every single property you set (i.e., via
setValidating(), setNamespaceAware(), etc.).  This wouldn't be so bad
if it wasn't so expensive to create a DOMParser instance, but it is. 
You should really be timing the parse and not the setup and parse.
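
To make that last point concrete, here is a small sketch that times setup, the first parse, and later parses separately. The `ParseTiming` class and the `measure()` helper are made up for illustration. It is a crude wall-clock measurement, not a proper benchmark, and the numbers will vary by JVM and machine.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this illustration.
class ParseTiming {

    // Returns { setupNanos, firstParseNanos, averageLaterParseNanos }.
    static long[] measure(String xml, int laterRuns) {
        if (laterRuns < 1) {
            throw new IllegalArgumentException("laterRuns must be at least 1");
        }
        try {
            // Setup: factory lookup, configuration, builder creation.
            long t0 = System.nanoTime();
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(false);
            factory.setExpandEntityReferences(false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            long t1 = System.nanoTime();

            // First parse: still pays for class loading and other one-time work.
            builder.parse(new InputSource(new StringReader(xml)));
            long t2 = System.nanoTime();

            // Later parses: reuse the same builder sequentially.
            for (int i = 0; i < laterRuns; i++) {
                builder.parse(new InputSource(new StringReader(xml)));
            }
            long t3 = System.nanoTime();

            return new long[] { t1 - t0, t2 - t1, (t3 - t2) / laterRuns };
        } catch (Exception e) {
            throw new IllegalStateException("timing run failed", e);
        }
    }

    public static void main(String[] args) {
        long[] t = measure("<foo TYPE=\"bar\"></foo>", 100);
        System.out.println("setup (ms):       " + t[0] / 1_000_000.0);
        System.out.println("first parse (ms): " + t[1] / 1_000_000.0);
        System.out.println("later parse (ms): " + t[2] / 1_000_000.0);
    }
}
```

Comparing the three figures shows how much of a 300 to 400 ms measurement is one-time setup rather than parsing.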

I was sort of shocked when I first discovered how poorly Xerces
handles setting JAXP DocumentBuilder properties, but from
conversations on this mailing list it's obviously a known problem and
there doesn't seem to be much that can be done about it.  Shrug.


Curtiss Howard
