You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@freemarker.apache.org by Christoph Rüger <c....@synesty.com> on 2017/10/12 17:35:42 UTC

Slow access to XML node attribute because of XPath in NodeModel.get()?

Hi,

we have the following XML snippet:

<PRODUCT_DETAILS>
  <INTERNATIONAL_PID *type="GTIN"*>4011395534809</INTERNATIONAL_PID>
</PRODUCT_DETAILS>

We iterate over XML-Nodes like this:

<#list product["PRODUCT_DETAILS"]?children as detail>
  nodeValue: ${detail!}
</#list>

This code snippet is pretty fast (result appears after e.g. 3 seconds). But
when we turn it into this and try to add the node attribute "type"...

<#list product["PRODUCT_DETAILS"]?children as detail>
 <#if detail.*@type*[0]?? && detail.*@type*[0] == "GTIN">
  node*Attribute* type: ${detail.@type}
  nodeValue: ${detail!}
 </#if>
</#list>

...it gets very very slow (result appears after > 1 minute). We have a
large XML with > 1000 products here.

I have digged a little bit in the code and found that when the check for
*detail.@type[0]??* is in the mix, then
*freemarker.ext.dom.NodeModel.get(String) *is jumping into the else-branch
which uses XPath:

 public TemplateModel get(String key) throws TemplateModelException {

   if (key.startsWith("@@")) {
    // ....

}

else{

XPathSupport xps = getXPathSupport();
          if (xps != null) {

*return xps.executeQuery(node, key);*            }

}

This *xps.executeQuery(node, key); *seems to be the reason why it gets
slow. Internally it
calls freemarker.ext.dom.SunInternalXalanXPathSupport.executeQuery(Object,
String)

To work around this problem I have tried creating an own
TemplateMethodModel ${nodeAttrib(node, attributename)}

where I do something like this (I know it's ugly but experimental...) :

new TemplateMethodModelEx(){

public Object exec(List arg0) throws TemplateModelException {

Object node = arg0.get(0);

Object attribute = arg0.get(1);

if(node instanceof NodeModel) {

NamedNodeMap attributes = ((NodeModel) node).getNode().getAttributes();

if(attributes != null && attributes.getLength() > 0) {

Node namedItem = *attributes.getNamedItem(attribute.toString())*;

if(namedItem != null) {

System.out.println(namedItem.getNodeValue());

return namedItem.getNodeValue();

}

else {

return "";

}

}

}

return "";

}


When we then use this it is much faster:
<#list product["PRODUCT_DETAILS"]?children as detail>
  <#if *nodeAttribute(details, "type")!* == "GTIN" >
   node*Attribute* type: ${*nodeAttribute(details, "type")!*}
   nodeValue: ${detail!}
  </#if>
</#list>

It seems that NamedNodeMap.*getNamedItem() *much faster and does not use
XPath. Although it is also iterating internally over the whole array to
find the index of the attribute - the overhead seems to be less then XPath.

My questions are:
1. What is the reason that *detail.@type[0]?? *causes a heavy XPath
evaluation under the hood?
2. Are we doing something wrong?
3. Should we go the custom TemplateMethodModel way? Is this approach ok?

I hope this was understandable.

Thanks
Christoph

-- 
Synesty GmbH
Moritz-von-Rohr-Str. 1a
07745 Jena
Tel.: +49 3641 559649
Fax.: +49 3641 5596499
Internet: http://synesty.com

Geschäftsführer: Christoph Rüger
Unternehmenssitz: Jena
Handelsregister B beim Amtsgericht: Jena
Handelsregister-Nummer: HRB 508766
Ust-IdNr.: DE287564982

Re: Slow access to XML node attribute because of XPath in NodeModel.get()?

Posted by Daniel Dekany <dd...@apache.org>.

I would like to add that this DOM wrapper model is one of the less
known parts of the code to me. Barring some small changes, it's the
same as it was when long long ago it was written by the original
authors. That raises the probability of finding improvement
opportunities there quite much. Like, I see that at least when using
Xalan, it calls `new org.apache.xpath.XPath.XPath(String,
SourceLocator, PrefixResolver, int, ErrorListener)` each time you
execute a query. That can be a big performance killer. But I don't
know if it can be avoided. These should be researched. Contributions
on that filed are highly welcome.

BTW, you are using Xalan for XPath, the one embedded into the JDK
(hence it's in Sun package). There's an option for using Jaxen
instead. I suspect Jaxen will be slower, not faster, but who knows...
might worths a try. For that, add
https://mvnrepository.com/artifact/jaxen/jaxen/1.1.6 to your
dependencies, and call NodeModel.useJaxenXPathSupport() sometimes
during application startup.


Friday, October 13, 2017, 8:01:17 AM, Christoph Rüger wrote:

> 2017-10-13 0:33 GMT+02:00 Daniel Dekany <dd...@apache.org>:
>
>> Actually, in `someNode.something` (which is the same as
>> `someNode["something"]`) the "something" meant to be XPath (appart
>> from a few FreeMarker-specific things).
>
> But as you have seen in some
>> very simple cases we solve the XPath query "natively" instead of
>> delegating it to XPath.
>
>
> Thanks Daniel, good to know. So there probably was a reason for this too in
> the past.
>
> Now if profiling reliably(!) shows that it's
>> indeed XPath in this case that's so slow, then it seems it would be
>
> better if we take over even more cases from XPath, such as
>> "@something"... I will have to look into that though. (This can be
>> tricky though, as you replace the implementation of that kind of query
>> under thousands of already working applications, so it really has to
>> behave exactly like the real thing earlier.
>
>
> I agree, this is very risky and my poor-mans "benchmark" is not reliable at
> all. I can try if I can find some time to come up with a programmatic
> test-case maybe which makes it better visible.
> Your insights tell me so far that we better go with the custom
> TemplateMethodModel for now (shortterm) to solve our specific business
> problem. Less risk.
>
> I will try fiddling around with the sources of
> freemarker.ext.dom.NodeModel.get(String)
> etc. which already do some "key.startsWith("@@")" checks. Maybe I find the
> place where to add an improvement and try it out.
>
>
>> That's unlike with
>> product.PRODUCT_DETAILS which was never delegated to the real XPath
>> implementation.)
>>
>
>> Thursday, October 12, 2017, 7:35:42 PM, Christoph Rüger wrote:
>>
>> > Hi,
>> >
>> > we have the following XML snippet:
>> >
>> > <PRODUCT_DETAILS>
>> >   <INTERNATIONAL_PID *type="GTIN"*>4011395534809</INTERNATIONAL_PID>
>> > </PRODUCT_DETAILS>
>> >
>> > We iterate over XML-Nodes like this:
>> >
>> > <#list product["PRODUCT_DETAILS"]?children as detail>
>>
>> Note that you can write `product["PRODUCT_DETAILS"]` as
>> `product.PRODUCT_DETAILS`.
>>
>
> Ok. Usually we do. Not sure why we didn't here. Was a code snippet from a
> colleague :)
>
>
>>
>> >   nodeValue: ${detail!}
>> > </#list>
>> >
>> > This code snippet is pretty fast (result appears after e.g. 3 seconds).
>> But
>> > when we turn it into this and try to add the node attribute "type"...
>> >
>> > <#list product["PRODUCT_DETAILS"]?children as detail>
>> >  <#if detail.*@type*[0]?? && detail.*@type*[0] == "GTIN">
>> >   node*Attribute* type: ${detail.@type}
>> >   nodeValue: ${detail!}
>> >  </#if>
>> > </#list>
>> >
>> > ...it gets very very slow (result appears after > 1 minute). We have a
>> large XML with >> 1000 products here.
>> >
>> > I have digged a little bit in the code and found that when the check for
>> > *detail.@type[0]??* is in the mix, then
>> > *freemarker.ext.dom.NodeModel.get(String) *is jumping into the
>> else-branch
>> > which uses XPath:
>> >
>> >  public TemplateModel get(String key) throws TemplateModelException {
>> >
>> >    if (key.startsWith("@@")) {
>> >     // ....
>> >
>> > }
>> >
>> > else{
>> >
>> > XPathSupport xps = getXPathSupport();
>> >           if (xps != null) {
>> >
>> > *return xps.executeQuery(node, key);*            }
>> >
>> > }
>> >
>> > This *xps.executeQuery(node, key); *seems to be the reason why it gets
>> > slow. Internally it
>> > calls
>> > freemarker.ext.dom.SunInternalXalanXPathSupport.executeQuery(Object,
>> > String)
>> >
>> > To work around this problem I have tried creating an own
>> > TemplateMethodModel ${nodeAttrib(node, attributename)}
>> >
>> > where I do something like this (I know it's ugly but experimental...) :
>> >
>> > new TemplateMethodModelEx(){
>> >
>> > public Object exec(List arg0) throws TemplateModelException {
>> >
>> > Object node = arg0.get(0);
>> >
>> > Object attribute = arg0.get(1);
>> >
>> > if(node instanceof NodeModel) {
>> >
>> > NamedNodeMap attributes = ((NodeModel) node).getNode().getAttributes();
>> >
>> > if(attributes != null && attributes.getLength() > 0) {
>> >
>> > Node namedItem = *attributes.getNamedItem(attribute.toString())*;
>> >
>> > if(namedItem != null) {
>> >
>> > System.out.println(namedItem.getNodeValue());
>> >
>> > return namedItem.getNodeValue();
>> >
>> > }
>> >
>> > else {
>> >
>> > return "";
>> >
>> > }
>> >
>> > }
>> >
>> > }
>> >
>> > return "";
>> >
>> > }
>> >
>> >
>> > When we then use this it is much faster:
>> > <#list product["PRODUCT_DETAILS"]?children as detail>
>> >   <#if *nodeAttribute(details, "type")!* == "GTIN" >
>> >    node*Attribute* type: ${*nodeAttribute(details, "type")!*}
>> >    nodeValue: ${detail!}
>> >   </#if>
>> > </#list>
>> >
>> > It seems that NamedNodeMap.*getNamedItem() *much faster and does not use
>> > XPath. Although it is also iterating internally over the whole array to
>> > find the index of the attribute - the overhead seems to be less then
>> XPath.
>> >
>> > My questions are:
>> > 1. What is the reason that *detail.@type[0]?? *causes a heavy XPath
>> > evaluation under the hood?
>> > 2. Are we doing something wrong?
>> > 3. Should we go the custom TemplateMethodModel way? Is this approach ok?
>> >
>> > I hope this was understandable.
>> >
>> > Thanks
>> > Christoph
>> >
>>
>> --
>> Thanks,
>>  Daniel Dekany
>>
>>
>
>
> -- 
> Christoph Rüger, Geschäftsführer
> Synesty <https://synesty.com/> - Anbinden und Automatisieren ohne
> Programmieren - Automatisierung, Schnittstellen, Datenfeeds
>
> Xing: https://www.xing.com/profile/Christoph_Rueger2
> LinkedIn: http://www.linkedin.com/pub/christoph-rueger/a/685/198
>

-- 
Thanks,
 Daniel Dekany

Re: Slow access to XML node attribute because of XPath in NodeModel.get()?

Posted by Christoph Rüger <c....@synesty.com>.

2017-10-13 0:33 GMT+02:00 Daniel Dekany <dd...@apache.org>:

> Actually, in `someNode.something` (which is the same as
> `someNode["something"]`) the "something" meant to be XPath (appart
> from a few FreeMarker-specific things).

But as you have seen in some
> very simple cases we solve the XPath query "natively" instead of
> delegating it to XPath.


Thanks Daniel, good to know. So there probably was a reason for this too in
the past.

Now if profiling reliably(!) shows that it's
> indeed XPath in this case that's so slow, then it seems it would be

better if we take over even more cases from XPath, such as
> "@something"... I will have to look into that though. (This can be
> tricky though, as you replace the implementation of that kind of query
> under thousands of already working applications, so it really has to
> behave exactly like the real thing earlier.


I agree, this is very risky and my poor-mans "benchmark" is not reliable at
all. I can try if I can find some time to come up with a programmatic
test-case maybe which makes it better visible.
Your insights tell me so far that we better go with the custom
TemplateMethodModel for now (shortterm) to solve our specific business
problem. Less risk.

I will try fiddling around with the sources of
freemarker.ext.dom.NodeModel.get(String)
etc. which already do some "key.startsWith("@@")" checks. Maybe I find the
place where to add an improvement and try it out.


> That's unlike with
> product.PRODUCT_DETAILS which was never delegated to the real XPath
> implementation.)
>

> Thursday, October 12, 2017, 7:35:42 PM, Christoph Rüger wrote:
>
> > Hi,
> >
> > we have the following XML snippet:
> >
> > <PRODUCT_DETAILS>
> >   <INTERNATIONAL_PID *type="GTIN"*>4011395534809</INTERNATIONAL_PID>
> > </PRODUCT_DETAILS>
> >
> > We iterate over XML-Nodes like this:
> >
> > <#list product["PRODUCT_DETAILS"]?children as detail>
>
> Note that you can write `product["PRODUCT_DETAILS"]` as
> `product.PRODUCT_DETAILS`.
>

Ok. Usually we do. Not sure why we didn't here. Was a code snippet from a
colleague :)


>
> >   nodeValue: ${detail!}
> > </#list>
> >
> > This code snippet is pretty fast (result appears after e.g. 3 seconds).
> But
> > when we turn it into this and try to add the node attribute "type"...
> >
> > <#list product["PRODUCT_DETAILS"]?children as detail>
> >  <#if detail.*@type*[0]?? && detail.*@type*[0] == "GTIN">
> >   node*Attribute* type: ${detail.@type}
> >   nodeValue: ${detail!}
> >  </#if>
> > </#list>
> >
> > ...it gets very very slow (result appears after > 1 minute). We have a
> large XML with >> 1000 products here.
> >
> > I have digged a little bit in the code and found that when the check for
> > *detail.@type[0]??* is in the mix, then
> > *freemarker.ext.dom.NodeModel.get(String) *is jumping into the
> else-branch
> > which uses XPath:
> >
> >  public TemplateModel get(String key) throws TemplateModelException {
> >
> >    if (key.startsWith("@@")) {
> >     // ....
> >
> > }
> >
> > else{
> >
> > XPathSupport xps = getXPathSupport();
> >           if (xps != null) {
> >
> > *return xps.executeQuery(node, key);*            }
> >
> > }
> >
> > This *xps.executeQuery(node, key); *seems to be the reason why it gets
> > slow. Internally it
> > calls
> > freemarker.ext.dom.SunInternalXalanXPathSupport.executeQuery(Object,
> > String)
> >
> > To work around this problem I have tried creating an own
> > TemplateMethodModel ${nodeAttrib(node, attributename)}
> >
> > where I do something like this (I know it's ugly but experimental...) :
> >
> > new TemplateMethodModelEx(){
> >
> > public Object exec(List arg0) throws TemplateModelException {
> >
> > Object node = arg0.get(0);
> >
> > Object attribute = arg0.get(1);
> >
> > if(node instanceof NodeModel) {
> >
> > NamedNodeMap attributes = ((NodeModel) node).getNode().getAttributes();
> >
> > if(attributes != null && attributes.getLength() > 0) {
> >
> > Node namedItem = *attributes.getNamedItem(attribute.toString())*;
> >
> > if(namedItem != null) {
> >
> > System.out.println(namedItem.getNodeValue());
> >
> > return namedItem.getNodeValue();
> >
> > }
> >
> > else {
> >
> > return "";
> >
> > }
> >
> > }
> >
> > }
> >
> > return "";
> >
> > }
> >
> >
> > When we then use this it is much faster:
> > <#list product["PRODUCT_DETAILS"]?children as detail>
> >   <#if *nodeAttribute(details, "type")!* == "GTIN" >
> >    node*Attribute* type: ${*nodeAttribute(details, "type")!*}
> >    nodeValue: ${detail!}
> >   </#if>
> > </#list>
> >
> > It seems that NamedNodeMap.*getNamedItem() *much faster and does not use
> > XPath. Although it is also iterating internally over the whole array to
> > find the index of the attribute - the overhead seems to be less then
> XPath.
> >
> > My questions are:
> > 1. What is the reason that *detail.@type[0]?? *causes a heavy XPath
> > evaluation under the hood?
> > 2. Are we doing something wrong?
> > 3. Should we go the custom TemplateMethodModel way? Is this approach ok?
> >
> > I hope this was understandable.
> >
> > Thanks
> > Christoph
> >
>
> --
> Thanks,
>  Daniel Dekany
>
>


-- 
Christoph Rüger, Geschäftsführer
Synesty <https://synesty.com/> - Anbinden und Automatisieren ohne
Programmieren - Automatisierung, Schnittstellen, Datenfeeds

Xing: https://www.xing.com/profile/Christoph_Rueger2
LinkedIn: http://www.linkedin.com/pub/christoph-rueger/a/685/198

-- 
Synesty GmbH
Moritz-von-Rohr-Str. 1a
07745 Jena
Tel.: +49 3641 559649
Fax.: +49 3641 5596499
Internet: http://synesty.com

Geschäftsführer: Christoph Rüger
Unternehmenssitz: Jena
Handelsregister B beim Amtsgericht: Jena
Handelsregister-Nummer: HRB 508766
Ust-IdNr.: DE287564982

Re: Slow access to XML node attribute because of XPath in NodeModel.get()?

Posted by Daniel Dekany <dd...@apache.org>.

Actually, in `someNode.something` (which is the same as
`someNode["something"]`) the "something" meant to be XPath (appart
from a few FreeMarker-specific things). But as you have seen in some
very simple cases we solve the XPath query "natively" instead of
delegating it to XPath. Now if profiling reliably(!) shows that it's
indeed XPath in this case that's so slow, then it seems it would be
better if we take over even more cases from XPath, such as
"@something"... I will have to look into that though. (This can be
tricky though, as you replace the implementation of that kind of query
under thousands of already working applications, so it really has to
behave exactly like the real thing earlier. That's unlike with
product.PRODUCT_DETAILS which was never delegated to the real XPath
implementation.)


Thursday, October 12, 2017, 7:35:42 PM, Christoph Rüger wrote:

> Hi,
>
> we have the following XML snippet:
>
> <PRODUCT_DETAILS>
>   <INTERNATIONAL_PID *type="GTIN"*>4011395534809</INTERNATIONAL_PID>
> </PRODUCT_DETAILS>
>
> We iterate over XML-Nodes like this:
>
> <#list product["PRODUCT_DETAILS"]?children as detail>

Note that you can write `product["PRODUCT_DETAILS"]` as
`product.PRODUCT_DETAILS`.

>   nodeValue: ${detail!}
> </#list>
>
> This code snippet is pretty fast (result appears after e.g. 3 seconds). But
> when we turn it into this and try to add the node attribute "type"...
>
> <#list product["PRODUCT_DETAILS"]?children as detail>
>  <#if detail.*@type*[0]?? && detail.*@type*[0] == "GTIN">
>   node*Attribute* type: ${detail.@type}
>   nodeValue: ${detail!}
>  </#if>
> </#list>
>
> ...it gets very very slow (result appears after > 1 minute). We have a
large XML with >> 1000 products here.
>
> I have digged a little bit in the code and found that when the check for
> *detail.@type[0]??* is in the mix, then
> *freemarker.ext.dom.NodeModel.get(String) *is jumping into the else-branch
> which uses XPath:
>
>  public TemplateModel get(String key) throws TemplateModelException {
>
>    if (key.startsWith("@@")) {
>     // ....
>
> }
>
> else{
>
> XPathSupport xps = getXPathSupport();
>           if (xps != null) {
>
> *return xps.executeQuery(node, key);*            }
>
> }
>
> This *xps.executeQuery(node, key); *seems to be the reason why it gets
> slow. Internally it
> calls
> freemarker.ext.dom.SunInternalXalanXPathSupport.executeQuery(Object,
> String)
>
> To work around this problem I have tried creating an own
> TemplateMethodModel ${nodeAttrib(node, attributename)}
>
> where I do something like this (I know it's ugly but experimental...) :
>
> new TemplateMethodModelEx(){
>
> public Object exec(List arg0) throws TemplateModelException {
>
> Object node = arg0.get(0);
>
> Object attribute = arg0.get(1);
>
> if(node instanceof NodeModel) {
>
> NamedNodeMap attributes = ((NodeModel) node).getNode().getAttributes();
>
> if(attributes != null && attributes.getLength() > 0) {
>
> Node namedItem = *attributes.getNamedItem(attribute.toString())*;
>
> if(namedItem != null) {
>
> System.out.println(namedItem.getNodeValue());
>
> return namedItem.getNodeValue();
>
> }
>
> else {
>
> return "";
>
> }
>
> }
>
> }
>
> return "";
>
> }
>
>
> When we then use this it is much faster:
> <#list product["PRODUCT_DETAILS"]?children as detail>
>   <#if *nodeAttribute(details, "type")!* == "GTIN" >
>    node*Attribute* type: ${*nodeAttribute(details, "type")!*}
>    nodeValue: ${detail!}
>   </#if>
> </#list>
>
> It seems that NamedNodeMap.*getNamedItem() *much faster and does not use
> XPath. Although it is also iterating internally over the whole array to
> find the index of the attribute - the overhead seems to be less then XPath.
>
> My questions are:
> 1. What is the reason that *detail.@type[0]?? *causes a heavy XPath
> evaluation under the hood?
> 2. Are we doing something wrong?
> 3. Should we go the custom TemplateMethodModel way? Is this approach ok?
>
> I hope this was understandable.
>
> Thanks
> Christoph
>

-- 
Thanks,
 Daniel Dekany