Posted to c-users@xalan.apache.org by "Rob.Conde" <Ro...@ai-solutions.com> on 2009/11/12 20:59:03 UTC

Xalan performance issue on large files...

Hey David,
I'm seeing some really bad performance while attempting to transform
a 30 megabyte XML file. As a benchmark, I tried running just the identity
transformation against my file in both .NET 2.0 and in Xalan. .NET took ~3
seconds...I killed the Xalan run after 25 minutes. I didn't try digging into
the Xalan code at all, but obviously some algorithm isn't scaling very nicely.

Rob Conde



Re: Xalan performance issue on large files...

Posted by David Bertoni <db...@gmail.com>.
On Thu, Nov 19, 2009 at 2:09 AM, Rob.Conde <Ro...@ai-solutions.com> wrote:

>  Hi David,
>
>             I built the XalanTransformer sample and indeed the performance
> seems fine there. Also I tried passing a stringstream instead of the
> DOMStringPrintWriter and it seems to work fine…though I’m not sure if this
> will handle encoding correctly.
>
As long as you treat the string as a sequence of binary bytes, you won't
have any issues.

The interesting thing about transforming to a XalanDOMString is that you
override the output transcoding mechanism, since XalanDOMString can only
contain UTF-16 code units. This can be problematic if your stylesheet
doesn't specify UTF-16 as the output encoding. To parse the resulting
document, you would need to explicitly force the encoding on the InputSource
by calling InputSource::setEncoding() and passing in "UTF-16".


> So I think you’re probably on the right track. Note that the code I posted
> crashes at the end since terminate is called before the XSLTInputSource goes
> out of scope – but this is unrelated to the original problem.
>
I will investigate the performance issues, but I suspect it has to do with
re-sizing the XalanDOMString instance as it grows. You might try running
your test application and adding a call to XalanDOMString::reserve() before
you transform:

domStr.reserve(35000000);

This would reserve enough space in the instance so resizing would not occur.

Dave

RE: Xalan performance issue on large files...

Posted by "Rob.Conde" <Ro...@ai-solutions.com>.
Hi David,

I built the XalanTransformer sample and indeed the performance
seems fine there. Also I tried passing a stringstream instead of the
DOMStringPrintWriter and it seems to work fine...though I'm not sure if this
will handle encoding correctly.

So I think you're probably on the right track. Note that the code I posted
crashes at the end since terminate is called before the XSLTInputSource goes
out of scope - but this is unrelated to the original problem.

 

Rob

 

  _____  

From: David Bertoni [mailto:dbertoni.apache@gmail.com] 
Sent: Tuesday, November 17, 2009 9:47 PM
To: xalan-c-users@xml.apache.org
Subject: Re: Xalan performance issue on large files...

 

On Tue, Nov 17, 2009 at 6:39 AM, Rob.Conde <Ro...@ai-solutions.com>
wrote:

Hey David,
       I understand if you haven't gotten to it yet, but I wanted to make
sure you got my example file and perhaps confirmed or were unable to confirm
the performance issue with it...

Hi Rob,

I just took an initial look at this.

I built Xalan-C on an Ubuntu 9.10 VMware virtual machine on my laptop, which
is a Lenovo T60p with a Core Duo 2600 (2.16GHz) with 2GB of memory.

I then created a stylesheet that does an identity transformation and ran the
Xalan command line application to transform to a file, reporting the
timings:

dbertoni@ubuntu-vmware:~/apache/test$ Xalan -t -o copy.xml bigXml.txt identity.xsl
Source tree parsing time: 1900 milliseconds.
Stylesheet compilation time: 0 milliseconds.
Transformation time: 3620 milliseconds.

So you can see the transformation time is trivial (under four seconds). I plan
now to build your sample code as an application and test it. It may be that
there is some inefficiency in transforming to a XalanDOMString through the
PrintWriter interface.

Can you please verify that an identity transform in your environment through
the Xalan command line executable exhibits similar performance to my Ubuntu
virtual machine?

Thanks!

Dave





Re: Xalan performance issue on large files...

Posted by David Bertoni <db...@gmail.com>.
On Tue, Nov 17, 2009 at 6:39 AM, Rob.Conde <Ro...@ai-solutions.com> wrote:

> Hey David,
>        I understand if you haven't gotten to it yet, but I wanted to make
> sure you got my example file and perhaps confirmed or were unable to confirm
> the performance issue with it...
>
Hi Rob,

I just took an initial look at this.

I built Xalan-C on an Ubuntu 9.10 VMware virtual machine on my laptop, which
is a Lenovo T60p with a Core Duo 2600 (2.16GHz) with 2GB of memory.

I then created a stylesheet that does an identity transformation and ran the
Xalan command line application to transform to a file, reporting the
timings:

dbertoni@ubuntu-vmware:~/apache/test$ Xalan -t -o copy.xml bigXml.txt identity.xsl
Source tree parsing time: 1900 milliseconds.
Stylesheet compilation time: 0 milliseconds.
Transformation time: 3620 milliseconds.

So you can see the transformation time is trivial (under four seconds). I plan
now to build your sample code as an application and test it. It may be that
there is some inefficiency in transforming to a XalanDOMString through the
PrintWriter interface.

Can you please verify that an identity transform in your environment through
the Xalan command line executable exhibits similar performance to my Ubuntu
virtual machine?

Thanks!

Dave

RE: Xalan performance issue on large files...

Posted by "Rob.Conde" <Ro...@ai-solutions.com>.
Hey David,
	I understand if you haven't gotten to it yet, but I wanted to make
sure you got my example file and perhaps confirmed or were unable to confirm
the performance issue with it...

Thanks,
Rob Conde



Re: Xalan performance issue on large files...

Posted by David Bertoni <db...@apache.org>.
Rob.Conde wrote:
> Hey David,
> 	I'm seeing some really bad performance while attempting to transform
> a 30 megabyte XML file. As a benchmark, I tried running just the identity
> transformation against my file in both .NET 2.0 and in Xalan. .NET took ~3
> seconds...I killed the Xalan run after 25 minutes. I didn't try digging into
> the Xalan code at all, but obviously some algorithm isn't scaling very nicely.
That seems strange, since the identity transformation isn't really 
taxing the processor.  Can you provide the XML file, or an obfuscated 
version of it, if it contains sensitive information?

Dave