Posted to dev@xalan.apache.org by Aliger Martin <XA...@cs.felk.cvut.cz> on 2000/03/26 12:07:08 UTC
processing large XML
Hi,
I am trying to use Xalan for XML->XML transformation. It works for small
inputs, but I also need to handle larger ones (up to 500MB), even on poor
machines (12MB + Windows)...
Problem:
I tried this: Xalan 0.20 on my Linux box with Sun JDK 1.2.2
Programs: SimpleTransform, SAX, DOMParser+SimpleTransform
Inputs: standard XSL, no problem
XML (2kB, 20kB, 1MB, 20MB), content is always similar
(only <a>1</a> and <b>2</b>)
Result: everything is fine on the smaller inputs;
on the 20MB file the JVM produces a SEGMENTATION FAULT
Is this a bug in the JDK or in Xalan? (I hope it isn't my program :-)
Could somebody help?
With regards
Martin Aliger
PS: I have placed everything at http://cs.felk.cvut.cz/~xaliger
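For context, the Xalan of that era had its own XSLTProcessor API; the sketch below reproduces a SimpleTransform-style run using the standard JAXP interface instead (a later API, used here purely for illustration - the class name and the tiny test stylesheet are my own):

```java
// Minimal JAXP sketch of what the Xalan SimpleTransform sample does:
// compile a stylesheet, then run it over a source document.
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class SimpleTransformSketch {
    // Applies the given stylesheet to the given input, both as strings.
    static String transform(String xsl, String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xsl = "<xsl:stylesheet version='1.0'"
                + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + "<xsl:output method='text'/>"
                + "<xsl:template match='/'>"
                + "<xsl:value-of select='//a'/></xsl:template>"
                + "</xsl:stylesheet>";
        String xml = "<root><a>1</a><b>2</b></root>";
        System.out.println(transform(xsl, xml)); // prints 1
    }
}
```

Note that this path builds an in-memory tree of the source document, which is exactly why a 20MB input is painful on a small machine.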
Re: processing large XML
Posted by Aliger Martin <XA...@cs.felk.cvut.cz>.
> Yes, the problem is: a template may select any node of the whole document
> (e.g. <xsl:apply-templates select="//*[@attr=$var]"/>). A mathematician may find
> a rule to recognize one-pass stylesheets, but as far as I know all XSL
> processors fetch the whole document into memory.
>
> Saxon seems to be the most 'sophisticated' free processor. The preview tag
> tells the processor that you don't select the children of another preview element.
> But I am afraid that if you can't split the document, the preview tag will not
> help you :-(.
>
> The summary - if you implement a one-pass processor, you will become
> the hero of heroes!
>
> We are looking forward to seeing your Xalan one-pass extensions :-).
:)))))))))) Thanks. I will try to handle this problem. Somehow. I have to. But I
don't know whether an XSL processor is the right way.
Bye - I'll try something. And if it works, maybe I could contribute it as a Xalan
extension :-). I haven't much practice in committing to free projects, but
this could be a good opportunity, couldn't it? :-)))
I know it is impossible to handle everything with a one-pass processor. But
some "reasonable" subset... For example, only those XSLs whose templates
use only apply-templates of the form select="name". And no
reordering, of course.
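A hypothetical stylesheet in that restricted subset might look like this (the element names are invented; the point is the shape - each template matches one element and selects children by name only):

```xml
<!-- Hypothetical stylesheet in the proposed one-pass subset: every
     template matches a single element, apply-templates selects
     children by name only (no "//", no sorting, no reordering), so
     a processor could emit output as it reads the input. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="order">
    <invoice><xsl:apply-templates select="item"/></invoice>
  </xsl:template>
  <xsl:template match="item">
    <line><xsl:value-of select="."/></line>
  </xsl:template>
</xsl:stylesheet>
```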
It's pretty late (2:20) - going to bed. Morning...
MSK ALIK
Re: processing large XML
Posted by Edwin Glaser <ed...@pannenleiter.de>.
Hello,
You wrote:
> Is it really necessary to have every node in memory? I don't think so. I would
> be happy if some limited transformations (no reordering) could be done
> in one sequential pass. But I still need some advanced features on
> smaller files (sorting, reordering, ...). And testing where the small/large
> boundary lies is quite difficult - and you need two program branches...
Yes, the problem is: a template may select any node of the whole document
(e.g. <xsl:apply-templates select="//*[@attr=$var]"/>). A mathematician may find
a rule to recognize one-pass stylesheets, but as far as I know all XSL
processors fetch the whole document into memory.
Saxon seems to be the most 'sophisticated' free processor. The preview tag
tells the processor that you don't select the children of another preview element.
But I am afraid that if you can't split the document, the preview tag will not
help you :-(.
The summary - if you implement a one-pass processor, you will become
the hero of heroes!
We are looking forward to seeing your Xalan one-pass extensions :-).
--
Edwin Glaser -- edwin@pannenleiter.de
Re: processing large XML
Posted by Aliger Martin <XA...@cs.felk.cvut.cz>.
> Hello,
Hello,
>
> You wrote:
> > I am trying to use Xalan for XML->XML transformation. It works for small
> > inputs, but I also need to handle larger ones (up to 500MB), even on poor
> > machines (12MB + Windows)...
>
> Do you really need one big file? Is it possible to split the file
> into 100 chunks? You could prepare a StylesheetRoot object and apply
> it 100 times.
:-((. This isn't possible :-(. The XML is generated, and I know absolutely
nothing about its structure...
> The saxon processor (http://users.iclway.co.uk/mhkay/saxon/) might
> be another solution for your problem.
> - The saxon:preview element is a top-level element used to identify
> elements that will be processed in preview mode. The purpose of
> preview mode is to enable XSL processing of very large documents
> that are too big to fit in memory: the idea is that subtrees of
> the document can be processed and then discarded as soon as they
> are encountered.
Thank you - I'll try this. I'm new to this XML field (two weeks?) and
still looking for a good solution...
> Just another thought: can you accept the performance of XSL
> transformations? If you use a raw SAX parser and hard-coded
> transformations, your program will run 10 to 100 times faster.
Yep. You are right. Maybe the XSL performance will not be acceptable; I'm
still testing. Yes - I use SAX (I found it yesterday - and was happy :-),
but writing my own processor is a hard job. I have been coding one for three
months already - and it is very limited (compared with Xalan).
Is it really necessary to have every node in memory? I don't think so. I would
be happy if some limited transformations (no reordering) could be done
in one sequential pass. But I still need some advanced features on
smaller files (sorting, reordering, ...). And testing where the small/large
boundary lies is quite difficult - and you need two program branches...
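The two-branch dispatch Martin describes could be as simple as a size check; the threshold below is an arbitrary guess, and both strategy names are placeholders:

```java
// Hypothetical dispatch between a full in-memory XSL transform for
// small inputs and a limited streaming pass for large ones. The
// 10MB limit is arbitrary; the real boundary depends on the JVM heap.
public class SizeDispatch {
    static final long IN_MEMORY_LIMIT = 10L * 1024 * 1024; // 10 MB

    static String chooseStrategy(long inputBytes) {
        return inputBytes <= IN_MEMORY_LIMIT ? "full-xsl" : "streaming-subset";
    }

    public static void main(String[] args) {
        System.out.println(chooseStrategy(2048));       // prints full-xsl
        System.out.println(chooseStrategy(500L << 20)); // prints streaming-subset
    }
}
```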
Thanks
Martin Aliger
Re: processing large XML
Posted by Edwin Glaser <ed...@pannenleiter.de>.
Hello,
You wrote:
> I am trying to use Xalan for XML->XML transformation. It works for small
> inputs, but I also need to handle larger ones (up to 500MB), even on poor
> machines (12MB + Windows)...
Do you really need one big file? Is it possible to split the file
into 100 chunks? You could prepare a StylesheetRoot object and apply
it 100 times.
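StylesheetRoot was Xalan's own pre-compiled stylesheet object; in the later standard JAXP API the equivalent is Templates. A hedged sketch of the compile-once, apply-per-chunk idea (class and method names are mine):

```java
// Compile the stylesheet once into a Templates object, then apply it
// to each chunk with a fresh (cheap) Transformer - the JAXP analogue
// of reusing a prepared StylesheetRoot.
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ChunkedTransform {
    static String[] transformAll(String xsl, String[] chunks) throws Exception {
        // Expensive compilation happens exactly once.
        Templates compiled = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xsl)));
        String[] results = new String[chunks.length];
        for (int i = 0; i < chunks.length; i++) {
            Transformer t = compiled.newTransformer(); // cheap per chunk
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(chunks[i])),
                        new StreamResult(out));
            results[i] = out.toString();
        }
        return results;
    }
}
```

Only one chunk's tree is in memory at a time, so peak memory is bounded by the largest chunk rather than the whole document.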
The saxon processor (http://users.iclway.co.uk/mhkay/saxon/) might
be another solution for your problem.
- The saxon:preview element is a top-level element used to identify
elements that will be processed in preview mode. The purpose of
preview mode is to enable XSL processing of very large documents
that are too big to fit in memory: the idea is that subtrees of
the document can be processed and then discarded as soon as they
are encountered.
Just another thought: can you accept the performance of XSL
transformations? If you use a raw SAX parser and hard-coded
transformations, your program will run 10 to 100 times faster.
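A minimal sketch of that idea, with one hypothetical hard-coded rule (rename <a> to <x>) and ignoring attributes and output escaping for brevity:

```java
// Raw SAX with a hard-coded transformation: the document streams
// through the handler and output is emitted event by event, so no
// tree is ever built - memory stays constant regardless of file size.
import java.io.ByteArrayInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class HardCodedTransform extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();

    // The hard-coded rule: rename <a> elements to <x>.
    private String rename(String name) {
        return name.equals("a") ? "x" : name;
    }

    @Override
    public void startElement(String uri, String local, String qName,
                             Attributes atts) {
        out.append('<').append(rename(qName)).append('>');
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        out.append("</").append(rename(qName)).append('>');
    }

    @Override
    public void characters(char[] ch, int start, int len) {
        out.append(ch, start, len);
    }

    static String transform(String xml) throws Exception {
        HardCodedTransform handler = new HardCodedTransform();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")), handler);
        return handler.out.toString();
    }
}
```

In a real program the output would go straight to a stream instead of a StringBuilder, which is what makes the 500MB case feasible.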
Hope it helps. edwin.
--
Edwin Glaser -- edwin@pannenleiter.de