Posted to j-users@xalan.apache.org by Thomas Porschberg <th...@osp-dd.de> on 2006/04/19 13:17:16 UTC

memory usage of xslt processing

Hi,

I have the following task:
Create an arbitrarily formatted file (XML, HTML, CSV, whatever) based on
a SELECT from a database.

As a constraint, the data fetched from the database cannot be held in
memory as a whole.
Another constraint is that I cannot use the XML functionality of the
database; I have to implement this on top of our database access
framework, which fetches records one at a time.

My idea was to wrap every row fetched from the database in simple
generic XML and feed it to Xalan.

Let me give an example.
If my result set from the database looks like:

ID  Name  Description
--  ----  -----------
1  "dog"  "an animal may be dangerous"
2  "cat"  "an animal likes milk"

I create the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<dataset>
 <row>
  <value>1</value>
  <value>dog</value>
  <value>an animal may be dangerous</value>
 </row>
 <row>
  <value>2</value>
  <value>cat</value>
  <value>an animal likes milk</value>
 </row>
</dataset>

I generate this XML as SAX events in a Java class
[StringArrayXMLReader], which implements the org.xml.sax.XMLReader
interface.
It has three methods:

// Opens the document and the <dataset> root element.
public void init() throws SAXException {
    ch.startDocument();
    ch.startElement("", "dataset", "dataset", EMPTY_ATTR);
}

// Closes the <dataset> root element and the document.
public void close() throws SAXException {
    ch.endElement("", "dataset", "dataset");
    ch.endDocument();
}

// Emits one <row> with one <value> child per array entry.
public void parse(String[] input) throws SAXException {
    ch.startElement("", "row", "row", EMPTY_ATTR);
    for (int i = 0; i < input.length; ++i) {
        ch.startElement("", "value", "value", EMPTY_ATTR);
        ch.characters(input[i].toCharArray(), 0, input[i].length());
        ch.endElement("", "value", "value");
    }
    ch.endElement("", "row", "row");
}

The parse method creates the <row>...</row> entry for the String array
passed to it.
The StringArrayXMLReader is connected to a TransformerHandler, which
uses an XSL stylesheet to transform the XML into the desired output.
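
For readers who want to reproduce the setup, here is a minimal sketch of
the wiring described above, using only the JAXP classes that ship with the
JDK. The class name RowStreamSketch, the method names and the CSV
stylesheet are assumptions made for illustration; they are not taken from
the test code linked below. The sketch shows only how the SAX events reach
the TransformerHandler. It does not by itself bound memory use, since the
processor may still buffer the whole input tree.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

class RowStreamSketch {
    private static final AttributesImpl EMPTY_ATTR = new AttributesImpl();

    // Hypothetical stylesheet: one semicolon-separated text line per <row>.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/dataset'>"
      + "<xsl:apply-templates select='row'/>"
      + "</xsl:template>"
      + "<xsl:template match='row'>"
      + "<xsl:for-each select='value'>"
      + "<xsl:if test='position() &gt; 1'>;</xsl:if>"
      + "<xsl:value-of select='.'/>"
      + "</xsl:for-each>"
      + "<xsl:text>&#10;</xsl:text>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Same event sequence as the parse(String[]) method in the post.
    static void fireRow(ContentHandler ch, String[] input) throws SAXException {
        ch.startElement("", "row", "row", EMPTY_ATTR);
        for (String value : input) {
            ch.startElement("", "value", "value", EMPTY_ATTR);
            ch.characters(value.toCharArray(), 0, value.length());
            ch.endElement("", "value", "value");
        }
        ch.endElement("", "row", "row");
    }

    // Feeds all rows to one TransformerHandler as a single <dataset> document.
    static String transform(String[][] rows) throws Exception {
        // Assumes the default factory supports SAX (true for the JDK's built-in one).
        SAXTransformerFactory stf =
            (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler th =
            stf.newTransformerHandler(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        th.setResult(new StreamResult(out));

        th.startDocument();                                     // init()
        th.startElement("", "dataset", "dataset", EMPTY_ATTR);
        for (String[] row : rows) {
            fireRow(th, row);                                   // parse(row)
        }
        th.endElement("", "dataset", "dataset");                // close()
        th.endDocument();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(transform(new String[][] {
            {"1", "dog", "an animal may be dangerous"},
            {"2", "cat", "an animal likes milk"}
        }));
    }
}
```

Run against the two example rows, this should print one
semicolon-separated line per row.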

What happens is this: when the fetch from the database starts, I call
init() (and thus startDocument()), and after the fetch has finished, I
call close() (and thus endDocument()).
I observed that the XSLT processing only starts when endDocument() is
called. This is not acceptable for me, because I fear the XSLT processor
keeps all rows in memory until endDocument() is called, in which case I
risk running into an OutOfMemoryError.

My second idea was to drop the init()/close() methods and to treat each
<row>...</row> section as a complete input document for the processor.
This has the disadvantage that I have to create the head and tail of the
output manually (and in my example I get a NullPointerException when
the transformer is called a second time).
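
A rough sketch of this per-row variant follows, again using only JDK JAXP
classes. The stylesheet is compiled once into a Templates object and a
fresh TransformerHandler is created for every row, which sidesteps the
question of whether a handler can be reused. The class name PerRowSketch,
the stylesheet and the header and footer lines are assumptions made for
illustration, and the cause of the NullPointerException in the original
test code is not established here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.helpers.AttributesImpl;

class PerRowSketch {
    private static final AttributesImpl EMPTY_ATTR = new AttributesImpl();

    // Hypothetical stylesheet: the input document is a single <row>.
    static final String ROW_XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/row'>"
      + "<xsl:for-each select='value'>"
      + "<xsl:if test='position() &gt; 1'>;</xsl:if>"
      + "<xsl:value-of select='.'/>"
      + "</xsl:for-each>"
      + "<xsl:text>&#10;</xsl:text>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String export(String[][] rows) throws Exception {
        SAXTransformerFactory stf =
            (SAXTransformerFactory) TransformerFactory.newInstance();
        // Compile the stylesheet once; Templates can be reused safely.
        Templates templates =
            stf.newTemplates(new StreamSource(new StringReader(ROW_XSL)));

        StringWriter out = new StringWriter();
        out.write("ID;Name;Description\n");          // head, written manually

        for (String[] row : rows) {
            // One handler, and thus one small input tree, per row.
            TransformerHandler th = stf.newTransformerHandler(templates);
            th.setResult(new StreamResult(out));
            th.startDocument();
            th.startElement("", "row", "row", EMPTY_ATTR);
            for (String value : row) {
                th.startElement("", "value", "value", EMPTY_ATTR);
                th.characters(value.toCharArray(), 0, value.length());
                th.endElement("", "value", "value");
            }
            th.endElement("", "row", "row");
            th.endDocument();                        // row is transformed here
        }

        out.write("# " + rows.length + " rows\n");   // tail, written manually
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(export(new String[][] {
            {"1", "dog", "an animal may be dangerous"},
            {"2", "cat", "an animal likes milk"}
        }));
    }
}
```

With this approach only one row is held as an input tree at a time, but
the stylesheet can no longer see across rows (for example for sorting or
numbering over the whole dataset).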

I have the following question:
Is it possible to create the output without holding all the data in
memory?
The input XML for the XSLT processing
<dataset>
  <row><value>...
  <row><value>...
</dataset>
is very simple, and the supplied XSL stylesheets will not be complex,
so I hope to get it working.
I also think that the general task - producing formatted output from a
potentially very large data pool - should be a common one.
Unfortunately I have not done much XSLT processing in the past, so I lack
experience (only a little libxslt, which I fed a DOM tree).
If someone has some useful links, I would be very glad to hear about
them. My test code is available at:

http://randspringer.de/sax_row.tar and
http://randspringer.de/sax.tar

If someone could have a look at it I would really appreciate it.

Thomas

-- 


Re: memory usage of xslt processing

Posted by Thomas Porschberg <th...@osp-dd.de>.
On Wed, 19 Apr 2006 12:15:06 -0700,
John Gentilin <jo...@gentilin.org> wrote:

> Thomas,
> 
> Have you looked into the SQL extensions of Xalan... In streaming
> mode, the query will never consume any more memory than it takes to
> represent 1 row and it creates a virtual DOM that is filled in as you
> navigate the data with XPath statements..
> 
No, until now I did not know about the SQL extensions of Xalan or
streaming mode.
It sounds to me as if it does the same as STX.
I will also have a look at http://joost.sourceforge.net/
which aims to be a reference implementation of STX.

Thomas



> Regards
> John G


--