Posted to j-users@xalan.apache.org by Toadie <to...@gmail.com> on 2010/04/03 20:25:25 UTC

approximation of memory footprint used by Xalan

Is there a way to approximate the memory footprint needed by Xalan to
run an XSL?

For example, I am seeing the following with a SAX-based transformation:
- using Xalan 2.7.0 and Java 1.6_u13, with a bootclasspath option to
force the JDK to load Xalan 2.7.0
- an input file of 180 MB
- a simple XSL that does an identity transformation (see below)

The required heap size is approximately 950 MB. My questions are:

1. Is there a way to approximate the required memory footprint?
2. With SAX-based processing, why does the 180 MB input file require
such a high overhead of heap memory?

_____ XSL ____

<?xml version='1.0' encoding='UTF-8'?>
<xsl:transform  xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>

    <xsl:template match="/">
        <xsl:apply-templates select="*"/>
    </xsl:template>

    <xsl:template match="*">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="@*">
        <xsl:copy>
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>

</xsl:transform>

RE: problem in disable-output-escaping

Posted by dipesh <di...@erevmax.com>.
Thanks, Michael Ludwig.

It is working now.

I have used encoding="ISO 8859".

Dipesh Garg

-----Original Message-----
From: Michael Ludwig [mailto:milu71@gmx.de] 
Sent: Monday, April 05, 2010 4:47 PM
To: xalan-j-users@xml.apache.org
Subject: Re: problem in disable-output-escaping

dipesh wrote on 05.04.2010 at 11:09:08 (+0530):

> Now I want the output XML to look like this

> <CityName>&#246;&#233;&#217;&#210;</CityName>

> But it gives output like this

>                                         <CityName>öéÙÒ</CityName>

Use: <xsl:output encoding="us-ascii"/>

-- 
Michael Ludwig

DISCLAIMER
This email message and any accompanying attachments may contain confidential information.
If you are not the intended recipient, do not read, use, disseminate, distribute or copy 
this message or attachments. If you have received this message in error, please notify the
sender immediately and delete this message. Any views expressed in this message are those 
of the individual sender, except where the sender expressly, and with authority, states 
them to be the views of eRevMax Technologies, Inc. Before opening any attachments, please
check them for viruses and defects.

Re: problem in disable-output-escaping

Posted by Michael Ludwig <mi...@gmx.de>.
dipesh wrote on 05.04.2010 at 11:09:08 (+0530):

> Now I want the output XML to look like this

> <CityName>&#246;&#233;&#217;&#210;</CityName>

> But it gives output like this

>                                         <CityName>öéÙÒ</CityName>

Use: <xsl:output encoding="us-ascii"/>
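For anyone driving the transformation from Java, the same effect can be requested through the standard JAXP output property instead of editing the stylesheet. The following is only a minimal sketch: the class and method names are made up for illustration, it uses an identity transformer rather than the stylesheet from this thread, and it runs on whatever XSLT processor the JDK or classpath provides. With an ASCII output encoding, the serializer should write characters outside ASCII as numeric character references.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of Xalan: serializes XML with a US-ASCII output encoding.
final class AsciiOutputDemo {

    static String serializeAsAscii(String xml) {
        try {
            // newTransformer() without a stylesheet gives an identity transformation.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            // Programmatic counterpart of <xsl:output encoding="us-ascii"/>.
            t.setOutputProperty(OutputKeys.ENCODING, "US-ASCII");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return new String(out.toByteArray(), StandardCharsets.US_ASCII);
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // The four characters from the CityName example, written as Java escapes.
        String input = "<CityName>\u00f6\u00e9\u00d9\u00d2</CityName>";
        System.out.println(serializeAsAscii(input));
    }
}
```

Note that disable-output-escaping plays no part here; the character references come purely from the choice of output encoding.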

-- 
Michael Ludwig

problem in disable-output-escaping

Posted by dipesh <di...@erevmax.com>.
Hi All,

I am using XSLT to transform one XML document into another.

The input XML contains some ISO 8859 values, like this:

<Address Type='1'>
    <AddressLine></AddressLine>
    <CityName>&#246;&#233;&#217;&#210;</CityName>
</Address>

I am using an XSLT that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<Address>
    <xsl:attribute name="Type" namespace="">
        <xsl:value-of select="string($var32_ProfileInfo/ns0:Profile/ns0:Customer/ns0:Address/@Type)"/>
    </xsl:attribute>
    <AddressLine>
        <xsl:value-of select="string(ns0:Customer/ns0:Address/ns0:AddressLine)"/>
    </AddressLine>
    <xsl:text disable-output-escaping="no">"&#246;&#233;&#217;&#210;"</xsl:text>
    <CityName>
        <xsl:value-of select="$var32_ProfileInfo/ns0:Profile/ns0:Customer/ns0:Address/ns0:CityName" disable-output-escaping="yes"/>
    </CityName>
</Address>
</xsl:stylesheet>

Now I want the output XML to look like this:

<Address Type='1'>
    <AddressLine></AddressLine>
    &#246;&#233;&#217;&#210;
    <CityName>&#246;&#233;&#217;&#210;</CityName>
</Address>

But it gives output like this:

<Address Type='1'>
    <AddressLine></AddressLine>
    öéÙÒ
    <CityName>öéÙÒ</CityName>
</Address>

Can anybody tell me how I can solve this problem?



Re: approximation of memory footprint used by Xalan

Posted by ke...@us.ibm.com.
> Sorry.  I didn't mean object as in Java object.  My initial thought
> was that 5 SuballocatedIntVector objects were used per node and inside
> each SuballocatedIntVector there is an int[][].   If the 5 objects
> always grow in synchronization and have the same column width, would
> it be possible to use a single (new) variant of SuballocatedIntVector

Unfortunately I don't think there's any good opportunity there. While 
locality of reference might be improved, addressing overhead would also be 
increased.

> After looking through the Xalan code again, I don't think those 5
> objects mentioned earlier grow at the same rate, so my suggestion
> would not work.

There's a core set which are maintained in parallel (the basic document 
tree structure), then others which are used as needed to support the data 
content of the document; the latter are pointed to from the core set 
rather than being based from the same node index.
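To make the "maintained in parallel" idea concrete, here is a deliberately simplified sketch of a tree stored as parallel int arrays indexed by node number. It is not Xalan's DTM code: the class name, method names and the doubling growth strategy are assumptions made for illustration (the thread itself mentions chunked int[][] storage in SuballocatedIntVector, which this sketch does not reproduce).

```java
import java.util.Arrays;

// Hypothetical illustration of a parallel-array tree; every node is just an int index.
final class ParallelArrayTree {
    static final int NONE = -1;

    // One slot per node in each array; the same index addresses all of them.
    private int[] parent = new int[16];
    private int[] firstChild = new int[16];
    private int[] nextSibling = new int[16];
    private int[] lastChild = new int[16]; // build-time helper for O(1) appends
    private int size = 0;

    /** Appends a node under parentId (or NONE for the root) and returns its index. */
    int addNode(int parentId) {
        if (size == parent.length) {
            grow();
        }
        int id = size++;
        parent[id] = parentId;
        firstChild[id] = NONE;
        nextSibling[id] = NONE;
        lastChild[id] = NONE;
        if (parentId != NONE) {
            if (firstChild[parentId] == NONE) {
                firstChild[parentId] = id;
            } else {
                nextSibling[lastChild[parentId]] = id;
            }
            lastChild[parentId] = id;
        }
        return id;
    }

    // All arrays grow together, so they always share one index space.
    private void grow() {
        int n = parent.length * 2;
        parent = Arrays.copyOf(parent, n);
        firstChild = Arrays.copyOf(firstChild, n);
        nextSibling = Arrays.copyOf(nextSibling, n);
        lastChild = Arrays.copyOf(lastChild, n);
    }

    int parentOf(int id) { return parent[id]; }
    int firstChildOf(int id) { return firstChild[id]; }
    int nextSiblingOf(int id) { return nextSibling[id]; }

    int childCount(int id) {
        int n = 0;
        for (int c = firstChild[id]; c != NONE; c = nextSibling[c]) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        ParallelArrayTree tree = new ParallelArrayTree();
        int root = tree.addNode(NONE);
        for (int i = 0; i < 40; i++) {
            tree.addNode(root);
        }
        System.out.println("children of root: " + tree.childCount(root));
    }
}
```

Each node costs a fixed number of ints and no per-node Java object, which is the trade-off described above: compact storage in exchange for index arithmetic on every access.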

Re: approximation of memory footprint used by Xalan

Posted by Toadie <to...@gmail.com>.
Hi Keshlam,

Sorry, I didn't mean object as in Java object. My initial thought
was that 5 SuballocatedIntVector objects were used per node, and that
inside each SuballocatedIntVector there is an int[][]. If the 5 objects
always grow in synchronization and have the same column width, would
it be possible to use a single (new) variant of SuballocatedIntVector
that maintains a

  int[] [] [] [] [] [] m_map;   // can this hold the mapping information
                                // for all 5 objects at once, using a single index?

instead of

  int[][] m_map;

After looking through the Xalan code again, I don't think those 5
objects mentioned earlier grow at the same rate, so my suggestion
would not work.

Thanks for your clarification.


On Tue, Apr 6, 2010 at 5:49 AM,  <ke...@us.ibm.com> wrote:
> Let's keep this on the mailing list. That way everyone can participate.
>
> The simple answer is no. Remember that minimal space required by a Java
> object is more than 32 bytes, even before it carries any real data. An
> object per node is MUCH more expensive than the parallel array solution. If
> Java had structs and unions, as C does, we might be able to do better... but
> it really doesn't. (There are some low-level memory management routines
> added late in Java's evolution, but I ran out of time before I could
> explore them.)
>
> So: We've done a lot of work to keep array access as inexpensive as
> possible, but it's a trade-off, as always.
>
>
> ______________________________________
> "... Three things see no end: A loop with exit code done wrong,
> A semaphore untested, And the change that comes along. ..."
>  -- "Threes" Rev 1.1 - Duane Elms / Leslie Fish
> (http://www.ovff.org/pegasus/songs/threes-rev-11.html)

Re: approximation of memory footprint used by Xalan

Posted by ke...@us.ibm.com.
This is the DTM (Document Table Model) representation of the input 
document tree. It's a huge improvement over DOM-style one-object-per-node, 
since Java objects take something on the order of 32 bytes just for 
bookkeeping before you add any data fields to them. But, yes, maintaining 
that tree (plus the fact that Java uses UTF-16 internally, so each 
character of text content takes two bytes) does add up.

Before Xalan -- back when it was IBM's LotusXSL processor -- we had an 
ultra-compact variant of DTM which reduced node size down to just 16 
bytes. However, that version imposed some serious performance penalties. 
The current DTM is a compromise between memory usage and access speed.

In general, XSLT can access any portion of the input document at any time, 
and has the concept of "node identity" which must be maintained across 
those accesses, and thus a full in-memory model of the document is 
required. There are some potential opportunities for reducing that, at 
least for a subset of XSLT; there was some discussion in the archives of 
this mailing list of a few possible approaches.


______________________________________
"... Three things see no end: A loop with exit code done wrong,
A semaphore untested, And the change that comes along. ..."
  -- "Threes" Rev 1.1 - Duane Elms / Leslie Fish
(http://www.ovff.org/pegasus/songs/threes-rev-11.html)



From: Toadie <to...@gmail.com>
To: xalan-j-users@xml.apache.org
Date: 04/03/2010 10:44 PM
Subject: Re: approximation of memory footprint used by Xalan



I did a bit more profiling and found that the majority of the memory
allocation is in org.apache.xml.utils.SuballocatedIntVector, called by
SAX2DTM in the startElement method. The majority of the memory allocation
inside SuballocatedIntVector is in a pair of fields, int[][] m_map and
int[][] m_map0.

The profiler showed that
- 178,135 int array instances were allocated, using 458 MB
- 56,009 char[] instances were allocated, using 106 MB

It seems that for each element/node that is read and output by the
SAX2DTM class, it adds 1 integer to at least 6-8 instances of the
SuballocatedIntVector object:

m_firstch
m_nextsib
m_parent
m_exptype
m_dataOrQName
m_prevsib
m_data
m_value

Wow, that was a surprise. My XML input file has a little
over 12.8 million XML elements. A quick (rough) calculation shows

12,800,000 * 32 bytes / 1024 / 1024 ~= 390 MB.

I wonder whether there is an opportunity to tune the memory management in
that class, or whether the arrays have to be kept from
start to end of the input file for traversal purposes.

Thanks in advance


Re: approximation of memory footprint used by Xalan

Posted by Toadie <to...@gmail.com>.
I did a bit more profiling and found that the majority of the memory
allocation is in org.apache.xml.utils.SuballocatedIntVector, called by
SAX2DTM in the startElement method. The majority of the memory allocation
inside SuballocatedIntVector is in a pair of fields, int[][] m_map and
int[][] m_map0.

The profiler showed that
- 178,135 int array instances were allocated, using 458 MB
- 56,009 char[] instances were allocated, using 106 MB

It seems that for each element/node that is read and output by the
SAX2DTM class, it adds 1 integer to at least 6-8 instances of the
SuballocatedIntVector object:

m_firstch
m_nextsib
m_parent
m_exptype
m_dataOrQName
m_prevsib
m_data
m_value

Wow, that was a surprise. My XML input file has a little
over 12.8 million XML elements. A quick (rough) calculation shows

12,800,000 * 32 bytes / 1024 / 1024 ~= 390 MB.

I wonder whether there is an opportunity to tune the memory management in
that class, or whether the arrays have to be kept from
start to end of the input file for traversal purposes.
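
The rough calculation can be written down as a small Java sketch. The 32 bytes per node is read here as eight int vectors at 4 bytes per entry; that reading is an assumption, not something the profiler output states, and the class and method names are hypothetical. The sketch ignores array headers, unused chunk capacity and everything outside the int vectors.

```java
// Hypothetical back-of-the-envelope estimator; only the input figures come from the post.
final class DtmFootprintEstimate {
    static final int BYTES_PER_INT = 4;
    static final int BYTES_PER_CHAR = 2; // Java chars are UTF-16 code units

    /** Bytes used if every node adds one int to each of vectorsPerNode vectors. */
    static long intVectorBytes(long nodes, int vectorsPerNode) {
        return nodes * vectorsPerNode * BYTES_PER_INT;
    }

    /** Bytes used by text content held as Java chars. */
    static long textBytes(long chars) {
        return chars * BYTES_PER_CHAR;
    }

    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long nodes = 12_800_000L; // element count reported above
        // The post says "at least 6-8" vectors, so print the whole range.
        for (int vectors = 6; vectors <= 8; vectors++) {
            System.out.printf("%d vectors per node: %.1f MiB%n",
                    vectors, toMiB(intVectorBytes(nodes, vectors)));
        }
    }
}
```

The text buffer can be estimated the same way with textBytes once the number of characters of text content in the input is known.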

Thanks in advance


Re: approximation of memory footprint used by Xalan

Posted by Michael Ludwig <mi...@gmx.de>.
Toadie wrote on 03.04.2010 at 11:25:25 (-0700):
> Is there a way to approximate the memory footprint needed by Xalan to
> run an XSL?
> 
> For example, I am seeing the following with a SAX-based transformation:
> - using Xalan 2.7.0 and Java 1.6_u13, with a bootclasspath option to
> force the JDK to load Xalan 2.7.0
> - an input file of 180 MB
> - a simple XSL that does an identity transformation (see below)
> 
> The required heap size is approximately 950 MB. My questions are:
> 
> 1. Is there a way to approximate the required memory footprint?

I usually reckon with ten times the input source, but in your example
it's only five times. Depends on input size and number of nodes, I
think. You could do some experiments with various input document sizes
and structures (few large text nodes, or many small nodes) to get a
feeling for how much memory is required.
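
One way to run such experiments is a small probe that generates inputs of growing size, applies the identity stylesheet from the original post, and prints the heap in use before and after. This is only a rough sketch: the class name, helper names and generated document shape are made up, heap readings from Runtime are noisy, and without Xalan on the classpath the JDK's built-in XSLT processor is measured rather than Xalan 2.7.0.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical probe for getting a feeling for memory use; not a precise measurement tool.
final class TransformHeapProbe {

    // The identity stylesheet from the original post, with single-quoted attributes.
    static final String IDENTITY_XSL =
        "<xsl:transform xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
        + "<xsl:template match='/'><xsl:apply-templates select='*'/></xsl:template>"
        + "<xsl:template match='*'><xsl:copy>"
        + "<xsl:apply-templates select='@* | node()'/></xsl:copy></xsl:template>"
        + "<xsl:template match='@*'><xsl:copy><xsl:apply-templates/></xsl:copy></xsl:template>"
        + "</xsl:transform>";

    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    /** Heap currently in use, as reported by the JVM. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Builds a document with the given number of small elements. */
    static String makeInput(int elements) {
        StringBuilder sb = new StringBuilder("<root>");
        for (int i = 0; i < elements; i++) {
            sb.append("<item>text</item>");
        }
        return sb.append("</root>").toString();
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 10_000, 100_000}) {
            String input = makeInput(n);
            System.gc(); // only a hint to the JVM
            long before = usedHeapBytes();
            String output = transform(IDENTITY_XSL, input);
            long after = usedHeapBytes();
            System.out.println(n + " elements, " + input.length() + " input chars, "
                    + output.length() + " output chars, heap delta "
                    + (after - before) + " bytes");
        }
    }
}
```

The delta includes the output string and whatever garbage has not been collected yet, so it only indicates a trend. For a real limit, rerun with different -Xmx settings and see where the transformation starts to fail.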

> 2. with SAX based processing, why does the 180Mb input file require
> such high overhead of heap memory?

SAX here is just the method to build the input tree in memory. (It could
also be built from DOM, which would make sense if you already had a DOM
in your application.)

XSLT allows random access to the input, so if it were not entirely in
memory it would be much slower.

You might happen to have a transformation that could be done using
streaming. The commercial version of the Saxon processor allows you to
do that.

http://www.saxonica.com/documentation/sourcedocs/serial.html
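
For a pure copy like the identity transformation in this thread, streaming can also be illustrated without XSLT at all. The sketch below is neither Saxon's streaming mode nor anything Xalan offers; it swaps in the JDK's StAX API to copy events from reader to writer without building a tree, and the class name is made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;

// Hypothetical example: event-by-event copy with StAX, no in-memory document model.
final class StreamingCopy {

    static String copy(String xml) {
        try {
            XMLEventReader in = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            StringWriter sink = new StringWriter();
            XMLEventWriter out = XMLOutputFactory.newInstance().createXMLEventWriter(sink);
            // Pulls each event from the reader and writes it straight out.
            out.add(in);
            out.close();
            in.close();
            return sink.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("streaming copy failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(copy("<a><b>x</b></a>"));
    }
}
```

With file streams in place of the in-memory strings used here, memory use should stay roughly constant regardless of input size, at the price of losing XSLT's random access to the document.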

Happy Easter,
-- 
Michael Ludwig