Posted to dev@xalan.apache.org by Sebastian Leske <Le...@ion.ag> on 2011/06/17 10:23:25 UTC

What does popRTFContext() do?

Hi,

an internal project of ours uses Xalan to parse and convert XML files to 
reports in PDF format. We are having some trouble getting Xalan to do 
what we want, so I'd like to solicit advice.

First, some background (sorry for so much text, I tried to make it brief):

The reports we produce have some very peculiar rules for pagination 
(placing of page breaks): Sometimes certain pairs of pages need to face 
each other, so empty pages must be inserted according to special rules, 
and sometimes paragraphs must be reordered to achieve this.

Therefore, we use custom Java code that "plugs into" Xalan (by using a 
special XSL file that calls into our classes). This code implements 
custom pagination rules.

Our problem when implementing this was that we found no direct support 
in XSL / Xalan to *reorder* parts of the document (so the order in the 
final PDF differs from the order in the XML input file). So we 
implemented a PageBreaker class (called by pagebreaker.xsl), which 
maintains an internal queue of document fragments. In practice the 
pagebreaker.xsl contains templates:

* start-part: Starts a new queue (called "part")
* write-unit: puts an XML fragment (usually a paragraph) into the 
internal "queue"
* finish-part: Writes out the accumulated queue, reordering as needed.

These templates can be used from an XSL file to produce reports with the 
special pagination.

Internally, this queuing of document fragments is implemented (in 
write-unit) by having setter methods in our code that accept an 
org.w3c.dom.NodeList. When this code is used, Xalan passes in a document 
fragment as a NodeList. The NodeList is queued internally, document 
processing continues without producing output, and then later when 
"finish-part" is called, the whole (reordered) queue is retrieved and 
returned to Xalan for rendering.
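
As an illustration of the start-part / write-unit / finish-part flow
described above, here is a minimal Java sketch. The class name
PartQueueSketch, its method names and the reorder callback are
hypothetical and are not taken from the proprietary PageBreaker code; a
generic type parameter stands in for org.w3c.dom.NodeList so that the
example runs on its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the queueing scheme; not the real PageBreaker class.
final class PartQueueSketch<T> {
    // null means "no part is currently open"
    private List<T> units = null;

    /** start-part: begin a new, empty queue. */
    void startPart() {
        units = new ArrayList<>();
    }

    /** write-unit: remember one fragment instead of emitting it. */
    void writeUnit(T fragment) {
        if (units == null) {
            throw new IllegalStateException("write-unit called before start-part");
        }
        units.add(fragment); // stores the reference only, no copy is made
    }

    /** finish-part: apply a reordering rule, return the result, close the part. */
    List<T> finishPart(UnaryOperator<List<T>> reorder) {
        if (units == null) {
            throw new IllegalStateException("finish-part called before start-part");
        }
        List<T> result = reorder.apply(new ArrayList<>(units));
        units = null;
        return result;
    }

    public static void main(String[] args) {
        PartQueueSketch<String> queue = new PartQueueSketch<>();
        queue.startPart();
        queue.writeUnit("para-1");
        queue.writeUnit("para-2");
        queue.writeUnit("para-3");
        // Placeholder rule: swap the first two units.
        List<String> out = queue.finishPart(list -> {
            Collections.swap(list, 0, 1);
            return list;
        });
        System.out.println(out); // prints [para-2, para-1, para-3]
    }
}
```

Note that the queue only holds references to whatever is passed in; the
queued objects themselves are never copied.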

*Our problem:*

Sometimes, when we retrieve the NodeList instances (more precisely, the 
org.w3c.dom.Node instances inside), their contents have inexplicably 
changed. It's as if someone changed the queued instances behind our back 
between the time we put them in and the time we take them back out of the queue.

After some digging through Xalan I found out that the NodeList and Node 
instances returned by Xalan are just facades for Xalan's internal data 
structures. Apparently the actual document data is stored in internal 
lists inside Xalan, and the Node instances just point there.

We found out that commenting out the line

  xctxt.popRTFContext();

in org.apache.xalan.templates.ElemTemplate.execute(TransformerImpl)
made our problems go away.

We believe that popRTFContext() does some internal clean-up which 
invalidates the data we queue.

My question:

Can anyone explain more about the internal workings of Xalan?
Is our analysis correct?
Are there any problems with disabling the above line?
Are there any other pitfalls to be expected, or can we just continue 
working without popRTFContext()?

Greetings,

Sebastian Leske

P.S. Sorry for not posting actual code. It's proprietary, and also a 
rather huge package, and I'm not sure how to simplify it, as the problem 
only occurs with large documents.
I'll try to answer any specific questions by looking into the original code.

-- 
Sebastian Leske
System- und Anwendungsentwicklung
Tel: 0211/92495-146

IOn AG
http://www.ion.ag/
Vorstand: Rudolf Franke, Erik Rehrmann, Manfred Siller
Aufsichtsratsvorsitzender: Reinhard Möntmann
Sitz der Gesellschaft: Erkrath
Amtsgericht Wuppertal: HRB 14181
USt Id-Nr.: DE 121642062


---------------------------------------------------------------------
To unsubscribe, e-mail: xalan-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xalan-dev-help@xml.apache.org

Re: What does popRTFContext() do?

Posted by ke...@us.ibm.com.

> I would not use XML and XSLT for PDF document pagination.
> I instead use XSLT to convert XML into a LaTeX markup language and then
> let the underlying troff/groff processors generate the actual content
> for printing and publishing.

For what it's worth, something very similar to that was the original 
intent for XSL -- use XSL Transformation (XSLT) to render the document's 
content into the XSL Formatting Objects (XSL-FO) language, and then use an 
XSL-FO processor to produce the actual on-screen or printed 
representation.

Apache does have an XSL-FO implementation which can produce PDF output, 
though it has been years since I tried using it:
http://xmlgraphics.apache.org/fop/


______________________________________
"You build world of steel and stone
I build worlds of words alone
Skilled tradespeople, long years taught:
You shape matter; I shape thought."
(http://www.songworm.com/lyrics/songworm-parody/ShapesofShadow.html)



From: shathawa@e-z.net
To: xalan-dev@xml.apache.org
Date: 06/17/2011 12:57 PM
Subject: Re: What does popRTFContext() do?



Sebastian Leske,

I would not use XML and XSLT for PDF document pagination.

I instead use XSLT to convert XML into a LaTeX markup language and then
let the underlying troff/groff processors generate the actual content for
printing and publishing.  The XSLT stylesheets are customized to the
printing requirements for books, often using selected subsets of the
DOCBOOK XML standard.  Therefore most XSLT processors suffice for my work.

The XSLT-to-LaTeX stylesheets accommodate frontispiece, copyright pages,
dedications, forewords, colophons, tables of contents, tables of tables,
tables of figures, sections, chapters, appendices, multi-page tables,
indexes, glossaries, footnotes, endnotes, references, and a wide array of
multi-pass document creation requirements including pagination,
hyphenation, computed headers, computed footers, floating illustrations,
and mathematical formulae creation.

I don't have a publicly releasable set of XSLT transformations, but this
is what the transformations accomplish using some structured XML as a
source.

- Steven J. Hathaway


Re: What does popRTFContext() do?

Posted by Sebastian Leske <Le...@ion.ag>.

Hello Steven,

On 17.06.2011 19:05, shathawa@e-z.net wrote:
> Sebastian Leske,
>
> I would not use XML and XSLT for PDF document pagination.
>
> I instead use XSLT to convert XML into a LaTeX markup language and then
> let the underlying troff/groff processors generate the actual content for
> printing and publishing.  The XSLT stylesheets are customized to the
> printing requirements for books, often using selected subsets of the
> DOCBOOK XML standard.  Therefore most XSLT processors suffice for my work.
[...]

thanks for your input.

However, the problem is that we have a working implementation based on 
Xalan + custom XSLT + custom pagination code, which only started showing 
problems after migrating to a new Xalan release (and a new JDK).

Rewriting everything would have been a major effort, so if we can keep 
the system working, even using a somewhat dubious workaround like 
disabling RTFDTM cleanup, we are already happy.

I just wanted to make sure we understand the consequences of this 
workaround, so we don't run into problems later on. Workarounds can be a 
very efficient way of solving problems, as long as you are aware of 
their limitations :-).

So I think we will work with this right now. If we later run into memory 
problems, we can still reconsider the solution, but right now there does 
not appear to be a problem.

Greetings,

S.Leske


Re: What does popRTFContext() do?

Posted by sh...@e-z.net.

Sebastian Leske,

I would not use XML and XSLT for PDF document pagination.

I instead use XSLT to convert XML into a LaTeX markup language and then
let the underlying troff/groff processors generate the actual content for
printing and publishing.  The XSLT stylesheets are customized to the
printing requirements for books, often using selected subsets of the
DOCBOOK XML standard.  Therefore most XSLT processors suffice for my work.

The XSLT-to-LaTeX stylesheets accommodate frontispiece, copyright pages,
dedications, forewords, colophons, tables of contents, tables of tables,
tables of figures, sections, chapters, appendices, multi-page tables,
indexes, glossaries, footnotes, endnotes, references, and a wide array of
multi-pass document creation requirements including pagination,
hyphenation, computed headers, computed footers, floating illustrations,
and mathematical formulae creation.

I don't have a publicly releasable set of XSLT transformations, but this
is what the transformations accomplish using some structured XML as a
source.

- Steven J. Hathaway


Re: What does popRTFContext() do?

Posted by Sebastian Leske <Le...@ion.ag>.

Hello,

On 17.06.2011 11:44, Jesper Steen Møller wrote:
> I think I can explain what it does, but I can't tell you if
> disabling it is a bad idea.
>
> RTFDTM means Result Tree Fragment Document Table Model. Xalan uses the
> RTFDTM tables to save object overhead on the generated result tree
> fragments, and thus, the NodeLists you keep a reference to will
> point to table entries which - by design - will get overwritten (as
> result tree fragments often get serialized out right away). In
> effect, this is a sort of manual memory management, especially useful
> for older JVMs, which had slower garbage collection algorithms.

thanks a lot for the explanation. I had guessed something like this - 
manual memory management with manual garbage collection :-).

> By leaving out the popRTFContext, you'll keep accumulating that
> state (in effect all the result tree fragment state). Your memory
> consumption should grow as well. The alternative would be to deep
> clone those NodeLists before you stash them away for rearranging.

Yes, that makes sense.

> Question is whether there's an upper limit or something on the
> RTFDTM. Anyone?

As far as I can tell from the internal structures I have seen, there 
does not seem to be any fixed limit (unless maybe you exhaust the range 
of the int IDs used internally, but that seems unlikely). So the only 
thing that can happen is an OutOfMemoryError?

That's something we could live with: The code works fine right now, it 
processes our documents with reasonable RAM usage. If this ever becomes 
a problem, we'll find out, then we can tackle it.

But maybe someone else can shed more light on this?

Greetings,

S.Leske


Re: What does popRTFContext() do?

Posted by Jesper Steen Møller <je...@selskabet.org>.

Hi Sebastian

I think I can explain what it does, but I can't tell you if disabling it is a bad idea. 

RTFDTM means Result Tree Fragment Document Table Model.
Xalan uses the RTFDTM tables to save object overhead on the generated result tree fragments, and thus, the NodeLists you keep a reference to will point to table entries which - by design - will get overwritten (as result tree fragments often get serialized out right away). In effect, this is a sort of manual memory management, especially useful for older JVMs, which had slower garbage collection algorithms.
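
To make the overwriting concrete, the following toy Java sketch models a table whose slots are reused after a rewind, with handles that only remember a slot index. It is an analogy for the behaviour described here, not Xalan's actual RTFDTM code; Table, Handle, mark() and rewindTo() are invented names.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: shows why a handle into reusable table storage can
// appear to "change" after the storage has been rewound and refilled.
final class RewindableTableDemo {

    /** Shared storage; slots above the current fill level may be reused. */
    static final class Table {
        private final List<String> slots = new ArrayList<>();
        private int used = 0; // number of slots currently considered live

        Handle add(String value) {
            if (used < slots.size()) {
                slots.set(used, value); // reuse a slot freed by rewindTo()
            } else {
                slots.add(value);
            }
            return new Handle(this, used++);
        }

        int mark() {
            return used;
        }

        void rewindTo(int mark) {
            used = mark; // contents stay in place until overwritten
        }

        String get(int index) {
            return slots.get(index);
        }
    }

    /** A lightweight view: holds no data of its own, only a slot index. */
    static final class Handle {
        private final Table table;
        private final int index;

        Handle(Table table, int index) {
            this.table = table;
            this.index = index;
        }

        String value() {
            return table.get(index);
        }
    }

    public static void main(String[] args) {
        Table table = new Table();
        int mark = table.mark();                  // comparable to a "push"
        Handle queued = table.add("paragraph A"); // caller keeps this handle
        table.rewindTo(mark);                     // comparable to a "pop"
        table.add("paragraph B");                 // lands in the same slot
        System.out.println(queued.value());       // prints "paragraph B"
    }
}
```

In this model, never calling rewindTo() keeps every handle valid, at the cost of the table growing without bound.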

By leaving out the popRTFContext, you'll keep accumulating that state (in effect all the result tree fragment state). Your memory consumption should grow as well. The alternative would be to deep clone those NodeLists before you stash them away for rearranging.
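
A minimal sketch of the deep-clone alternative, using only standard JAXP/DOM calls (Document.importNode with deep = true). The class and method names are hypothetical, and this has not been tested against Xalan's node proxies; it assumes they implement enough of the read-side DOM API for importNode to walk them.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: copies nodes into an independent DOM document so that
// later changes to the source storage cannot affect the queued copies.
final class NodeListSnapshot {

    /** Creates an empty DOM document to own the copied nodes. */
    static Document newEmptyDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no usable DOM implementation", e);
        }
    }

    /**
     * Deep-copies every node in the list, including its subtree.
     * importNode cannot import Document or DocumentType nodes, so the
     * list is assumed to contain elements, text nodes or fragments.
     */
    static List<Node> deepCopy(NodeList source) {
        Document holder = newEmptyDocument();
        List<Node> copies = new ArrayList<>(source.getLength());
        for (int i = 0; i < source.getLength(); i++) {
            copies.add(holder.importNode(source.item(i), true));
        }
        return copies;
    }

    public static void main(String[] args) {
        Document doc = newEmptyDocument();
        Element para = doc.createElement("para");
        para.setTextContent("original");
        doc.appendChild(para);

        List<Node> saved = deepCopy(doc.getChildNodes());
        para.setTextContent("overwritten"); // simulate the source changing later

        System.out.println(saved.get(0).getTextContent()); // prints "original"
    }
}
```

The copies cost extra memory and time per queued fragment, but they are released by the normal garbage collector once the queue is written out.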

Question is whether there's an upper limit or something on the RTFDTM. Anyone?

Hope this helps.

-Jesper

