Posted to dev@xalan.apache.org by Howard Stearns <st...@curl.com> on 2001/10/31 21:24:54 UTC

big files / No more DTM IDs / ArrayIndexOutOfBounds

I'm looking for

  reports of progress,
  suggested workarounds, or
  suggestions for getting help (e.g., contacting anyone with offers to
reproduce)

for the "No more DTM IDs" or "ArrayIndexOutOfBounds" problem.

-------------
I have a 1MB .xml file and stylesheet that turns it into 300+ .html
files with 15,000+ internal links.  There's an index that, for each
linked-to term, lists all the sections that contain the term.

Using Xalan-Java 2.2 D9, I got ArrayIndexOutOfBoundsExceptions similar to
what has been reported in Bug 3438 and Bug 2983.  When I tried
Xalan-Java 2.2 D11, the error changed to DTMException: No more DTM IDs
are available (as has been reported for 2983).

I am a command-line xalan user (e.g., xalan.xslt.Process, not my own
Java code).  I cannot reproduce this with a smaller file -- deleting
chunks of the source document moves the error around, or gets rid of it
entirely.  I can't seem to find either a smaller source .xml that always
triggers the problem or a flaw in the .xsl that triggers it with
smaller input.  Thus I'm convinced it's a size-related Xalan bug.

If it will help someone solve the problem, I can try to supply a
reproducible .xml and .xsl fileset, but purging the .xml of proprietary
data is a bit of work, and it sounds like the problem already has two
tickets filed.  (Maybe Bug 3447 is related, too?)

[I have now exhausted my knowledge of both Xalan internals and Apache
procedures and infrastructure.]

One (red herring?) clue: processing the .xml with saxon does produce
complete results, but the fancy index is silently blank.  The index
works properly with xalan on a smaller data set.  In fact, I *think* I
can run every part of my .xml through and get the right results, just
not the whole thing at once.

Re: big files / No more DTM IDs / ArrayIndexOutOfBounds

Posted by Benjamin Franz <sn...@nihongo.org>.
On Wed, 31 Oct 2001, Howard Stearns wrote:

> I'm looking for
>
>   reports of progress,
>   suggested workarounds, or
>   suggestions for getting help (e.g., contacting anyone with offers to
> reproduce)
>
> for the "No more DTM IDs" or "ArrayIndexOutOfBounds" problem.

I just encountered this problem this morning myself. The issue seems to
stem from the low size (22 bits) set in org/apache/xml/dtm/DTMManager.java
for IDENT_DTM_NODE_BITS. The process of transforming large XML documents
simply overwhelms this - especially if the stylesheets make use of loops
and xsl:variable/xsl:param (we encountered it generating HTML with a few
hundred select options from a 1.5 MB XML file). Our workaround (obviously
not a long term solution) was to factor the xsl:variable expressions out
of the xsl:for-each expressions to lower the number of nodes that were
generated during the transform.
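
To illustrate the kind of refactoring meant here, below is a minimal Java
sketch that runs two toy stylesheets through the standard
javax.xml.transform API. The input document, the element names (doc, opt),
the variable name (sep), and the class HoistVariableDemo are all made up
for illustration; they are not our real stylesheets. The sketch assumes
the variable's value does not depend on the current node, which is what
makes it safe to move out of the loop, and it only shows that the output
is unchanged, not how many nodes any particular processor allocates.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class HoistVariableDemo {
    // Hypothetical input: a handful of options standing in for a large document.
    static final String XML = "<doc><opt>a</opt><opt>b</opt><opt>c</opt></doc>";

    static final String HEAD =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/doc'>";
    static final String TAIL = "</xsl:template></xsl:stylesheet>";

    // Before: the variable (with a template body) is rebuilt on every iteration.
    static final String INSIDE_LOOP = HEAD
        + "<xsl:for-each select='opt'>"
        + "<xsl:variable name='sep'><xsl:text>;</xsl:text></xsl:variable>"
        + "<xsl:value-of select='.'/><xsl:value-of select='$sep'/>"
        + "</xsl:for-each>" + TAIL;

    // After: the variable is built once, outside the loop.
    static final String HOISTED = HEAD
        + "<xsl:variable name='sep'><xsl:text>;</xsl:text></xsl:variable>"
        + "<xsl:for-each select='opt'>"
        + "<xsl:value-of select='.'/><xsl:value-of select='$sep'/>"
        + "</xsl:for-each>" + TAIL;

    // Applies the given stylesheet text to XML and returns the result as a string.
    static String transform(String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(XML)),
                        new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(INSIDE_LOOP)); // a;b;c;
        System.out.println(transform(HOISTED));     // a;b;c;
    }
}
```

Both stylesheets produce the same text; the second simply evaluates the
variable once instead of once per opt element.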

I have been looking at org/apache/xml/dtm/DTMManager.java to try and see
what immediate impact increasing IDENT_DTM_NODE_BITS to around 26 bits and
recompiling would have. This looks like something that should be
re-examined in terms of its 'hardwiredness'. I also wonder if 'int' is the
appropriate width for node identifiers - I think 'long' may be a better
long term fit, since giving more bits to the node count immediately and
noticeably reduces the number of documents permitted.
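
The trade-off can be worked out with a little arithmetic. The Java sketch
below is not Xalan code; HandleBudget and its methods are hypothetical
names, and it assumes (as the above suggests, but I have not verified in
detail) that a node identifier is a single integer whose low bits index
the node within a document and whose remaining high bits identify the
document.

```java
public class HandleBudget {
    // Number of distinct node indexes available within one document.
    static long maxNodesPerDocument(int nodeBits) {
        return 1L << nodeBits;
    }

    // Number of distinct document IDs left over in a handle of the given width.
    // Valid while (handleBits - nodeBits) is below 63, so the result fits in a long.
    static long maxDocuments(int handleBits, int nodeBits) {
        return 1L << (handleBits - nodeBits);
    }

    public static void main(String[] args) {
        int[][] cases = { {32, 22}, {32, 26}, {64, 26} };
        for (int[] c : cases) {
            System.out.printf("%d-bit handle, %d node bits: %d nodes/doc, %d docs%n",
                c[0], c[1], maxNodesPerDocument(c[1]), maxDocuments(c[0], c[1]));
        }
    }
}
```

Under that assumption, a 32-bit 'int' with 22 node bits allows 4194304
nodes per document and 1024 documents; raising the node bits to 26 allows
67108864 nodes per document but only 64 documents. A 64-bit 'long' with 26
node bits would leave 38 bits for document IDs.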

If anyone else has suggestions, I'm interested in this as well.

-- 
Benjamin Franz

 "Code as if whoever maintains your code is a violent
  psychopath who knows where you live."
                    -- Nancy Lebovitz, the button lady