Posted to j-dev@xerces.apache.org by "Michael Glavassevich (JIRA)" <xe...@xml.apache.org> on 2009/10/14 04:57:31 UTC

[jira] Resolved: (XERCESJ-1398) Supplying document without content-type headers causes entire stream to be buffered in memory, even when using SAX API

     [ https://issues.apache.org/jira/browse/XERCESJ-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Glavassevich resolved XERCESJ-1398.
-------------------------------------------

    Resolution: Cannot Reproduce

Sorry, I cannot reproduce what you're seeing.

Xerces has no problem reading a 1.8 GB document:

java sax.Counter file:///D:/xmldocs/bigFile.xml
file:///D:/xmldocs/bigFile.xml: 94640 ms (100010001 elems, 0 attrs, 0 spaces, 877800000 chars)

and also has no issues with a 7.8 GB document:

java sax.Counter file:///D:/xmldocs/bigFile2.xml
file:///D:/xmldocs/bigFile2.xml: 1041968 ms (400020001 elems, 0 attrs, 0 spaces, 3955600000 chars)

This last one is far larger than a typical JVM heap, and I'm sure that other users have successfully read documents this big (e.g. an XML dump from Wikipedia). RewindableInputStream stops buffering very early in the document.
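
For anyone who wants to repeat this kind of check without the Xerces samples on the classpath, here is a minimal sketch of a streaming SAX counter in the spirit of the sax.Counter runs above. It is not the sax.Counter source; the class name StreamingCounter and the helpers count and countString are made up for illustration, and it uses whichever parser JAXP selects rather than Xerces specifically.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class, not part of Xerces.
public class StreamingCounter extends DefaultHandler {
    public long elements;
    public long attributes;
    public long ignorableSpaces;
    public long characters;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        elements++;
        attributes += attrs.getLength();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per text node; only the total matters here.
        characters += length;
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        ignorableSpaces += length;
    }

    // Parses the source as a stream of events; no document tree is built.
    public static StreamingCounter count(InputSource source) throws Exception {
        StreamingCounter counter = new StreamingCounter();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.newSAXParser().parse(source, counter);
        return counter;
    }

    // Convenience for small in-memory documents; wraps checked exceptions.
    public static StreamingCounter countString(String xml) {
        try {
            return count(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();
        StreamingCounter c = args.length > 0
                ? count(new InputSource(args[0]))   // args[0] is a system identifier (URI)
                : countString("<a x='1'><b>hi</b><b/></a>");
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(elapsed + " ms (" + c.elements + " elems, " + c.attributes
                + " attrs, " + c.ignorableSpaces + " spaces, " + c.characters + " chars)");
    }
}
```

Run it with a file URI as the only argument and a deliberately small heap (for example -Xmx64m). If the whole stream were being buffered, a multi-gigabyte input should fail with an OutOfMemoryError long before the counts are printed.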

I suspect that the code you were using, and from which you produced the patch, isn't the Apache codebase. "revision 101962" doesn't correspond to any version of XMLEntityManager in Apache SVN. In fact, the first SVN revision was 317483, and the current one as of today is 822684.

> Supplying document without content-type headers causes entire stream to be buffered in memory, even when using SAX API
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: XERCESJ-1398
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-1398
>             Project: Xerces2-J
>          Issue Type: Bug
>          Components: SAX
>    Affects Versions: 2.9.1
>         Environment: Debian Linux, Sun JDK 1.5.0_20
>            Reporter: Karl Wright
>
> If the parser needs to autodetect the encoding of the input stream, it wraps the input stream using the RewindableInputStream class within XMLEntityManager.  But this class buffers everything that is read from the stream, even after the autodetection is complete (and no possibility of rewind being used exists anymore).  It is therefore trivial to submit XML to xerces2-j which causes an "OutOfMemoryError" exception to be thrown, which could lead to a denial of service under appropriate conditions.
> The fix I created for this involved adding a method "stopBuffering()" to the RewindableInputStream class, which shuts off further buffering by that class.  I call this method when the encoding has been decided upon (i.e. right before createReader is called, everywhere).
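
The idea described in the report can be sketched roughly as follows. This is a hypothetical standalone class written for illustration; it is neither the RewindableInputStream in XMLEntityManager nor the submitted patch. Only the method name stopBuffering comes from the report; PrefixRewindableStream, rewind, bufferedBytes and demo are assumed names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical illustration class, not Xerces code.
public class PrefixRewindableStream extends InputStream {
    private final InputStream in;
    private byte[] buffer = new byte[64];
    private int length;                // bytes currently held in the buffer
    private int position;              // next buffered byte to replay
    private boolean buffering = true;

    public PrefixRewindableStream(InputStream in) {
        this.in = in;
    }

    @Override
    public int read() throws IOException {
        if (position < length) {
            return buffer[position++] & 0xFF;      // replay a buffered byte
        }
        int b = in.read();
        if (b != -1 && buffering) {                // remember it for a later rewind
            if (length == buffer.length) {
                buffer = Arrays.copyOf(buffer, length * 2);
            }
            buffer[length++] = (byte) b;
            position = length;
        }
        return b;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (!buffering && position == length) {
            return in.read(b, off, len);           // plain pass-through, nothing retained
        }
        return super.read(b, off, len);            // falls back to read() byte by byte
    }

    // Go back to the first byte; only valid while buffering is still on.
    public void rewind() {
        if (!buffering) {
            throw new IllegalStateException("rewind() after stopBuffering()");
        }
        position = 0;
    }

    // Call once the encoding is known: the buffer stops growing from here on.
    public void stopBuffering() {
        buffering = false;
    }

    public int bufferedBytes() {
        return length;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    // Sniffs a few bytes, rewinds, stops buffering, drains; returns bytes retained.
    public static int demo(byte[] data, int sniff) {
        try (PrefixRewindableStream s = new PrefixRewindableStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < sniff; i++) {
                s.read();                          // stands in for encoding detection
            }
            s.rewind();
            s.stopBuffering();
            long total = 0;
            while (s.read() != -1) {
                total++;
            }
            if (total != data.length) {
                throw new IllegalStateException("bytes lost during replay");
            }
            return s.bufferedBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch the retained buffer is bounded by the number of bytes read before stopBuffering() is called, regardless of how long the stream is.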
