Posted to j-dev@xerces.apache.org by bu...@apache.org on 2001/08/07 20:53:24 UTC
[Bug 3013] - Large File Parsing
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=3013
*** shadow/3013 Mon Aug 6 20:47:59 2001
--- shadow/3013.tmp.21412 Tue Aug 7 11:53:24 2001
***************
*** 26,28 ****
--- 26,49 ----
For values of -f such as 10, 15 and 18 there is no problem. The binary can be built
from the file at http://monetdb.cwi.nl/xml/Assets/unix.c
+
+ ------- Additional Comments From jjc@hpl.hp.com 2001-08-07 11:53 -------
+ I reproduced this.
+
+ The problem is that the input file is more than 2^31 bytes long.
+
+ The offset (XMLEntityReader.fCurrentOffset) is a 32-bit int, so once it passes
+ Integer.MAX_VALUE it wraps around to a negative number.
+ Shortly afterwards, Xerces falls over in
+ org.apache.xerces.utils.UTF8DataChunk.addSymbol
+
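The wraparound is ordinary Java int arithmetic. Addition on int does not throw on overflow. It silently wraps in two's complement. The sketch below assumes the offset is a plain int counter. OffsetWrapDemo and its methods are hypothetical names and are not Xerces code.

```java
// Hypothetical illustration (not Xerces code): an int byte counter like the
// XMLEntityReader.fCurrentOffset described above, compared with a long counter.
public class OffsetWrapDemo {

    // Advance an int offset by n bytes; Java int addition wraps silently
    // (two's complement) instead of throwing on overflow.
    static int advanceInt(int offset, int n) {
        return offset + n;
    }

    // The same advance with a 64-bit counter, which does not wrap
    // for any realistic file size.
    static long advanceLong(long offset, int n) {
        return offset + n;
    }

    public static void main(String[] args) {
        int nearLimit = Integer.MAX_VALUE - 10;           // 2^31 - 11
        System.out.println(advanceInt(nearLimit, 20));    // -2147483639 (wrapped)
        System.out.println(advanceLong(nearLimit, 20));   // 2147483657
    }
}
```

Widening the counter to long would avoid the wrap. It would presumably also require changing every int offset that passes through the reader and its data chunks.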
+ I don't know what should be done. I would guess this is a WONTFIX, but the
+ error messages could be improved. It is difficult to choose the best place to
+ catch it, though; I would assume that a minor change in the file would make the
+ symptom (i.e. the exact place where things go wrong) very different.
+
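One possible way to get a clearer error is to detect the overflow at the moment it happens and throw a descriptive exception. This sketch assumes that every offset update goes through a single method, and it uses Math.addExact (Java 8 and later) for the check. CheckedOffset is a hypothetical class and is not part of Xerces.

```java
// Hypothetical sketch (not Xerces code): fail fast with a clear message when a
// 32-bit offset would overflow, instead of letting it wrap and crash later.
public class CheckedOffset {

    private int offset; // current byte offset (hypothetical field)

    // Advances by n bytes. Math.addExact throws ArithmeticException on int
    // overflow before the assignment, so the stored offset stays unchanged.
    public void advance(int n) {
        try {
            offset = Math.addExact(offset, n);
        } catch (ArithmeticException e) {
            throw new IllegalStateException(
                "Input is longer than " + Integer.MAX_VALUE
                + " bytes; 32-bit offsets cannot address it", e);
        }
    }

    public int get() {
        return offset;
    }

    public static void main(String[] args) {
        CheckedOffset o = new CheckedOffset();
        o.advance(Integer.MAX_VALUE);
        try {
            o.advance(1); // would cross 2^31 - 1
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the check fires at the overflow itself, the message would not depend on where in the file the parser later happens to crash.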
+ The value of the offset argument to UTF8DataChunk.addSymbol when it crashes is
+ -2147483551; before that there have been numerous calls to addSymbol with very
+ large offsets near Integer.MAX_VALUE.
+
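The reported value can be decoded by reading it back as an unsigned 32-bit number. This assumes that the offset argument was derived from the wrapped counter and that it wrapped exactly once. Under that assumption, -2147483551 corresponds to 2147483745, which is 2^31 + 97. RecoverOffset is a hypothetical name, and Integer.toUnsignedLong requires Java 8.

```java
// Sketch: reinterpret a wrapped (negative) int offset as the unsigned
// 32-bit value it actually represents.
public class RecoverOffset {

    // Valid only while the true offset is below 2^32,
    // i.e. the counter has wrapped at most once.
    static long trueOffset(int wrapped) {
        return Integer.toUnsignedLong(wrapped);
    }

    public static void main(String[] args) {
        long actual = trueOffset(-2147483551);
        System.out.println(actual);               // 2147483745
        System.out.println(actual - (1L << 31));  // 97
    }
}
```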
+