Posted to j-dev@xerces.apache.org by bu...@apache.org on 2001/08/07 20:53:24 UTC

[Bug 3013] - Large File Parsing

PLEASE DO NOT REPLY TO THIS MESSAGE. TO FURTHER COMMENT
ON THE STATUS OF THIS BUG PLEASE FOLLOW THE LINK BELOW
AND USE THE ON-LINE APPLICATION. REPLYING TO THIS MESSAGE
DOES NOT UPDATE THE DATABASE, AND SO YOUR COMMENT WILL
BE LOST SOMEWHERE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=3013

*** shadow/3013	Mon Aug  6 20:47:59 2001
--- shadow/3013.tmp.21412	Tue Aug  7 11:53:24 2001
***************
*** 26,28 ****
--- 26,49 ----
  
  For values of -f such as 10,15,18  there is no problem. The binary can be made
  using the file at http://monetdb.cwi.nl/xml/Assets/unix.c
+ 
+ ------- Additional Comments From jjc@hpl.hp.com  2001-08-07 11:53 -------
+ I reproduced this.
+ 
+ The problem is the input file is more than 2^31 bytes long.
+ 
+ The offset (XMLEntityReader.fCurrentOffset) therefore wraps around to a
+ negative number.
+ Shortly afterwards, Xerces falls over in
+ org.apache.xerces.utils.UTF8DataChunk.addSymbol.
+ 
+ I don't know what should be done. I would guess this is a WONTFIX, but the 
+ error messages could be improved. It is difficult to choose the best place to 
+ catch it, though; I would assume that a minor change in the file would cause 
+ the symptom (i.e. the exact place things go wrong) to be very different.
+ 
+ The value of the argument offset to UTF8DataChunk.addSymbol when it crashes is
+ -2147483551. Before that, there were numerous calls to addSymbol with very 
+ large values of offset, near Integer.MAX_VALUE.
+ 
+ 
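The wraparound described in the comment above can be reproduced in isolation. The following is a minimal Java sketch, not Xerces code: the class OffsetWrapDemo and the methods advance and advanceChecked are hypothetical, and the starting offset and increment are chosen only so that the result equals the reported -2147483551. Assuming the counter wrapped exactly once, that value corresponds to a true position of 2^31 + 97 = 2147483745 bytes. Math.addExact requires Java 8, which is much newer than the Xerces release in this report, so the checked variant only illustrates where a clearer error could be raised.

```java
public class OffsetWrapDemo {

    // Hypothetical stand-in for advancing an int byte offset such as
    // XMLEntityReader.fCurrentOffset. Plain int addition wraps silently
    // once the sum exceeds Integer.MAX_VALUE (2^31 - 1).
    static int advance(int offset, int bytesRead) {
        return offset + bytesRead;
    }

    // Hypothetical checked variant: fails at the point of overflow with a
    // descriptive message instead of passing a negative offset onwards.
    static int advanceChecked(int offset, int bytesRead) {
        try {
            return Math.addExact(offset, bytesRead);
        } catch (ArithmeticException e) {
            throw new IllegalStateException(
                "input is larger than " + Integer.MAX_VALUE
                + " bytes; offset no longer fits in an int", e);
        }
    }

    public static void main(String[] args) {
        // Illustrative values only; the real offsets in the bug are not known.
        int nearMax = Integer.MAX_VALUE - 10;   // 2147483637
        int step = 108;

        // The position the parser has really reached, computed as a long.
        long truePosition = (long) nearMax + step;
        System.out.println("true position:  " + truePosition);

        // The same addition in int arithmetic wraps to a negative number.
        System.out.println("wrapped offset: " + advance(nearMax, step));

        try {
            advanceChecked(nearMax, step);
        } catch (IllegalStateException e) {
            System.out.println("checked: " + e.getMessage());
        }
    }
}
```

With a silent wrap, the failure surfaces wherever the negative offset is first used, which fits the observation that the exact crash site would likely move with small changes to the input file.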
