Posted to j-dev@xerces.apache.org by bu...@apache.org on 2004/03/19 23:10:05 UTC
DO NOT REPLY [Bug 27807] New: - SAXParser beheading some strings
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=27807>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.
Summary: SAXParser beheading some strings
Product: Xerces2-J
Version: 2.6.0
Platform: PC
OS/Version: Linux
Status: NEW
Severity: Major
Priority: Other
Component: SAX
AssignedTo: xerces-j-dev@xml.apache.org
ReportedBy: arenson@spatzel.net
Platforms tested are AIX and Gentoo Linux.
I have a Java parser that implements ContentHandler and uses SAXParser
to create a tab-delimited file of a subset of information in an XML file.
My problem is that a small percentage of the results from this code are being
beheaded, by which I mean the string that's being returned is a substring of
what's actually in the XML, with characters missing from the front of the string.
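For context, the SAX ContentHandler documentation allows a parser to deliver
one run of character data through several characters() calls, so a handler that
keeps only the most recent call can end up with just the tail of a string.
Whether that is what happens here cannot be told from this report alone. The
following is a minimal sketch of a handler that accumulates every chunk before
using the text. The class BufferingHandler and its parse helper are hypothetical
names for illustration and are not the BindHandlerInter code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration; not the reporter's BindHandlerInter.
class BufferingHandler extends DefaultHandler {
    // Collects all characters() chunks belonging to the current element.
    private final StringBuilder text = new StringBuilder();
    private final List<String> taxNames = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Start fresh for each element; adequate for leaf elements that hold only text.
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Append rather than overwrite: the parser may call this more than once
        // for a single run of text.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only at the end tag is the element's text known to be complete.
        if ("Org-ref_taxname".equals(qName)) {
            taxNames.add(text.toString());
        }
        text.setLength(0);
    }

    List<String> getTaxNames() {
        return taxNames;
    }

    // Convenience wrapper (an assumption of this sketch): parse an XML string
    // and return the collected Org-ref_taxname values.
    static List<String> parse(String xml) {
        try {
            BufferingHandler handler = new BufferingHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.getTaxNames();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

The design choice is to treat endElement, not characters(), as the point where
an element's text is final.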
My original XML file is over 566 MB. I have managed to pare this down to a
file of about 4 MB, but haven't yet found a way to reproduce the problem on a
smaller file.
The following URLs link to the XML file and the two Java files used to parse the
XML into the tab-delimited output:
227.xml
https://www.slashtmp.iu.edu/public/download.php?FILE=aarenson/7404E5qOli
BindParserInter.java
https://www.slashtmp.iu.edu/public/download.php?FILE=aarenson/26035zBIzer
BindHandlerInter.java
https://www.slashtmp.iu.edu/public/download.php?FILE=aarenson/897296MWT3R
The following should compile the code and parse the XML:
> javac BindParserInter.java
> javac BindHandlerInter.java
> java BindParserInter 227.xml > 227.txt
The 227.xml file has 227 BIND-Interaction elements. The last one has the
following subelement:
<Org-ref_taxname>Mus musculus</Org-ref_taxname>
After producing the tab-delimited file, the error I'm seeing is that the last
line in the tab-delimited file contains only 'ulus' in the 7th field, rather
than the full 'Mus musculus'.
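One way to investigate a symptom like this is to record how the parser delivers
the text of the affected element. The sketch below logs the length of each
characters() call inside a named element. If the text of the last
Org-ref_taxname arrived as two chunks of 8 and 4 characters, a handler that kept
only the last chunk would see 'ulus'. That split is an assumption to test, not
something this report establishes. ChunkCounter and chunksFor are hypothetical
names, and the sketch reads XML from a string rather than from 227.xml.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical diagnostic handler: records the size of every characters()
// call made while inside the last occurrence of one target element.
class ChunkCounter extends DefaultHandler {
    private final String target;
    private boolean inside;
    private final List<Integer> chunkLengths = new ArrayList<>();

    ChunkCounter(String target) {
        this.target = target;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (target.equals(qName)) {
            inside = true;
            chunkLengths.clear(); // keep data for the most recent occurrence only
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inside) {
            chunkLengths.add(length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (target.equals(qName)) {
            inside = false;
        }
    }

    // Parse an XML string and return the chunk lengths seen for the last
    // occurrence of the given element.
    static List<Integer> chunksFor(String xml, String element) {
        try {
            ChunkCounter counter = new ChunkCounter(element);
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), counter);
            return counter.chunkLengths;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

More than one entry in the returned list for a single element would show that
its text was delivered in pieces. The number and size of chunks depend on the
parser and its internal buffering, so a small in-memory document may well
arrive in one piece even when a multi-megabyte file does not.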