Posted to j-dev@xerces.apache.org by bu...@apache.org on 2004/03/19 23:10:05 UTC

DO NOT REPLY [Bug 27807] New: - SAXParser beheading some strings

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=27807>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=27807

           Summary: SAXParser beheading some strings
           Product: Xerces2-J
           Version: 2.6.0
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: Major
          Priority: Other
         Component: SAX
        AssignedTo: xerces-j-dev@xml.apache.org
        ReportedBy: arenson@spatzel.net


Platforms tested are AIX and Gentoo Linux. 
 
I have a Java program that implements ContentHandler and uses SAXParser
to create a tab-delimited file from a subset of the information in an XML file.
 
My problem is that a small percentage of the values produced by this code are
beheaded: the string returned is only part of what is actually in the XML,
with characters missing from the front of the string.
 
My original XML file is over 566 MB. I have managed to pare it down to a file
of about 4 MB, but have not yet found a way to reproduce the problem on a
smaller file.
 
The following URLs link to the XML file and the two Java files used to parse
the XML into the tab-delimited output:
 
 
227.xml 
https://www.slashtmp.iu.edu/public/download.php?FILE=aarenson/7404E5qOli 
 
BindParserInter.java 
https://www.slashtmp.iu.edu/public/download.php?FILE=aarenson/26035zBIzer 
 
BindHandlerInter.java 
https://www.slashtmp.iu.edu/public/download.php?FILE=aarenson/897296MWT3R 
 
The following commands should compile the code and parse the XML:
> javac BindParserInter.java 
> javac BindHandlerInter.java 
> java BindParserInter 227.xml > 227.txt 
 
The 227.xml file has 227 BIND-Interaction elements. The last one has the 
following subelement: 
 
                          <Org-ref_taxname>Mus musculus</Org-ref_taxname> 
 
In the resulting tab-delimited file, the 7th field of the last line contains
only 'ulus' instead of 'Mus musculus'.
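
One thing worth ruling out, offered as an assumption rather than a diagnosis
since the handler source is not reproduced in this message: the SAX
ContentHandler contract allows a parser to deliver the text of a single element
through several characters() calls. A handler that keeps only the most recent
call would then see just the tail of a value, such as 'ulus'. The sketch below
shows a handler that accumulates the chunks until endElement(). TaxnameHandler
and its methods are hypothetical names for illustration, not the attached
BindHandlerInter.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration only; not the BindHandlerInter
// attached to this report.
public class TaxnameHandler extends DefaultHandler {
    // Collects text across however many characters() calls the parser makes.
    private final StringBuilder text = new StringBuilder();
    private final List<String> taxnames = new ArrayList<>();
    private boolean inTaxname = false;

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        if ("Org-ref_taxname".equals(qName)) {
            inTaxname = true;
            text.setLength(0); // start a fresh buffer for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Append rather than overwrite: one text node may arrive in pieces.
        if (inTaxname) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("Org-ref_taxname".equals(qName)) {
            taxnames.add(text.toString()); // value is complete only here
            inTaxname = false;
        }
    }

    public List<String> getTaxnames() {
        return taxnames;
    }

    // Parses an XML string with the default JAXP SAX parser.
    public static List<String> parse(String xml) throws Exception {
        TaxnameHandler handler = new TaxnameHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getTaxnames();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<r><Org-ref_taxname>Mus musculus</Org-ref_taxname></r>";
        System.out.println(parse(xml));
    }
}
```

If the attached handler already buffers text this way, this explanation does
not apply and the truncation must come from somewhere else.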

---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-dev-help@xml.apache.org