Posted to user@poi.apache.org by Daniel Noll <da...@nuix.com> on 2009/04/07 03:39:55 UTC
StringIndexOutOfBoundsException getting text pieces (POI 3.1)
Hi all.
We're on POI 3.1, and are getting the following exception when getting
the text pieces for some Word documents:
> java.lang.StringIndexOutOfBoundsException: String index out of range: -22528
> at java.lang.AbstractStringBuilder.substring(AbstractStringBuilder.java:881)
> at java.lang.StringBuffer.substring(StringBuffer.java:416)
> at org.apache.poi.hwpf.usermodel.Range.text(Range.java:270)
> at org.apache.poi.hwpf.model.PicturesTable.getAllPictures(PicturesTable.java:196)
This appears to be coming from here:
> public String text()
> {
>   initText();
>
>   StringBuffer sb = new StringBuffer();
>
>   for (int x = _textStart; x < _textEnd; x++)
>   {
>     TextPiece piece = (TextPiece)_text.get(x);
>     int start = _start > piece.getStart() ? _start - piece.getStart() : 0;
>     int end = _end <= piece.getEnd() ? _end - piece.getStart() : piece.getEnd() - piece.getStart();
>
>     if(piece.usesUnicode()) // convert the byte pointers to char pointers
>     {
>       start/=2;
>       end/=2;
>     }
>     sb.append(piece.getStringBuffer().substring(start, end));
>   }
>   return sb.toString();
> }
A breakpoint at the call to substring shows the following values:
_start = 7468
_end = 7680
piece.getStart() = 52736
_end - piece.getStart() = -45056
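For anyone following along, those breakpoint values can be plugged into the two expressions from text() to see where the -22528 in the stack trace comes from. This is only a sketch of the arithmetic, not POI code: the class and method names are made up for illustration, the piece is assumed to be a Unicode one (so the offsets get halved), and the piece.getEnd() value of 60000 is an assumption, since the breakpoint output did not include it. Any end value of 7680 or more gives the same result.

```java
public class TextPieceOffsetDemo {
    /** Mirrors the "start" expression in Range.text() as quoted above. */
    public static int relativeStart(int rangeStart, int pieceStart, boolean unicode) {
        int start = rangeStart > pieceStart ? rangeStart - pieceStart : 0;
        return unicode ? start / 2 : start; // byte offset to char offset
    }

    /** Mirrors the "end" expression in Range.text() as quoted above. */
    public static int relativeEnd(int rangeEnd, int pieceStart, int pieceEnd, boolean unicode) {
        int end = rangeEnd <= pieceEnd ? rangeEnd - pieceStart : pieceEnd - pieceStart;
        return unicode ? end / 2 : end; // byte offset to char offset
    }

    public static void main(String[] args) {
        int rangeStart = 7468;   // _start from the breakpoint
        int rangeEnd = 7680;     // _end from the breakpoint
        int pieceStart = 52736;  // piece.getStart() from the breakpoint
        int pieceEnd = 60000;    // ASSUMPTION: not shown in the breakpoint output

        int start = relativeStart(rangeStart, pieceStart, true);
        int end = relativeEnd(rangeEnd, pieceStart, pieceEnd, true);

        // The range ends before the piece begins, so "end" goes negative
        // and substring(start, end) cannot succeed.
        System.out.println("start = " + start + ", end = " + end);
    }
}
```

In other words, the loop is visiting a text piece that lies entirely after the requested range, and nothing in text() guards against that case.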
I notice that trunk is significantly different in this area... so I'm
wondering how safe it would be to pull just the changes in this area
into 3.1, or whether it is a practical impossibility due to being part
of a much larger changeset.
We can't upgrade to 3.2 as updating caused about half a dozen of our own
unit tests to fail.
Daniel
--
Daniel Noll Forensic and eDiscovery Software
Senior Developer The world's most advanced
Nuix email data analysis
http://nuix.com/ and eDiscovery software
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org
Re: StringIndexOutOfBoundsException getting text pieces (POI 3.1)
Posted by David Fisher <df...@jmlafferty.com>.
Hi Daniel,
>> You could try just using hwpf from trunk, and keeping your hssf
>> stuff on 3.1? That ought to work without too much faffing
>
> That might be worth a shot.
>
> At the very least, I can confirm that HWPF trunk works for all the
> files I have which fail on 3.1.
That is good news.
> I haven't checked tests yet, or gone through any kind of rigorous
> testing of any other sort though.
>
>> Meaningful unit tests that highlight problems are almost as welcome
>> as patches that fix those problems! :)
>
> Given that this issue in itself looks like a tricky one to backport,
> it is probably a better investment of my time to sanity check trunk
> again to find out what is broken over there if anything, and get
> those things fixed. (Otherwise 3.3 might come out and we wouldn't
> be able to use it, which would put us behind.)
3.5-FINAL will be out very soon. There are a lot of changes in HSSF, OOXML
support ...
> If I do find any more failures where the test data can actually be
> redistributed (from time to time things get into our data set which
> can't, unfortunately) then I will report new bugs.
Understood, we appreciate all contributions :-)
Regards,
Dave
Re: StringIndexOutOfBoundsException getting text pieces (POI 3.1)
Posted by Daniel Noll <da...@nuix.com>.
Nick Burch wrote:
> You could try just using hwpf from trunk, and keeping your hssf stuff on
> 3.1? That ought to work without too much faffing
That might be worth a shot.
At the very least, I can confirm that HWPF trunk works for all the files
I have which fail on 3.1. I haven't checked tests yet, or gone through
any kind of rigorous testing of any other sort though.
> Meaningful unit tests that highlight problems are almost as welcome as
> patches that fix those problems! :)
Given that this issue in itself looks like a tricky one to backport, it
is probably a better investment of my time to sanity check trunk again
to find out what is broken over there if anything, and get those things
fixed. (Otherwise 3.3 might come out and we wouldn't be able to use it,
which would put us behind.)
If I do find any more failures where the test data can actually be
redistributed (from time to time things get into our data set which
can't, unfortunately) then I will report new bugs.
Daniel
Re: StringIndexOutOfBoundsException getting text pieces (POI 3.1)
Posted by Nick Burch <ni...@torchbox.com>.
On Tue, 7 Apr 2009, Daniel Noll wrote:
> This appears to be coming from here:
>
>> public String text()
>> {
I think there might be another patch for this area waiting in Bugzilla for
someone to have time to review it properly. We have been doing some bug
fixes in this area, but it's hard because the file format uses byte offsets in
half the places and character offsets in others, and we're slowly trying
to ensure we use the right offset in the right place.
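To illustrate the byte offset versus character offset point, here is a rough sketch of intersecting a range with a text piece while keeping everything in one unit until the very end. It is not the patch in Bugzilla and not what trunk does; the class and method names are hypothetical, and it assumes all four offsets passed in are byte offsets, with Unicode pieces using two bytes per character.

```java
public class OffsetUnitsDemo {
    /** Bytes per character: 2 for Unicode (UTF-16) pieces, 1 for 8-bit pieces. */
    public static int bytesPerChar(boolean unicode) {
        return unicode ? 2 : 1;
    }

    /**
     * Returns {start, end} character indexes into the piece's own text for the
     * part of the byte range [rangeStart, rangeEnd) that overlaps the piece,
     * or null when the two do not overlap. All four offsets are in bytes.
     */
    public static int[] overlapInChars(int rangeStart, int rangeEnd,
                                       int pieceStart, int pieceEnd,
                                       boolean unicode) {
        int start = Math.max(rangeStart, pieceStart);
        int end = Math.min(rangeEnd, pieceEnd);
        if (start >= end) {
            return null; // piece lies wholly outside the range
        }
        int width = bytesPerChar(unicode);
        // Convert to piece-relative character indexes only after clamping.
        return new int[] { (start - pieceStart) / width, (end - pieceStart) / width };
    }

    public static void main(String[] args) {
        // Values from the original report (piece end of 60000 is an assumption).
        int[] none = overlapInChars(7468, 7680, 52736, 60000, true);
        System.out.println(none == null ? "no overlap, skip piece" : "overlap");

        // A made-up overlapping case: bytes 120..140 of the range fall in the piece.
        int[] some = overlapInChars(100, 140, 120, 200, true);
        System.out.println("chars " + some[0] + " to " + some[1]);
    }
}
```

Clamping before converting means a piece outside the range is skipped instead of producing a negative index, whichever way the offsets eventually get fixed in HWPF itself.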
You might want to grab the patch, apply it to trunk and see if it improves
things for you, but hopefully someone'll get a chance to review and commit
it in the next few weeks.
> I notice that trunk is significantly different in this area... so I'm
> wondering how safe it would be to pull just the changes in this area
> into 3.1, or whether it is a practical impossibility due to being part
> of a much larger changeset.
You could try just using hwpf from trunk, and keeping your hssf stuff on
3.1? That ought to work without too much faffing.
> We can't upgrade to 3.2 as updating caused about half a dozen of our own
> unit tests to fail.
Meaningful unit tests that highlight problems are almost as welcome as
patches that fix those problems! :)
Nick