Posted to user@poi.apache.org by Daniel Noll <da...@nuix.com> on 2009/04/07 03:39:55 UTC

StringIndexOutOfBoundsException getting text pieces (POI 3.1)

Hi all.

We're on POI 3.1, and are getting the following exception when getting 
the text pieces for some Word documents:

> java.lang.StringIndexOutOfBoundsException: String index out of range: -22528
> 	at java.lang.AbstractStringBuilder.substring(AbstractStringBuilder.java:881)
> 	at java.lang.StringBuffer.substring(StringBuffer.java:416)
> 	at org.apache.poi.hwpf.usermodel.Range.text(Range.java:270)
> 	at org.apache.poi.hwpf.model.PicturesTable.getAllPictures(PicturesTable.java:196)

This appears to be coming from here:

>   public String text()
>   {
>     initText();
> 
>     StringBuffer sb = new StringBuffer();
> 
>     for (int x = _textStart; x < _textEnd; x++)
>     {
>       TextPiece piece = (TextPiece)_text.get(x);
>       int start = _start > piece.getStart() ? _start - piece.getStart() : 0;
>       int end = _end <= piece.getEnd() ? _end - piece.getStart() : piece.getEnd() - piece.getStart();
> 
>       if(piece.usesUnicode()) // convert the byte pointers to char pointers
>       {
>         start/=2;
>         end/=2;
>       }
>       sb.append(piece.getStringBuffer().substring(start, end));
>     }
>     return sb.toString();
>   }

A breakpoint at the call to substring shows the following values:

   _start = 7468
   _end = 7680
   piece.getStart() = 52736
   _end - piece.getStart() = -45056
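
To make the arithmetic explicit, here is a minimal standalone sketch (not POI code) that replays the start/end computation from text() with the values above. The class name OffsetRepro, the helper methods, and the piece end of 60000 are hypothetical, chosen only for illustration; the sketch also assumes the piece reports usesUnicode() as true, which is what halving -45056 to the -22528 in the exception implies.

```java
public class OffsetRepro {
    // Mirrors the start computation quoted from Range.text() in POI 3.1.
    static int computeStart(int rangeStart, int pieceStart, boolean unicode) {
        int start = rangeStart > pieceStart ? rangeStart - pieceStart : 0;
        if (unicode) {
            start /= 2; // byte offset -> char offset
        }
        return start;
    }

    // Mirrors the end computation quoted from Range.text() in POI 3.1.
    static int computeEnd(int rangeEnd, int pieceStart, int pieceEnd, boolean unicode) {
        int end = rangeEnd <= pieceEnd ? rangeEnd - pieceStart : pieceEnd - pieceStart;
        if (unicode) {
            end /= 2; // byte offset -> char offset
        }
        return end;
    }

    public static void main(String[] args) {
        int rangeStart = 7468;   // _start from the debugger
        int rangeEnd = 7680;     // _end from the debugger
        int pieceStart = 52736;  // piece.getStart() from the debugger
        int pieceEnd = 60000;    // hypothetical: any value >= rangeEnd takes the same branch

        int start = computeStart(rangeStart, pieceStart, true);
        int end = computeEnd(rangeEnd, pieceStart, pieceEnd, true);

        // The piece begins after the range ends, so end goes negative
        // and substring(start, end) throws.
        System.out.println("start=" + start + " end=" + end);
    }
}
```

In other words, the piece being visited lies entirely past the end of the range, so the failure is in which pieces get selected (or in which units the offsets are compared), rather than in the substring call itself.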

I notice that trunk is significantly different in this area... so I'm 
wondering how safe it would be to pull just the changes in this area 
into 3.1, or whether it is a practical impossibility due to being part 
of a much larger changeset.

We can't upgrade to 3.2 as updating caused about half a dozen of our own 
unit tests to fail.

Daniel



-- 
Daniel Noll                            Forensic and eDiscovery Software
Senior Developer                              The world's most advanced
Nuix                                                email data analysis
http://nuix.com/                                and eDiscovery software

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: StringIndexOutOfBoundsException getting text pieces (POI 3.1)

Posted by David Fisher <df...@jmlafferty.com>.
Hi Daniel,

>> You could try just using hwpf from trunk, and keeping your hssf  
>> stuff on 3.1? That ought to work without too much faffing
>
> That might be worth a shot.
>
> At the very least, I can confirm that HWPF trunk works for all the  
> files I have which fail on 3.1.

That is good news.

>  I haven't checked tests yet, or gone through any kind of rigorous  
> testing of any other sort though.
>
>> Meaningful unit tests that highlight problems are almost as welcome  
>> as patches that fix those problems! :)
>
> Given that this issue in itself looks like a tricky one to backport,  
> it is probably a better investment of my time to sanity check trunk  
> again to find out what is broken over there if anything, and get  
> those things fixed.  (Otherwise 3.3 might come out and we wouldn't  
> be able to use it, which would put us behind.)

3.5-FINAL will be out very soon. There are a lot of changes in HSSF,  
OOXML support ...

> If I do find any more failures where the test data can actually be  
> redistributed (from time to time things get into our data set which  
> can't, unfortunately) then I will report new bugs.

Understood, we appreciate all contributions :-)

Regards,
Dave



Re: StringIndexOutOfBoundsException getting text pieces (POI 3.1)

Posted by Daniel Noll <da...@nuix.com>.
Nick Burch wrote:
> You could try just using hwpf from trunk, and keeping your hssf stuff on 
> 3.1? That ought to work without too much faffing

That might be worth a shot.

At the very least, I can confirm that HWPF trunk works for all the files 
I have which fail on 3.1.  I haven't checked tests yet, or gone through 
any kind of rigorous testing of any other sort though.

> Meaningful unit tests that highlight problems are almost as welcome as 
> patches that fix those problems! :)

Given that this issue in itself looks like a tricky one to backport, it 
is probably a better investment of my time to sanity check trunk again 
to find out what is broken over there if anything, and get those things 
fixed.  (Otherwise 3.3 might come out and we wouldn't be able to use it, 
which would put us behind.)

If I do find any more failures where the test data can actually be 
redistributed (from time to time things get into our data set which 
can't, unfortunately) then I will report new bugs.

Daniel



Re: StringIndexOutOfBoundsException getting text pieces (POI 3.1)

Posted by Nick Burch <ni...@torchbox.com>.
On Tue, 7 Apr 2009, Daniel Noll wrote:
> This appears to be coming from here:
>
>>   public String text()
>>   {

I think there might be another patch for this area waiting in bugzilla for 
someone to have time to review it properly. We have been doing some bug 
fixes in this area, but it's hard: the file format uses byte offsets in 
half the places and character offsets in others, and we're slowly trying 
to ensure we use the right offset in the right place.
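
As a rough illustration of keeping the two kinds of offset apart, the sketch below intersects a range with a text piece in byte units first, and only then converts to character offsets. It is not the trunk fix or the bugzilla patch: the class PieceOverlap and both methods are hypothetical names, and it assumes that all four inputs are byte offsets and that a Unicode piece uses two bytes per character.

```java
public class PieceOverlap {
    // Converts a byte offset relative to the piece start into a char offset.
    // Assumption: Unicode pieces use 2 bytes per char, others 1 byte per char.
    static int toCharOffset(int byteOffset, boolean unicode) {
        return unicode ? byteOffset / 2 : byteOffset;
    }

    // Returns {startChar, endChar} within the piece's text for the overlap of
    // [rangeStart, rangeEnd) and [pieceStart, pieceEnd), or null if they do
    // not overlap. All four inputs are assumed to be byte offsets.
    static int[] overlapInChars(int rangeStart, int rangeEnd,
                                int pieceStart, int pieceEnd, boolean unicode) {
        int from = Math.max(rangeStart, pieceStart);
        int to = Math.min(rangeEnd, pieceEnd);
        if (from >= to) {
            return null; // piece lies entirely outside the range: nothing to append
        }
        return new int[] {
            toCharOffset(from - pieceStart, unicode),
            toCharOffset(to - pieceStart, unicode)
        };
    }

    public static void main(String[] args) {
        // Values from the report (piece end is hypothetical): no overlap.
        System.out.println(overlapInChars(7468, 7680, 52736, 60000, true) == null);

        // A made-up overlapping case: bytes 200..300 of a piece starting at 200.
        int[] r = overlapInChars(100, 300, 200, 400, true);
        System.out.println(r[0] + ".." + r[1]);
    }
}
```

A guard like this would only mask the symptom if the range and the pieces are actually expressed in different units, so it is a debugging aid rather than a substitute for the reviewed patch.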

You might want to grab the patch, apply it to trunk and see if it improves 
things for you, but hopefully someone'll get a chance to review and commit 
it in the next few weeks.

> I notice that trunk is significantly different in this area... so I'm 
> wondering how safe it would be to pull just the changes in this area 
> into 3.1, or whether it is a practical impossibility due to being part 
> of a much larger changeset.

You could try just using hwpf from trunk, and keeping your hssf stuff on 
3.1? That ought to work without too much faffing

> We can't upgrade to 3.2 as updating caused about half a dozen of our own 
> unit tests to fail.

Meaningful unit tests that highlight problems are almost as welcome as 
patches that fix those problems! :)

Nick
