Posted to dev@poi.apache.org by bu...@apache.org on 2010/09/15 14:33:47 UTC

DO NOT REPLY [Bug 49933] New: Word 6/95 documents with sections cause ArrayIndexOutOfBoundsException

https://issues.apache.org/bugzilla/show_bug.cgi?id=49933

           Summary: Word 6/95 documents with sections cause
                    ArrayIndexOutOfBoundsException
           Product: POI
           Version: 3.7-dev
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: HWPF
        AssignedTo: dev@poi.apache.org
        ReportedBy: adamwilmer@yahoo.co.uk


Created an attachment (id=26027)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=26027)
Word 95 document with a section

Processing a Word 6/Word 95 document with sections causes an
ArrayIndexOutOfBoundsException. Running Tika (Revision: 997224, 2010-09-14) with
its 3.7-beta2 POI dependency on the attached document gives rise to:

Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@1e7c5cb
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:165)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:146)
        at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:197)
        at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:71)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 22
        at org.apache.poi.util.LittleEndian.getShort(LittleEndian.java:46)
        at org.apache.poi.hwpf.sprm.SprmOperation.<init>(SprmOperation.java:54)
        at org.apache.poi.hwpf.sprm.SprmIterator.next(SprmIterator.java:45)
        at org.apache.poi.hwpf.sprm.SectionSprmUncompressor.uncompressSEP(SectionSprmUncompressor.java:36)
        at org.apache.poi.hwpf.model.SEPX.<init>(SEPX.java:33)
        at org.apache.poi.hwpf.model.OldSectionTable.<init>(OldSectionTable.java:61)
        at org.apache.poi.hwpf.HWPFOldDocument.<init>(HWPFOldDocument.java:103)
        at org.apache.poi.hwpf.HWPFOldDocument.<init>(HWPFOldDocument.java:42)
        at org.apache.tika.parser.microsoft.WordExtractor.parseWord6(WordExtractor.java:150)
        at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:51)
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:187)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:163)
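
The innermost frame is LittleEndian.getShort, which reads two consecutive
bytes, so the exception indicates that a 16-bit read ran past the end of the
SEPX byte array. A minimal sketch of that failure mode next to a bounds-checked
alternative follows; SafeLittleEndian and its methods are hypothetical names,
not POI's API, and the 22-byte buffer is an assumption for illustration (the
trace only tells us index 22 was out of range).

```java
class SafeLittleEndian {
    /** Unguarded read, like a plain little-endian getShort: throws if offset + 1 is out of range. */
    static short readShortUnchecked(byte[] data, int offset) {
        return (short) ((data[offset] & 0xFF) | ((data[offset + 1] & 0xFF) << 8));
    }

    /** Guarded read: returns the unsigned 16-bit value, or -1 if fewer than 2 bytes remain. */
    static int readUShortOrMinusOne(byte[] data, int offset) {
        if (offset < 0 || offset + 2 > data.length) {
            return -1; // the unguarded version would throw ArrayIndexOutOfBoundsException here
        }
        return (data[offset] & 0xFF) | ((data[offset + 1] & 0xFF) << 8);
    }

    public static void main(String[] args) {
        byte[] sepx = new byte[22]; // hypothetical 22-byte SEPX buffer
        System.out.println(readUShortOrMinusOne(sepx, 21)); // prints -1
        try {
            readShortUnchecked(sepx, 21); // the second byte would be index 22
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("overrun: " + e.getMessage());
        }
    }
}
```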

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


DO NOT REPLY [Bug 49933] Word 6/95 documents with sections cause ArrayIndexOutOfBoundsException

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=49933

Nick Burch <ni...@alfresco.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED

--- Comment #4 from Nick Burch <ni...@alfresco.com> 2010-09-19 06:00:25 EDT ---
I've added a slightly icky fix: a couple of spare 0 bytes are appended to the
end of the array, so that we should always be able to decode the SEPX without
error, even if we don't always make full sense of its contents...

I can now process all 6 of your files without error.
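
A minimal sketch of the padding approach described above, assuming exactly two
spare bytes (the comment only says "a couple", and does not say where in POI the
padding is applied); PaddedSepx and its methods are hypothetical names, not POI
code. Arrays.copyOf zero-fills the extra slots, so a trailing 16-bit read that
would have overrun now returns a value built partly from a zero byte.

```java
import java.util.Arrays;

class PaddedSepx {
    // Assumed number of spare bytes, echoing "a couple of spare 0 bytes".
    static final int SPARE_BYTES = 2;

    /** Returns a copy of the SEPX bytes with SPARE_BYTES trailing zeros appended. */
    static byte[] padWithZeros(byte[] sepx) {
        // Arrays.copyOf pads the new, longer array with 0.
        return Arrays.copyOf(sepx, sepx.length + SPARE_BYTES);
    }

    /** Unguarded little-endian 16-bit read. */
    static short readShortLE(byte[] data, int offset) {
        return (short) ((data[offset] & 0xFF) | ((data[offset + 1] & 0xFF) << 8));
    }

    public static void main(String[] args) {
        byte[] truncated = {0x01, 0x02, 0x03}; // last value is cut short
        byte[] padded = padWithZeros(truncated);
        // Reading 2 bytes at the final original byte no longer overruns.
        System.out.println(readShortLE(padded, 2)); // prints 3
    }
}
```

The decoded value is not necessarily meaningful, which matches the caveat that
the contents may not always make sense.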


DO NOT REPLY [Bug 49933] Word 6/95 documents with sections cause ArrayIndexOutOfBoundsException

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=49933

ssmeets@ravn.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |

--- Comment #2 from ssmeets@ravn.co.uk 2010-09-17 15:48:11 EDT ---
Hi Nick,

Thanks for your fix. It fixes several documents, but some documents still
produce ArrayIndexOutOfBoundsExceptions. I have attached the files that cause
the exceptions. Unfortunately my knowledge of old Word docs is limited,
otherwise I could have helped.

Stacktraces:
Processing: Case1.doc
java.lang.ArrayIndexOutOfBoundsException: 240
    at org.apache.poi.hwpf.sprm.SprmOperation.getOperand(SprmOperation.java:94)
    at org.apache.poi.hwpf.sprm.SectionSprmUncompressor.unCompressSEPOperation(SectionSprmUncompressor.java:57)
    at org.apache.poi.hwpf.sprm.SectionSprmUncompressor.uncompressSEP(SectionSprmUncompressor.java:37)
    at org.apache.poi.hwpf.model.SEPX.<init>(SEPX.java:33)
    at org.apache.poi.hwpf.model.OldSectionTable.<init>(OldSectionTable.java:61)
    at org.apache.poi.hwpf.HWPFOldDocument.<init>(HWPFOldDocument.java:103)
    at org.apache.poi.hwpf.HWPFOldDocument.<init>(HWPFOldDocument.java:42)
    at com.ravn.test.poi.OldMSDocTester.parse(OldMSDocTester.java:27)
    at com.ravn.test.Tester.main(Tester.java:29)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:115)
Processing: Case2.doc
java.lang.ArrayIndexOutOfBoundsException: 244
Processing: Case3.doc
java.lang.ArrayIndexOutOfBoundsException: 32
Processing: Case4.doc
java.lang.ArrayIndexOutOfBoundsException: 26
Processing: Case5.doc
java.lang.ArrayIndexOutOfBoundsException: 238
Processing: Case6.doc
java.lang.ArrayIndexOutOfBoundsException: 247


DO NOT REPLY [Bug 49933] Word 6/95 documents with sections cause ArrayIndexOutOfBoundsException

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=49933

Nick Burch <ni...@alfresco.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Comment #1 from Nick Burch <ni...@alfresco.com> 2010-09-17 09:47:23 EDT ---
That turned out to be slightly trickier than expected, as there were issues
with both the Sprm decoding and the byte/character translation on the old
section table.

Fixed in r998131. The fix also seems to have improved some problem Word 97 files
too, so it's not all bad!


DO NOT REPLY [Bug 49933] Word 6/95 documents with sections cause ArrayIndexOutOfBoundsException

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=49933

Adam <ad...@yahoo.co.uk> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |adamwilmer@yahoo.co.uk


DO NOT REPLY [Bug 49933] Word 6/95 documents with sections cause ArrayIndexOutOfBoundsException

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=49933

--- Comment #3 from ssmeets@ravn.co.uk 2010-09-17 15:49:48 EDT ---
Created an attachment (id=26046)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=26046)
Documents that throw an ArrayIndexOutOfBoundsException


DO NOT REPLY [Bug 49933] Word 6/95 documents with sections cause ArrayIndexOutOfBoundsException

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=49933

Maxim Valyanskiy <ma...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |

--- Comment #5 from Maxim Valyanskiy <ma...@gmail.com> 2010-09-27 09:12:25 EDT ---
The last fix broke another Word 95 file:

java.lang.ArrayIndexOutOfBoundsException: 34
    at org.apache.poi.hwpf.sprm.SprmOperation.getOperand(SprmOperation.java:94)
    at org.apache.poi.hwpf.sprm.SectionSprmUncompressor.unCompressSEPOperation(SectionSprmUncompressor.java:57)
    at org.apache.poi.hwpf.sprm.SectionSprmUncompressor.uncompressSEP(SectionSprmUncompressor.java:37)
    at org.apache.poi.hwpf.model.SEPX.<init>(SEPX.java:33)
    at org.apache.poi.hwpf.model.OldSectionTable.<init>(OldSectionTable.java:66)
    at org.apache.poi.hwpf.HWPFOldDocument.<init>(HWPFOldDocument.java:103)
    at org.apache.poi.hwpf.extractor.Word6Extractor.<init>(Word6Extractor.java:58)
    at org.apache.poi.hwpf.extractor.Word6Extractor.<init>(Word6Extractor.java:55)
    at org.apache.poi.hwpf.extractor.Word6Extractor.<init>(Word6Extractor.java:47)
    at org.apache.poi.hwpf.extractor.TestWordExtractor.testWord95err(TestWordExtractor.java:279)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at com.intellij.junit3.JUnit3IdeaTestRunner.doRun(JUnit3IdeaTestRunner.java:108)
    at com.intellij.junit3.JUnit3IdeaTestRunner.startRunnerWithArgs(JUnit3IdeaTestRunner.java:42)
    at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:192)
    at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:64)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:115)


DO NOT REPLY [Bug 49933] Word 6/95 documents with sections cause ArrayIndexOutOfBoundsException

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=49933

Sergey Vladimirov <vl...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED

--- Comment #7 from Sergey Vladimirov <vl...@gmail.com> 2011-07-09 15:38:33 UTC ---
A workaround for this bug is implemented in trunk. Section properties are no
longer parsed immediately on loading. Text is extracted (but encoding is not, sorry).
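
The deferred parsing described above can be sketched as a holder that keeps the
raw section bytes and only decodes them on first access, so a malformed section
no longer stops the document from loading. This is a minimal sketch under that
assumption; LazySection and its parser function are hypothetical and are not
how POI actually implements the workaround.

```java
import java.util.function.Function;

/** Hypothetical holder that defers decoding raw section bytes until first use. */
class LazySection<T> {
    private final byte[] rawSepx;
    private final Function<byte[], T> parser;
    private T parsed;          // cached result of the first parse attempt
    private boolean attempted; // a failed parse is not retried on every call

    LazySection(byte[] rawSepx, Function<byte[], T> parser) {
        this.rawSepx = rawSepx.clone(); // keep bytes only; no decoding at load time
        this.parser = parser;
    }

    /** Parses on the first call; returns null if the bytes cannot be decoded. */
    synchronized T get() {
        if (!attempted) {
            attempted = true;
            try {
                parsed = parser.apply(rawSepx);
            } catch (RuntimeException e) {
                parsed = null; // bad section data no longer aborts loading
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        // A parser that overruns a 1-byte buffer, standing in for a bad SPRM decode.
        LazySection<Integer> bad = new LazySection<>(new byte[] {1}, b -> (int) b[5]);
        System.out.println(bad.get()); // prints null instead of throwing
    }
}
```

Callers that only need the text never call get(), so they never hit the
section-parsing failure.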

A "real" fix would need a new Word95 SPRM parser (the Word95 SPRM format is
different from the Word97-or-later one).
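
To illustrate why the two formats cannot share a parser: as we understand the
Word 97 binary format, each SPRM starts with a 16-bit opcode whose top three
bits (spra) give the operand size, whereas Word 6/95 SPRMs start with a
single-byte opcode. A Word 97 decoder fed Word 95 data therefore treats an
opcode byte plus the following byte as the opcode and derives operand lengths
from the wrong bits. The sketch below walks a Word 97-style grpprl and stops on
truncation instead of throwing; Word97SprmWalker is a hypothetical name, it is
not POI's SprmIterator, and it ignores the special length encodings that a few
table/tab SPRMs use.

```java
class Word97SprmWalker {
    /** Operand size in bytes for spra 0..7; -1 means variable (a size byte follows the opcode). */
    private static final int[] OPERAND_SIZE = {1, 1, 2, 4, 2, 2, -1, 3};

    /** Counts SPRMs in a Word 97-style grpprl, stopping (not throwing) on truncation. */
    static int countSprms(byte[] grpprl) {
        int off = 0;
        int count = 0;
        while (off + 2 <= grpprl.length) {
            int opcode = (grpprl[off] & 0xFF) | ((grpprl[off + 1] & 0xFF) << 8);
            int spra = (opcode >>> 13) & 0x7; // top three bits select the operand size
            off += 2;
            int size = OPERAND_SIZE[spra];
            if (size < 0) {
                if (off >= grpprl.length) {
                    break; // size byte missing
                }
                size = 1 + (grpprl[off] & 0xFF); // size byte plus that many operand bytes
            }
            if (off + size > grpprl.length) {
                break; // truncated operand: stop instead of overrunning the array
            }
            off += size;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countSprms(new byte[] {0x09, 0x30, 0x02})); // prints 1 (spra = 1)
        System.out.println(countSprms(new byte[] {0x09, 0x30}));       // prints 0: operand missing
    }
}
```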


DO NOT REPLY [Bug 49933] Word 6/95 documents with sections cause ArrayIndexOutOfBoundsException

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=49933

--- Comment #6 from Maxim Valyanskiy <ma...@gmail.com> 2010-09-27 09:13:27 EDT ---
Created an attachment (id=26083)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=26083)
word95 doc
