Posted to dev@poi.apache.org by bu...@apache.org on 2016/09/21 07:15:13 UTC

[Bug 60160] New: ArrayIndexOutOfBoundsException coming when trying to extract text from doc file.

https://bz.apache.org/bugzilla/show_bug.cgi?id=60160

            Bug ID: 60160
           Summary: ArrayIndexOutOfBoundsException coming when trying to
                    extract text from doc file.
           Product: POI
           Version: 3.15-dev
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: major
          Priority: P2
         Component: HWPF
          Assignee: dev@poi.apache.org
          Reporter: akki.1607@gmail.com

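Code -
    byte[] bytes = IOUtils.toByteArray(new FileInputStream(file));
    // the exception is thrown here, inside the HWPFDocument constructor
    HWPFDocument doc = new HWPFDocument(new ByteArrayInputStream(bytes));
    // text is read via HWPFDocument.getDocumentText(), not XWPFWordExtractor
    System.out.println(doc.getDocumentText());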



Exception stack trace - 

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
    at java.lang.System.arraycopy(Native Method)
    at org.apache.poi.hwpf.model.SectionTable.<init>(SectionTable.java:84)
    at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:342)
    at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:186)
    at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:174)
    at com.test.DocExtractor.main(DocExtractor.java:12)

If we could somehow ignore this exception, we could still get the other
parts of the document.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 60160] ArrayIndexOutOfBoundsException coming when trying to extract text from doc file.

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=60160

--- Comment #3 from Javen O'Neal <on...@apache.org> ---
Do you know what software was used to generate the original file?

Without a way to reproduce the problem, there's not much that we can do.
You could run the file through POIFSDump, BiffViewer or other developer tools
(Microsoft publishes some validators), but it is unlikely that a developer will
spend much effort with such limited information, nothing to test, for such a
minor problem. They're more likely to introduce bugs by making changes.

https://poi.apache.org/apidocs/org/apache/poi/poifs/dev/POIFSDump.html



[Bug 60160] ArrayIndexOutOfBoundsException coming when trying to extract text from doc file.

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=60160

Dominik Stadler <do...@gmx.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |WONTFIX
             Status|NEEDINFO                    |RESOLVED

--- Comment #6 from Dominik Stadler <do...@gmx.at> ---
No further information has been received for a long time, and this is probably
a corrupt file created with some other tool. We therefore do not plan to fix
anything until we receive more information and/or a sample document here.



[Bug 60160] ArrayIndexOutOfBoundsException coming when trying to extract text from doc file.

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=60160

Javen O'Neal <on...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO

--- Comment #1 from Javen O'Neal <on...@apache.org> ---
Could you include the file that caused this problem?

FYI, it is simpler to open the document via a POIFSFileSystem.
// open the existing file; POIFSFileSystem.create(file) would create a new, empty one
POIFSFileSystem fs = new POIFSFileSystem(file);
HWPFDocument doc = new HWPFDocument(fs);
doc.getDocumentText();
...
doc.close();
fs.close();



[Bug 60160] ArrayIndexOutOfBoundsException coming when trying to extract text from doc file.

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=60160

--- Comment #2 from Akash Sudhakar <ak...@gmail.com> ---
The file is classified, so I cannot share it.
If we save the file again as a .doc file, the issue no longer occurs.



[Bug 60160] ArrayIndexOutOfBoundsException coming when trying to extract text from doc file.

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=60160

--- Comment #4 from Javen O'Neal <on...@apache.org> ---
http://svn.apache.org/viewvc/poi/trunk/src/scratchpad/src/org/apache/poi/hwpf/model/SectionTable.java?view=markup#l80

fileOffset or sepxSize is likely -1.
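
To illustrate the failure mode, here is a minimal, self-contained sketch of how
a negative or out-of-range offset/size makes System.arraycopy throw, and how a
bounds check could avoid it. This is not the SectionTable code: the class
SepxBoundsSketch and its methods are hypothetical, and only the variable names
fileOffset and sepxSize are borrowed from the comment above.

```java
import java.util.Arrays;

public class SepxBoundsSketch {

    /** True if the range [fileOffset, fileOffset + sepxSize) lies inside the stream. */
    public static boolean inBounds(byte[] stream, int fileOffset, int sepxSize) {
        // Written as a subtraction so fileOffset + sepxSize cannot overflow.
        return fileOffset >= 0 && sepxSize >= 0 && fileOffset <= stream.length - sepxSize;
    }

    /** Copies sepxSize bytes from fileOffset, or returns an empty array if the range is invalid. */
    public static byte[] copyOrEmpty(byte[] stream, int fileOffset, int sepxSize) {
        if (!inBounds(stream, fileOffset, sepxSize)) {
            // Assumption: an empty property block is an acceptable fallback.
            return new byte[0];
        }
        byte[] buf = new byte[sepxSize];
        System.arraycopy(stream, fileOffset, buf, 0, sepxSize);
        return buf;
    }

    public static void main(String[] args) {
        byte[] stream = {10, 20, 30, 40, 50};

        // An unguarded copy with a negative offset throws, as in the stack trace.
        try {
            System.arraycopy(stream, -1, new byte[3], 0, 3);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unguarded copy failed: " + e.getClass().getSimpleName());
        }

        System.out.println(Arrays.toString(copyOrEmpty(stream, 1, 3))); // [20, 30, 40]
        System.out.println(copyOrEmpty(stream, -1, 3).length);          // 0
        System.out.println(copyOrEmpty(stream, 3, 4).length);           // 0
    }
}
```

Returning an empty array hides the corruption from the caller. Logging a
warning or throwing a more descriptive exception may be the better choice,
depending on how much of a damaged file should still be read.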
