Posted to dev@poi.apache.org by bu...@apache.org on 2016/09/21 07:15:13 UTC
[Bug 60160] New: ArrayIndexOutOfBoundsException coming when trying
to extract text from doc file.
https://bz.apache.org/bugzilla/show_bug.cgi?id=60160
Bug ID: 60160
Summary: ArrayIndexOutOfBoundsException coming when trying to
extract text from doc file.
Product: POI
Version: 3.15-dev
Hardware: PC
OS: Linux
Status: NEW
Severity: major
Priority: P2
Component: HWPF
Assignee: dev@poi.apache.org
Reporter: akki.1607@gmail.com
Code -
byte[] bytes = IOUtils.toByteArray(new FileInputStream(file));
HWPFDocument doc = new HWPFDocument(new ByteArrayInputStream(bytes));
// using XWPFWordExtractor Class
System.out.println(doc.getDocumentText());
Exception stack trace -
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
at java.lang.System.arraycopy(Native Method)
at org.apache.poi.hwpf.model.SectionTable.<init>(SectionTable.java:84)
at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:342)
at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:186)
at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:174)
at com.test.DocExtractor.main(DocExtractor.java:12)
If we could somehow ignore this exception, we could still get the other parts
of the document.
--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org
[Bug 60160] ArrayIndexOutOfBoundsException coming when trying to
extract text from doc file.
Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=60160
--- Comment #3 from Javen O'Neal <on...@apache.org> ---
Do you know what software was used to generate the original file?
Without a way to reproduce the problem, there's not much that we can do.
You could run the file through POIFSDump, BiffViewer or other developer tools
(Microsoft publishes some validators), but it is unlikely that a developer will
spend much effort with such limited information, nothing to test, for such a
minor problem. They're more likely to introduce bugs by making changes.
https://poi.apache.org/apidocs/org/apache/poi/poifs/dev/POIFSDump.html
[Bug 60160] ArrayIndexOutOfBoundsException coming when trying to
extract text from doc file.
Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=60160
Dominik Stadler <do...@gmx.at> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |WONTFIX
Status|NEEDINFO |RESOLVED
--- Comment #6 from Dominik Stadler <do...@gmx.at> ---
No further information has been received for a long time, and this is probably
a corrupt file created with some other tool. We therefore do not plan to fix
anything until we receive more information and/or a sample document here.
[Bug 60160] ArrayIndexOutOfBoundsException coming when trying to
extract text from doc file.
Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=60160
Javen O'Neal <on...@apache.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |NEEDINFO
--- Comment #1 from Javen O'Neal <on...@apache.org> ---
Could you include the file that caused this problem?
FYI, it is simpler to open the document via a POIFSFileSystem.
// Open the existing file; POIFSFileSystem.create(file) would start a new, empty filesystem.
POIFSFileSystem fs = new POIFSFileSystem(file);
HWPFDocument doc = new HWPFDocument(fs);
doc.getDocumentText();
...
doc.close();
fs.close();
[Bug 60160] ArrayIndexOutOfBoundsException coming when trying to
extract text from doc file.
Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=60160
--- Comment #2 from Akash Sudhakar <ak...@gmail.com> ---
The file is classified, so I cannot share it.
If we save the file again as a .doc file, the issue does not occur.
[Bug 60160] ArrayIndexOutOfBoundsException coming when trying to
extract text from doc file.
Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=60160
--- Comment #4 from Javen O'Neal <on...@apache.org> ---
http://svn.apache.org/viewvc/poi/trunk/src/scratchpad/src/org/apache/poi/hwpf/model/SectionTable.java?view=markup#l80
fileOffset or sepxSize is likely -1.
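To illustrate the kind of failure suspected here, the following is a minimal, self-contained sketch and not POI code. It shows that System.arraycopy rejects a negative source offset, and how a bounds check before the copy could turn that into a recoverable condition. The class name SepxBoundsSketch and the helper safeSlice are hypothetical, and the real SectionTable logic may differ.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in POI.
final class SepxBoundsSketch {

    // Copies `size` bytes starting at `offset`, or returns an empty array
    // when the requested range does not fit inside `stream`.
    static byte[] safeSlice(byte[] stream, int offset, int size) {
        if (offset < 0 || size < 0 || offset > stream.length - size) {
            return new byte[0];
        }
        byte[] buf = new byte[size];
        System.arraycopy(stream, offset, buf, 0, size);
        return buf;
    }

    public static void main(String[] args) {
        byte[] stream = {1, 2, 3, 4, 5};

        // Unchecked copy with an offset of -1, as might be read from a corrupt file.
        try {
            System.arraycopy(stream, -1, new byte[2], 0, 2);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unchecked copy failed: " + e.getClass().getName());
        }

        // Checked copies: a valid range is returned, an invalid one yields an empty array.
        System.out.println(Arrays.toString(safeSlice(stream, 1, 3)));
        System.out.println(safeSlice(stream, -1, 2).length);
    }
}
```

Whether silently skipping a bad section is acceptable depends on the caller; a parser could equally log a warning or throw a more descriptive exception at this point.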