Posted to dev@poi.apache.org by bu...@apache.org on 2009/08/12 12:14:34 UTC

DO NOT REPLY [Bug 47685] New: extracting text from xls files fails

https://issues.apache.org/bugzilla/show_bug.cgi?id=47685

           Summary: extracting text from xls files fails
           Product: POI
           Version: 3.2-FINAL
          Platform: PC
        OS/Version: Windows Vista
            Status: NEW
          Severity: normal
          Priority: P2
         Component: HSSF
        AssignedTo: dev@poi.apache.org
        ReportedBy: christiaan.fluit@aduna-software.com


--- Comment #0 from Christiaan Fluit <ch...@aduna-software.com> 2009-08-12 03:14:31 PDT ---
I have a couple of xls files that result in exceptions when I try to extract
their text. POI 3.2-FINAL gives the following stacktrace:

org.apache.poi.hssf.record.RecordFormatException: Unable to construct record instance
    at org.apache.poi.hssf.record.RecordFactory.createRecord(RecordFactory.java:186)
    at org.apache.poi.hssf.record.RecordFactory.createRecords(RecordFactory.java:328)
    at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:271)
    at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:196)
    at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:178)
    at [proprietary code trace]
Caused by: java.lang.ArrayIndexOutOfBoundsException
    at org.apache.poi.hssf.record.RecordInputStream.checkRecordPosition(RecordInputStream.java:142)
    at org.apache.poi.hssf.record.RecordInputStream.readByte(RecordInputStream.java:151)
    at org.apache.poi.hssf.record.MMSRecord.<init>(MMSRecord.java:46)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.poi.hssf.record.RecordFactory.createRecord(RecordFactory.java:184)
    ... 25 common frames omitted

POI 3.5-beta5 gives this stacktrace:

org.apache.poi.hssf.record.RecordFormatException: Unable to construct record instance
    at org.apache.poi.hssf.record.RecordFactory$ReflectionRecordCreator.create(RecordFactory.java:71)
    at org.apache.poi.hssf.record.RecordFactory.createSingleRecord(RecordFactory.java:269)
    at org.apache.poi.hssf.record.RecordFactory.createRecord(RecordFactory.java:248)
    at org.apache.poi.hssf.eventusermodel.HSSFRecordStream.getNextRecord(HSSFRecordStream.java:162)
    at org.apache.poi.hssf.eventusermodel.HSSFRecordStream.nextRecord(HSSFRecordStream.java:93)
    at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents(HSSFEventFactory.java:141)
    at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents(HSSFEventFactory.java:98)
    at [proprietary code trace]
Caused by: org.apache.poi.hssf.record.RecordFormatException: Not enough data (0) to read requested (1) bytes
    at org.apache.poi.hssf.record.RecordInputStream.checkRecordPosition(RecordInputStream.java:185)
    at org.apache.poi.hssf.record.RecordInputStream.readByte(RecordInputStream.java:193)
    at org.apache.poi.hssf.record.MMSRecord.<init>(MMSRecord.java:46)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.poi.hssf.record.RecordFactory$ReflectionRecordCreator.create(RecordFactory.java:63)
    ... 12 more

Due to the nature of these files, I cannot post them here, but I am willing to
share them with developers looking into this bug.
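
Both traces end in a readByte() call made from the MMSRecord constructor, and
the 3.5-beta5 message says the record body had 0 bytes left when 1 was
requested. The following is a minimal, stdlib-only sketch of that failure mode.
It is not POI code: BoundedRecordReader and parseTwoCounts are hypothetical
names, the two-field layout is an assumption made for illustration, and the
"tolerate an empty body" branch is just one possible defensive pattern, not
necessarily what any POI fix does.

```java
// Hypothetical stand-in for a bounds-checked record reader (NOT the POI class).
final class BoundedRecordReader {
    private final byte[] body;
    private int pos = 0;

    BoundedRecordReader(byte[] body) {
        this.body = body.clone();
    }

    /** Number of unread bytes left in this record body. */
    int remaining() {
        return body.length - pos;
    }

    /** Reads one byte, failing loudly if the record body is exhausted. */
    byte readByte() {
        checkAvailable(1);
        return body[pos++];
    }

    private void checkAvailable(int required) {
        int available = remaining();
        if (required > available) {
            // Message shape mirrors the one in the 3.5-beta5 trace above.
            throw new IllegalStateException("Not enough data (" + available
                    + ") to read requested (" + required + ") bytes");
        }
    }

    /**
     * Parses a hypothetical record made of two one-byte counts.
     * Assumption: an empty body is treated as "both counts are zero"
     * instead of being rejected.
     */
    static int[] parseTwoCounts(byte[] recordBody) {
        BoundedRecordReader in = new BoundedRecordReader(recordBody);
        if (in.remaining() == 0) {
            return new int[] {0, 0};
        }
        int first = in.readByte();
        int second = in.readByte();
        return new int[] {first, second};
    }
}
```

Calling readByte() on a reader built from an empty array reproduces the "Not
enough data (0) to read requested (1) bytes" message, while parseTwoCounts
returns zeros for the same input.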

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


DO NOT REPLY [Bug 47685] extracting text from xls files fails

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=47685

--- Comment #2 from Andreas <an...@gmx.de> 2009-10-23 02:04:22 UTC ---
I had the same problem with a file created in MS Excel. I was able to work
around it by removing an image that was embedded across two cells.



DO NOT REPLY [Bug 47685] extracting text from xls files fails

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=47685

Maxim Valyanskiy <ma...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |RESOLVED
         Resolution|                            |FIXED

--- Comment #3 from Maxim Valyanskiy <ma...@gmail.com> 2010-04-27 05:30:47 EDT ---
Fixed in r938372



DO NOT REPLY [Bug 47685] extracting text from xls files fails

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=47685


Nick Burch <ni...@torchbox.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO


--- Comment #1 from Nick Burch <ni...@torchbox.com> 2009-08-12 06:50:08 PDT ---
Without the file, I can only suggest that you dig into the problematic record
code (MMSRecord), compare it to the published Microsoft docs, and see if you
can spot the issue.

Also, it's worth opening the file in a new copy of Office and doing a "Save
As". If the re-saved file opens without issue, then a workaround is probably
needed for whatever software wrote your file not quite according to the spec.
If that doesn't help, then it looks more like a record bug in POI.
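
One way to start digging into the record in question is to list the record
headers and look for an MMS record whose declared length is shorter than
expected. The sketch below is a stdlib-only illustration, not POI code. It
assumes you already have the raw BIFF record stream as a byte array (in a real
.xls file that stream sits inside an OLE2 container, which this sketch does not
unpack). BiffRecordScanner and its methods are hypothetical names, and the MMS
record id 0x00C1 is stated from memory of the format docs and should be checked
against them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that walks BIFF record headers in a raw byte stream.
final class BiffRecordScanner {
    /** Record id assumed for MMS; verify against the file format docs. */
    static final int MMS_SID = 0x00C1;

    /** Offset, record id and declared body length of one record. */
    static final class RecordHeader {
        final int offset;
        final int sid;
        final int length;

        RecordHeader(int offset, int sid, int length) {
            this.offset = offset;
            this.sid = sid;
            this.length = length;
        }
    }

    /** Each record is a 2-byte id and a 2-byte length (little-endian), then the body. */
    static List<RecordHeader> scan(byte[] stream) {
        List<RecordHeader> headers = new ArrayList<>();
        int pos = 0;
        while (pos + 4 <= stream.length) {
            int sid = readUShortLE(stream, pos);
            int length = readUShortLE(stream, pos + 2);
            headers.add(new RecordHeader(pos, sid, length));
            pos += 4 + length; // skip the body; stops if it runs past the end
        }
        return headers;
    }

    /** Returns the headers with the given id whose declared body length is zero. */
    static List<RecordHeader> findEmptyRecords(byte[] stream, int sid) {
        List<RecordHeader> result = new ArrayList<>();
        for (RecordHeader h : scan(stream)) {
            if (h.sid == sid && h.length == 0) {
                result.add(h);
            }
        }
        return result;
    }

    private static int readUShortLE(byte[] b, int off) {
        return (b[off] & 0xFF) | ((b[off + 1] & 0xFF) << 8);
    }
}
```

Running findEmptyRecords(stream, BiffRecordScanner.MMS_SID) on the problem file
and on the "Save As" copy would show whether the original writer emitted an
MMS record with an empty body.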
