You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@poi.apache.org by Taro App <ap...@gmail.com> on 2011/02/09 04:35:57 UTC

RecordFormatException when reading a Excel file

Hi, I get the following error when reading a Exel file with POI 3.7.

Exception in thread "main"
org.apache.poi.hssf.record.RecordFormatException: Unable to construct
record instance
	at org.apache.poi.hssf.record.RecordFactory$ReflectionConstructorRecordCreator.create(RecordFactory.java:65)
	at org.apache.poi.hssf.record.RecordFactory.createSingleRecord(RecordFactory.java:300)
	at org.apache.poi.hssf.record.RecordFactoryInputStream.readNextRecord(RecordFactoryInputStream.java:270)
	at org.apache.poi.hssf.record.RecordFactoryInputStream.nextRecord(RecordFactoryInputStream.java:236)
	at org.apache.poi.hssf.record.RecordFactory.createRecords(RecordFactory.java:442)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:263)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:188)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:305)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:286)
	at aflat4.apps.adr.POITest.main(POITest.java:18)
Caused by: org.apache.poi.hssf.record.RecordFormatException: Not
enough data (1) to read requested (2) bytes
	at org.apache.poi.hssf.record.RecordInputStream.checkRecordPosition(RecordInputStream.java:216)
	at org.apache.poi.hssf.record.RecordInputStream.readUShort(RecordInputStream.java:267)
	at org.apache.poi.util.StringUtil.readUnicodeLE(StringUtil.java:277)
	at org.apache.poi.hssf.record.common.UnicodeString$ExtRst.<init>(UnicodeString.java:172)
	at org.apache.poi.hssf.record.common.UnicodeString.<init>(UnicodeString.java:438)
	at org.apache.poi.hssf.record.SSTDeserializer.manufactureStrings(SSTDeserializer.java:55)
	at org.apache.poi.hssf.record.SSTRecord.<init>(SSTRecord.java:250)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:494)
	at org.apache.poi.hssf.record.RecordFactory$ReflectionConstructorRecordCreator.create(RecordFactory.java:57)
	... 9 more

The file is saved with Excel 2003. It's a confidential file, so I
can't provide the file here. BiffViewer reveals a weired UNKNOWNRECORD
right before the error occurs. Below is the last part of BiffViewer
output.

Offset=0x00002661(9825) recno=402 sid=0x0018 size=0x001B(27)
[NAME]
    .option flags           = 0x0020
    .keyboard shortcut      = 0x00
    .length of the name     = 1
    .extSheetIx(1-based, 0=Global)= 0
    .sheetTabIx             = 31
    .Menu text length       = 0
    .Description text length= 0
    .Help topic text length = 0
    .Status bar text length = 0
    .NameIsMultibyte        = false
    .Name (Unicode text)    = Print_Area
    .Formula (nTokens=1):
       org.apache.poi.hssf.record.formula.Area3DPtg [sheetIx=12 ! $A$1:$E$32]R
    .Menu text       =
    .Description text=
    .Help topic text =
    .Status bar text =
[/NAME]

Offset=0x00002680(9856) recno=403 sid=0x01C1 size=0x0008(8)
[RECALCID]
    .reserved = 0x0000
    .engineId = 0x0001BE22
[/RECALCID]

Offset=0x0000268C(9868) recno=404 sid=0x0866 size=0x0F5E(3934)
[UNKNOWNRECORD] (0x866)
  rawData=[66, 08, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 02, 00, 0F,
00, 00, F0, 48, 0F, 00, 00, 00, 00, 06, F0, 28, 0F, 00, 00, 01, 8C,
07, 00, E4, 01, 00, 00, E3, 01, 00, 00, 00, 00, 00, 00, 01, 00, 00,
00, 01, 00, 00, 00, 02, 00, 00, 00, 01, 00, 00, 00, 01, 00, 00, 00,
01, 00, 00, 00, 02, 00, 00, 00, 01, 00, 00, 00, 03, 00, 00, 00, 01,
00, 00, 00, 04, 00, 00, 00, 01, 00, 00, 00, 05, 00, 00, 00, 01, 00,
00, 00, 06, 00, 00, 00, 01, 00, 00, 00, 07, 00, 00, 00, 01, 00, 00,
00, 08, 00, 00, 00, 01, 00, 00, 00, 09, 00, 00, 00, 01, 00,
...... ****LONG DATA HERE**** .....
]
[/UNKNOWNRECORD]

Offset=0x000035EE(13806) recno=405 sid=0x00EB size=0x0242(578)
[MSODRAWINGGROUP]
No Escher Records Decoded
[/MSODRAWINGGROUP]

*****ERROR OCCURS HERE*****

Please help.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org

Re: RecordFormatException when reading a Excel file

Posted by Taro App <ap...@gmail.com>.

I have created a ticket :
https://issues.apache.org/bugzilla/show_bug.cgi?id=50779
I'll update more there.


On Tue, Feb 15, 2011 at 3:39 PM, Taro App <ap...@gmail.com> wrote:
> Okay, I understand what was going on now. Consulting to
> OpenOffice.org's documentation of Excel file format
> (http://sc.openoffice.org/excelfileformat.pdf) chapter 2.5.3 Unicode
> Strings (BIFF8), I checked the data carefully with a binary editor. I
> confirmed that CONTINUE(003Ch) records can appear right in between two
> bytes that construct a single unicode character. Here is an example:
>
> ----------------------------------------------------------------------
> Binary Data:
> 0B 00 05 54 00 00 00 AC 4E FD 90 02 5E F3 53 AC
> 4E 3A 53 AC 4E 17 53 64 8D F3 77 3A 75 01 00 50
> 00 09 00 35 00 05 00 14 00 14 00 AD 30 E7 30 A6
> 30 C8 30 B7 30 A6 30 AD 30 E7 30 A6 3C 00 1E 20
> 30 AF 30 B1 30 A4 30 DB 30 AF 30 A2 30 AB 30 A4
> 30 B7 30 DE 30 C1 30 00 00 00 00 03 00 05 00 03
> 00 03 00 0A 00 06 00 02 00 0E 00 08 00 02 00 12
> 00 0A 00 01 00
>
> UnicodeString:
> 0B 00 -> 11 characters
> 05 -> not compressed, extended
> 54 00 00 00 -> length of ExtRst is 0x54 bytes
> AC 4E FD 90 02 5E F3 53 AC 4E 3A 53 AC 4E 17 53
> 64 8D F3 77 3A 75 -> string data (11 characters)
>
> ExtRst:
> 01 00 -> reserved = 1
> 50 00 -> length of phonetic string is 0x50 bytes
> 09 00 35 00 -> formatting infomation
> 05 00 -> number of phonetic text runs is 5
> 14 00 -> number of characters is 20
> 14 00 -> number of characters is 20
> AD 30 E7 30 A6 30 C8 30 B7 30 A6 30 AD 30 E7 30
> A6 [3C 00 1E 20] 30 AF 30 B1 30 A4 30 DB 30 AF 30
> A2 30 AB 30 A4 30 B7 30 DE 30 C1 30
> -> phonetic data (20 characters)
> * 0x30A6 is a unicode character, but a CONTINUE record is in the middle
> 00 00 00 00 03 00 -> portion 1
> 05 00 03 00 03 00 -> portion 2
> 0A 00 06 00 02 00 -> portion 3
> 0E 00 08 00 02 00 -> portion 4
> 12 00 0A 00 01 00 -> portion 5
> ----------------------------------------------------------------------
>
> I also found that CONTINUE records can appear in the middle of portion
> data. Here is an example:
>
> ----------------------------------------------------------------------
> Binary Data:
> 09 00 05 36 00 00 00 F3 77 DD 5D 0C 77 7D 76 71
> 5C 02 5E 0A 4E CE 91 3A 75 01 00 32 00 09 00 35
> 00 03 00 0B 00 0B 00 A4 30 B7 30 AB 30 EF 30 B1
> 30 F3 30 CF 30 AF 30 B5 30 F3 30 B7 30 00 00 00
> 00 03 00 06 00 03 00 02 3C 00 20 20 00 0A 00 05
> 00 01 00
>
> UnicodeString:
> 09 00 -> 9 characters
> 05 -> not compressed, extended
> 36 00 00 00 -> length of ExtRst is 0x36 bytes
> F3 77 DD 5D 0C 77 7D 76 71 5C 02 5E 0A 4E CE 91
> 3A 75 -> string data (9 characters)
>
> ExtRst:
> 01 00 -> reserved = 1
> 32 00 -> length of phonetic string is 0x32 bytes
> 09 00 35 00 -> formatting infomation
> 03 00 -> number of phonetic text runs is 3
> 0B 00 -> number of characters is 11
> 0B 00 -> number of characters is 11
> A4 30 B7 30 AB 30 EF 30 B1 30 F3 30 CF 30 AF 30
> B5 30 F3 30 B7 30 -> phonetic data (11 characters)
> 00 00 00 00 03 00 -> portion 1
> 06 00 03 00 02 {3C 00 20 20} 00 -> portion 2
> * a CONTINUE record is in the middle
> 0A 00 05 00 01 00 -> portion 3
> ----------------------------------------------------------------------
>
> Excel files in the above patterns are not readable by POI 3.7.
> Now I know the cause of the issue, so maybe I can create a Excel file
> to duplicate the POI error.
> But I don't think I can fix POI myself. I hope someone can help with that. Nick?
>
> apptaro
>
>
>
> On Tue, Feb 15, 2011 at 12:21 PM, Taro App <ap...@gmail.com> wrote:
>> Hi Nick,
>>
>> I tested if "Save As" fixes the issue, and found out very interesting results.
>> For example, there is an Excel file A, which POI can't read at a string ABC.
>> Open the file A in Excel and save as a file B. Then, POI error occurs
>> at a different string DEF when reading the file B.
>> Open the file B in Excel and save as a file C. Then, POI error occurs
>> at a different string GHI when reading the file C.
>> Open the file C in Excel and save as a file D. Then, POI error occurs
>> at a different string JKL when reading the file D.
>> Open the file A again in Excel and save as a file B2. Then, POI error
>> occurs at the string DEF when reading the file B2.
>> Open the file B again in Excel and save as a file C2. Then, POI error
>> occurs at the string GHI when reading the file C2.
>> Open the file C again in Excel and save as a file D2. Then, POI error
>> occurs at the string JKL when reading the file D2.
>>
>> So, "Save As" creates a file which POI can't read at a different
>> string. Its pattern is reproducible, but I have not been able to
>> create new such file from scratch. My problematic Excel file contains
>> about 50 sheets, and each sheet has 300-1000 non-empty cells. Maybe
>> this issue occurs only for a large file.
>>
>> Only workaround is to use debugger to find strings that causes the
>> error, and re-input those strings in Excel and save. There are many
>> strings that cause errors, so I have to repeat debug and edit until
>> POI can successfully read the file. Another very interesting fact is
>> that, after I "clean" the problematic file until POI can read it, open
>> the cleaned file in Excel and do "Save As," then POI can't read the
>> newly saved file!
>>
>> Of course all the files I explained above can be opened in Excel, and
>> text and phonetic is not corrupted at all. (I checked phonetic with
>> =PHONETIC(cell) function.)
>>
>> apptaro
>>
>>
>> On Mon, Feb 14, 2011 at 9:04 PM, Nick Burch <ni...@alfresco.com> wrote:
>>> On Thu, 10 Feb 2011, Taro App wrote:
>>>>
>>>> Oops, there's a comment that isContinueNext "Should never be called
>>>> before end of current record" so the code must be correct.
>>>
>>> Or at least correct to the best of our knowledge...
>>>
>>>> I'm not sure how it happens, but the exception is raised when POI tries to
>>>> read phonetic text in ExtRst of UnicodeString. POI tries to read 18
>>>> double-byte characters, but RecordInputStream has only 29 bytes (14
>>>> characters + 1 byte.) If I check the Excel file with binary editor,
>>>> everything up to here seems correct.
>>>
>>> Hmm, the phonetic text stuff went in much later to the file format. It's not
>>> impossible that the team working on it had different ideas about continue
>>> records to the team who did the original work
>>>
>>>> When I open the excel file and deleted sheets which do not contain the
>>>> problematic string, then the problem disappeared. I tried deleting different
>>>> sheets, then the problem sometimes disappeared and sometimes not. I also
>>>> tried re-inputting the problematic string by cut & pasting in Excel, then
>>>> the problem always disappeared. My best guess is that Excel sometimes saves
>>>> corrupted data. Corruption occurs in phonetic texts which is not usually
>>>> visible to users, it is not very obvious.
>>>
>>> If you open the file in Excel and do "Save As", does it fix the issue, or is
>>> it only changing text / removing sheets that fixes it?
>>>
>>> Nick
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
>>> For additional commands, e-mail: user-help@poi.apache.org
>>>
>>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org

Re: RecordFormatException when reading a Excel file

Posted by Taro App <ap...@gmail.com>.

Okay, I understand what was going on now. Consulting to
OpenOffice.org's documentation of Excel file format
(http://sc.openoffice.org/excelfileformat.pdf) chapter 2.5.3 Unicode
Strings (BIFF8), I checked the data carefully with a binary editor. I
confirmed that CONTINUE(003Ch) records can appear right in between two
bytes that construct a single unicode character. Here is an example:

----------------------------------------------------------------------
Binary Data:
0B 00 05 54 00 00 00 AC 4E FD 90 02 5E F3 53 AC
4E 3A 53 AC 4E 17 53 64 8D F3 77 3A 75 01 00 50
00 09 00 35 00 05 00 14 00 14 00 AD 30 E7 30 A6
30 C8 30 B7 30 A6 30 AD 30 E7 30 A6 3C 00 1E 20
30 AF 30 B1 30 A4 30 DB 30 AF 30 A2 30 AB 30 A4
30 B7 30 DE 30 C1 30 00 00 00 00 03 00 05 00 03
00 03 00 0A 00 06 00 02 00 0E 00 08 00 02 00 12
00 0A 00 01 00

UnicodeString:
0B 00 -> 11 characters
05 -> not compressed, extended
54 00 00 00 -> length of ExtRst is 0x54 bytes
AC 4E FD 90 02 5E F3 53 AC 4E 3A 53 AC 4E 17 53
64 8D F3 77 3A 75 -> string data (11 characters)

ExtRst:
01 00 -> reserved = 1
50 00 -> length of phonetic string is 0x50 bytes
09 00 35 00 -> formatting infomation
05 00 -> number of phonetic text runs is 5
14 00 -> number of characters is 20
14 00 -> number of characters is 20
AD 30 E7 30 A6 30 C8 30 B7 30 A6 30 AD 30 E7 30
A6 [3C 00 1E 20] 30 AF 30 B1 30 A4 30 DB 30 AF 30
A2 30 AB 30 A4 30 B7 30 DE 30 C1 30
-> phonetic data (20 characters)
* 0x30A6 is a unicode character, but a CONTINUE record is in the middle
00 00 00 00 03 00 -> portion 1
05 00 03 00 03 00 -> portion 2
0A 00 06 00 02 00 -> portion 3
0E 00 08 00 02 00 -> portion 4
12 00 0A 00 01 00 -> portion 5
----------------------------------------------------------------------

I also found that CONTINUE records can appear in the middle of portion
data. Here is an example:

----------------------------------------------------------------------
Binary Data:
09 00 05 36 00 00 00 F3 77 DD 5D 0C 77 7D 76 71
5C 02 5E 0A 4E CE 91 3A 75 01 00 32 00 09 00 35
00 03 00 0B 00 0B 00 A4 30 B7 30 AB 30 EF 30 B1
30 F3 30 CF 30 AF 30 B5 30 F3 30 B7 30 00 00 00
00 03 00 06 00 03 00 02 3C 00 20 20 00 0A 00 05
00 01 00

UnicodeString:
09 00 -> 9 characters
05 -> not compressed, extended
36 00 00 00 -> length of ExtRst is 0x36 bytes
F3 77 DD 5D 0C 77 7D 76 71 5C 02 5E 0A 4E CE 91
3A 75 -> string data (9 characters)

ExtRst:
01 00 -> reserved = 1
32 00 -> length of phonetic string is 0x32 bytes
09 00 35 00 -> formatting infomation
03 00 -> number of phonetic text runs is 3
0B 00 -> number of characters is 11
0B 00 -> number of characters is 11
A4 30 B7 30 AB 30 EF 30 B1 30 F3 30 CF 30 AF 30
B5 30 F3 30 B7 30 -> phonetic data (11 characters)
00 00 00 00 03 00 -> portion 1
06 00 03 00 02 {3C 00 20 20} 00 -> portion 2
* a CONTINUE record is in the middle
0A 00 05 00 01 00 -> portion 3
----------------------------------------------------------------------

Excel files in the above patterns are not readable by POI 3.7.
Now I know the cause of the issue, so maybe I can create a Excel file
to duplicate the POI error.
But I don't think I can fix POI myself. I hope someone can help with that. Nick?

apptaro



On Tue, Feb 15, 2011 at 12:21 PM, Taro App <ap...@gmail.com> wrote:
> Hi Nick,
>
> I tested if "Save As" fixes the issue, and found out very interesting results.
> For example, there is an Excel file A, which POI can't read at a string ABC.
> Open the file A in Excel and save as a file B. Then, POI error occurs
> at a different string DEF when reading the file B.
> Open the file B in Excel and save as a file C. Then, POI error occurs
> at a different string GHI when reading the file C.
> Open the file C in Excel and save as a file D. Then, POI error occurs
> at a different string JKL when reading the file D.
> Open the file A again in Excel and save as a file B2. Then, POI error
> occurs at the string DEF when reading the file B2.
> Open the file B again in Excel and save as a file C2. Then, POI error
> occurs at the string GHI when reading the file C2.
> Open the file C again in Excel and save as a file D2. Then, POI error
> occurs at the string JKL when reading the file D2.
>
> So, "Save As" creates a file which POI can't read at a different
> string. Its pattern is reproducible, but I have not been able to
> create new such file from scratch. My problematic Excel file contains
> about 50 sheets, and each sheet has 300-1000 non-empty cells. Maybe
> this issue occurs only for a large file.
>
> Only workaround is to use debugger to find strings that causes the
> error, and re-input those strings in Excel and save. There are many
> strings that cause errors, so I have to repeat debug and edit until
> POI can successfully read the file. Another very interesting fact is
> that, after I "clean" the problematic file until POI can read it, open
> the cleaned file in Excel and do "Save As," then POI can't read the
> newly saved file!
>
> Of course all the files I explained above can be opened in Excel, and
> text and phonetic is not corrupted at all. (I checked phonetic with
> =PHONETIC(cell) function.)
>
> apptaro
>
>
> On Mon, Feb 14, 2011 at 9:04 PM, Nick Burch <ni...@alfresco.com> wrote:
>> On Thu, 10 Feb 2011, Taro App wrote:
>>>
>>> Oops, there's a comment that isContinueNext "Should never be called
>>> before end of current record" so the code must be correct.
>>
>> Or at least correct to the best of our knowledge...
>>
>>> I'm not sure how it happens, but the exception is raised when POI tries to
>>> read phonetic text in ExtRst of UnicodeString. POI tries to read 18
>>> double-byte characters, but RecordInputStream has only 29 bytes (14
>>> characters + 1 byte.) If I check the Excel file with binary editor,
>>> everything up to here seems correct.
>>
>> Hmm, the phonetic text stuff went in much later to the file format. It's not
>> impossible that the team working on it had different ideas about continue
>> records to the team who did the original work
>>
>>> When I open the excel file and deleted sheets which do not contain the
>>> problematic string, then the problem disappeared. I tried deleting different
>>> sheets, then the problem sometimes disappeared and sometimes not. I also
>>> tried re-inputting the problematic string by cut & pasting in Excel, then
>>> the problem always disappeared. My best guess is that Excel sometimes saves
>>> corrupted data. Corruption occurs in phonetic texts which is not usually
>>> visible to users, it is not very obvious.
>>
>> If you open the file in Excel and do "Save As", does it fix the issue, or is
>> it only changing text / removing sheets that fixes it?
>>
>> Nick
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
>> For additional commands, e-mail: user-help@poi.apache.org
>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org

Re: RecordFormatException when reading a Excel file

Posted by Taro App <ap...@gmail.com>.

Hi Nick,

I tested if "Save As" fixes the issue, and found out very interesting results.
For example, there is an Excel file A, which POI can't read at a string ABC.
Open the file A in Excel and save as a file B. Then, POI error occurs
at a different string DEF when reading the file B.
Open the file B in Excel and save as a file C. Then, POI error occurs
at a different string GHI when reading the file C.
Open the file C in Excel and save as a file D. Then, POI error occurs
at a different string JKL when reading the file D.
Open the file A again in Excel and save as a file B2. Then, POI error
occurs at the string DEF when reading the file B2.
Open the file B again in Excel and save as a file C2. Then, POI error
occurs at the string GHI when reading the file C2.
Open the file C again in Excel and save as a file D2. Then, POI error
occurs at the string JKL when reading the file D2.

So, "Save As" creates a file which POI can't read at a different
string. Its pattern is reproducible, but I have not been able to
create new such file from scratch. My problematic Excel file contains
about 50 sheets, and each sheet has 300-1000 non-empty cells. Maybe
this issue occurs only for a large file.

Only workaround is to use debugger to find strings that causes the
error, and re-input those strings in Excel and save. There are many
strings that cause errors, so I have to repeat debug and edit until
POI can successfully read the file. Another very interesting fact is
that, after I "clean" the problematic file until POI can read it, open
the cleaned file in Excel and do "Save As," then POI can't read the
newly saved file!

Of course all the files I explained above can be opened in Excel, and
text and phonetic is not corrupted at all. (I checked phonetic with
=PHONETIC(cell) function.)

apptaro

On Mon, Feb 14, 2011 at 9:04 PM, Nick Burch <ni...@alfresco.com> wrote:
> On Thu, 10 Feb 2011, Taro App wrote:
>>
>> Oops, there's a comment that isContinueNext "Should never be called
>> before end of current record" so the code must be correct.
>
> Or at least correct to the best of our knowledge...
>
>> I'm not sure how it happens, but the exception is raised when POI tries to
>> read phonetic text in ExtRst of UnicodeString. POI tries to read 18
>> double-byte characters, but RecordInputStream has only 29 bytes (14
>> characters + 1 byte.) If I check the Excel file with binary editor,
>> everything up to here seems correct.
>
> Hmm, the phonetic text stuff went in much later to the file format. It's not
> impossible that the team working on it had different ideas about continue
> records to the team who did the original work
>
>> When I open the excel file and deleted sheets which do not contain the
>> problematic string, then the problem disappeared. I tried deleting different
>> sheets, then the problem sometimes disappeared and sometimes not. I also
>> tried re-inputting the problematic string by cut & pasting in Excel, then
>> the problem always disappeared. My best guess is that Excel sometimes saves
>> corrupted data. Corruption occurs in phonetic texts which is not usually
>> visible to users, it is not very obvious.
>
> If you open the file in Excel and do "Save As", does it fix the issue, or is
> it only changing text / removing sheets that fixes it?
>
> Nick
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
> For additional commands, e-mail: user-help@poi.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org

Re: RecordFormatException when reading a Excel file

Posted by Nick Burch <ni...@alfresco.com>.

On Thu, 10 Feb 2011, Taro App wrote:
> Oops, there's a comment that isContinueNext "Should never be called
> before end of current record" so the code must be correct.

Or at least correct to the best of our knowledge...

> I'm not sure how it happens, but the exception is raised when POI tries 
> to read phonetic text in ExtRst of UnicodeString. POI tries to read 18 
> double-byte characters, but RecordInputStream has only 29 bytes (14 
> characters + 1 byte.) If I check the Excel file with binary editor, 
> everything up to here seems correct.

Hmm, the phonetic text stuff went in much later to the file format. It's 
not impossible that the team working on it had different ideas about 
continue records to the team who did the original work

> When I open the excel file and deleted sheets which do not contain the 
> problematic string, then the problem disappeared. I tried deleting 
> different sheets, then the problem sometimes disappeared and sometimes 
> not. I also tried re-inputting the problematic string by cut & pasting 
> in Excel, then the problem always disappeared. My best guess is that 
> Excel sometimes saves corrupted data. Corruption occurs in phonetic 
> texts which is not usually visible to users, it is not very obvious.

If you open the file in Excel and do "Save As", does it fix the issue, or 
is it only changing text / removing sheets that fixes it?

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org

Re: RecordFormatException when reading a Excel file

Posted by Taro App <ap...@gmail.com>.

Oops, there's a comment that isContinueNext "Should never be called
before end of current record" so the code must be correct.

I'm not sure how it happens, but the exception is raised when POI
tries to read phonetic text in ExtRst of UnicodeString. POI tries to
read 18 double-byte characters, but RecordInputStream has only 29
bytes (14 characters + 1 byte.) If I check the Excel file with binary
editor, everything up to here seems correct.

When I open the excel file and deleted sheets which do not contain the
problematic string, then the problem disappeared. I tried deleting
different sheets, then the problem sometimes disappeared and sometimes
not. I also tried re-inputting the problematic string by cut & pasting
in Excel, then the problem always disappeared. My best guess is that
Excel sometimes saves corrupted data. Corruption occurs in phonetic
texts which is not usually visible to users, it is not very obvious.

So, my workaround is to use debugger to find strings that causes the
issue, then I re-input the string in Excel and save. Then, POI can
read the file without any errors.

apptaro



On Thu, Feb 10, 2011 at 3:21 PM, Taro App <ap...@gmail.com> wrote:
> Hi Nick, thank you for your advice.
>
> I debugged into SSTRecord, and found something weird in
> org.apache.poi.hssf.record.RecordInputStream.java where raises the
> exception:
>
>  private void checkRecordPosition(int requiredByteCount) {
>
>    int nAvailable = remaining();
>    if (nAvailable >= requiredByteCount) {
>      // all OK
>      return;
>    }
>    if (nAvailable == 0 && isContinueNext()) {
>      nextRecord();
>      return;
>    }
>    throw new RecordFormatException("Not enough data (" + nAvailable
>        + ") to read requested (" + requiredByteCount +") bytes");
>  }
>
> The above code seems to raise an exception even if requiredByteCount
> is available if it reads next record. When two bytes are required, and
> only one byte is left, even if there is a next record, it raises an
> exception. I'm new to POI, so I'd appreciate if you could confirm if
> this is a bug.
>
> There are three bugs related to "checkRecordPosition", but none of
> them seems related to this, because in the three bugs, errors say "Not
> enough data (0)" instead of "Not enough data (1)"
> - Bug 49219 - Not enough data (0) to read requested (1) bytes error on
> Excel read
> - Bug 49677 - About sheet.getDefaultColumnWidth() serious error!
> - Bug 47247 - Initialisation of record 0x850 left 3060 bytes remaining
> still to be read.
>
> Please advise.
>
> apptaro
>
>
>
> On Wed, Feb 9, 2011 at 10:05 PM, Nick Burch <ni...@alfresco.com> wrote:
>> On Wed, 9 Feb 2011, Taro App wrote:
>>>
>>> Caused by: org.apache.poi.hssf.record.RecordFormatException: Not
>>> enough data (1) to read requested (2) bytes
>>>        at
>>> org.apache.poi.hssf.record.RecordInputStream.checkRecordPosition(RecordInputStream.java:216)
>>>        at
>>> org.apache.poi.hssf.record.RecordInputStream.readUShort(RecordInputStream.java:267)
>>>        at
>>> org.apache.poi.util.StringUtil.readUnicodeLE(StringUtil.java:277)
>>>        at
>>> org.apache.poi.hssf.record.common.UnicodeString$ExtRst.<init>(UnicodeString.java:172)
>>>        at
>>> org.apache.poi.hssf.record.common.UnicodeString.<init>(UnicodeString.java:438)
>>>        at
>>> org.apache.poi.hssf.record.SSTDeserializer.manufactureStrings(SSTDeserializer.java:55)
>>>        at org.apache.poi.hssf.record.SSTRecord.<init>(SSTRecord.java:250)
>>
>> This is the key bit of the stack trace. You have a SST Record which contains
>> a string that claims to be a 1 character unicode string (2 bytes). However,
>> the record only had 1 byte left, which is never a valid size for a unicode
>> string (they must have even lengths)
>>
>> I've no idea if excel has written something invalid, or if POI is confused
>> about the string being unicode vs ascii.
>>
>>> The file is saved with Excel 2003. It's a confidential file, so I
>>> can't provide the file here.
>>
>> I'd suggest you use a debugger to step into the SSTRecord code, and see if
>> you can spot what's going wrong. Is this the only string in the record? If
>> not, do the strings before it make sense or is there garbage in them?
>>
>> Nick
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
>> For additional commands, e-mail: user-help@poi.apache.org
>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org

Re: RecordFormatException when reading a Excel file

Posted by Taro App <ap...@gmail.com>.

Hi Nick, thank you for your advice.

I debugged into SSTRecord, and found something weird in
org.apache.poi.hssf.record.RecordInputStream.java where raises the
exception:

  private void checkRecordPosition(int requiredByteCount) {

    int nAvailable = remaining();
    if (nAvailable >= requiredByteCount) {
      // all OK
      return;
    }
    if (nAvailable == 0 && isContinueNext()) {
      nextRecord();
      return;
    }
    throw new RecordFormatException("Not enough data (" + nAvailable
        + ") to read requested (" + requiredByteCount +") bytes");
  }

The above code seems to raise an exception even if requiredByteCount
is available if it reads next record. When two bytes are required, and
only one byte is left, even if there is a next record, it raises an
exception. I'm new to POI, so I'd appreciate if you could confirm if
this is a bug.

There are three bugs related to "checkRecordPosition", but none of
them seems related to this, because in the three bugs, errors say "Not
enough data (0)" instead of "Not enough data (1)"
- Bug 49219 - Not enough data (0) to read requested (1) bytes error on
Excel read
- Bug 49677 - About sheet.getDefaultColumnWidth() serious error!
- Bug 47247 - Initialisation of record 0x850 left 3060 bytes remaining
still to be read.

Please advise.

apptaro



On Wed, Feb 9, 2011 at 10:05 PM, Nick Burch <ni...@alfresco.com> wrote:
> On Wed, 9 Feb 2011, Taro App wrote:
>>
>> Caused by: org.apache.poi.hssf.record.RecordFormatException: Not
>> enough data (1) to read requested (2) bytes
>>        at
>> org.apache.poi.hssf.record.RecordInputStream.checkRecordPosition(RecordInputStream.java:216)
>>        at
>> org.apache.poi.hssf.record.RecordInputStream.readUShort(RecordInputStream.java:267)
>>        at
>> org.apache.poi.util.StringUtil.readUnicodeLE(StringUtil.java:277)
>>        at
>> org.apache.poi.hssf.record.common.UnicodeString$ExtRst.<init>(UnicodeString.java:172)
>>        at
>> org.apache.poi.hssf.record.common.UnicodeString.<init>(UnicodeString.java:438)
>>        at
>> org.apache.poi.hssf.record.SSTDeserializer.manufactureStrings(SSTDeserializer.java:55)
>>        at org.apache.poi.hssf.record.SSTRecord.<init>(SSTRecord.java:250)
>
> This is the key bit of the stack trace. You have a SST Record which contains
> a string that claims to be a 1 character unicode string (2 bytes). However,
> the record only had 1 byte left, which is never a valid size for a unicode
> string (they must have even lengths)
>
> I've no idea if excel has written something invalid, or if POI is confused
> about the string being unicode vs ascii.
>
>> The file is saved with Excel 2003. It's a confidential file, so I
>> can't provide the file here.
>
> I'd suggest you use a debugger to step into the SSTRecord code, and see if
> you can spot what's going wrong. Is this the only string in the record? If
> not, do the strings before it make sense or is there garbage in them?
>
> Nick
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
> For additional commands, e-mail: user-help@poi.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org

Re: RecordFormatException when reading a Excel file

Posted by Nick Burch <ni...@alfresco.com>.

On Wed, 9 Feb 2011, Taro App wrote:
> Caused by: org.apache.poi.hssf.record.RecordFormatException: Not
> enough data (1) to read requested (2) bytes
> 	at org.apache.poi.hssf.record.RecordInputStream.checkRecordPosition(RecordInputStream.java:216)
> 	at org.apache.poi.hssf.record.RecordInputStream.readUShort(RecordInputStream.java:267)
> 	at org.apache.poi.util.StringUtil.readUnicodeLE(StringUtil.java:277)
> 	at org.apache.poi.hssf.record.common.UnicodeString$ExtRst.<init>(UnicodeString.java:172)
> 	at org.apache.poi.hssf.record.common.UnicodeString.<init>(UnicodeString.java:438)
> 	at org.apache.poi.hssf.record.SSTDeserializer.manufactureStrings(SSTDeserializer.java:55)
> 	at org.apache.poi.hssf.record.SSTRecord.<init>(SSTRecord.java:250)

This is the key bit of the stack trace. You have a SST Record which 
contains a string that claims to be a 1 character unicode string (2 
bytes). However, the record only had 1 byte left, which is never a valid 
size for a unicode string (they must have even lengths)

I've no idea if excel has written something invalid, or if POI is confused 
about the string being unicode vs ascii.

> The file is saved with Excel 2003. It's a confidential file, so I
> can't provide the file here.

I'd suggest you use a debugger to step into the SSTRecord code, and see if 
you can spot what's going wrong. Is this the only string in the record? If 
not, do the strings before it make sense or is there garbage in them?

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org