Posted to dev@poi.apache.org by Rainer Schwarze <rs...@admadic.de> on 2007/06/11 21:25:39 UTC

Re: AW: Unable to read an excel file

Hello Nick,

I looked into Sascha's file and noticed the following issues:

1) The file doesn't seem to be in a "perfect" Excel format (see below).

2) Supporting such files in HSSF requires small changes to the
fillFields method of several (maybe a lot) of the *Record classes, plus
identifying a few situations where the file format is not fully
understood. (I'm not sure whether more is needed.)

The details:

The BOF record of the file has the ID 0x0809, indicating BIFF8. However,
the length of the BOF record is 6, which matches the specs for IDs 0x0209
or 0x0409, which in turn would mean BIFF3 or BIFF4.
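
To make the mismatch concrete, here is a minimal sketch that checks a
BOF record header outside of POI. The class BofHeaderCheck and its
method names are hypothetical, not part of HSSF. It assumes the usual
little-endian BIFF record header (2-byte ID, then 2-byte body length),
and it takes 16 bytes as the full body length because that is what the
original fillFields reads (four shorts and two ints).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not part of POI/HSSF.
class BofHeaderCheck {
    static final int BOF_SID = 0x0809;
    // Four shorts plus two ints, i.e. what the original fillFields reads.
    static final int FULL_BODY_LENGTH = 4 * 2 + 2 * 4;

    /** Describes a 4-byte record header: 2-byte ID, then 2-byte body length. */
    static String describe(byte[] header) {
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        int sid = buf.getShort() & 0xFFFF;
        int length = buf.getShort() & 0xFFFF;
        if (sid != BOF_SID) {
            return "not a 0x0809 BOF record";
        }
        if (length < FULL_BODY_LENGTH) {
            return "truncated BOF: " + length + " of " + FULL_BODY_LENGTH + " bytes";
        }
        return "full-length BOF";
    }

    public static void main(String[] args) {
        // Header as seen in the problem file: ID 0x0809, body length 6.
        byte[] header = {0x09, 0x08, 0x06, 0x00};
        System.out.println(describe(header));
    }
}
```

A check like this could be used to log a warning for such files before
the record is parsed, instead of failing somewhere inside fillFields.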

So the primary reason for HSSF to fail on that file is that HSSF expects
the proper record length, but most records in the file (as far as I
looked) are too short. That leads to the same kind of code change in
each affected record class. For instance, while trying to track down the
problem, I changed BOFRecord.fillFields

from:

    protected void fillFields(RecordInputStream in)
    {
        field_1_version  = in.readShort();
        field_2_type     = in.readShort();
        field_3_build    = in.readShort();
        field_4_year     = in.readShort();
        field_5_history  = in.readInt();
        field_6_rversion = in.readInt();
    }

to:

    protected void fillFields(RecordInputStream in)
    {
        field_1_version  = in.readShort();
        field_2_type     = in.readShort();
        // The following fields may be missing from a short BOF record,
        // so read each one only while unread data is left in the record.
        if (in.getRecordOffset() < in.getLength()) {
            field_3_build    = in.readShort();
        }
        if (in.getRecordOffset() < in.getLength()) {
            field_4_year     = in.readShort();
        }
        if (in.getRecordOffset() < in.getLength()) {
            field_5_history  = in.readInt();
        }
        if (in.getRecordOffset() < in.getLength()) {
            field_6_rversion = in.readInt();
        }
    }
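
For experimenting with this pattern outside of POI, the following is a
minimal standalone sketch. It swaps RecordInputStream for
java.nio.ByteBuffer, and the class LenientBof is hypothetical. Unlike
the patch above, it checks that the full width of each field is still
available before reading it, which is an extra precaution I am assuming
would be wanted for records that end in the middle of a field.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical standalone model of a BOF body; not the HSSF BOFRecord class.
class LenientBof {
    int version, type, build, year, history, rversion;

    /** Parses a BOF body, leaving absent trailing fields at their default of 0. */
    static LenientBof parse(byte[] body) {
        ByteBuffer in = ByteBuffer.wrap(body).order(ByteOrder.LITTLE_ENDIAN);
        LenientBof r = new LenientBof();
        r.version = in.getShort();
        r.type = in.getShort();
        // Read each optional field only if all of its bytes are present.
        if (in.remaining() >= 2) {
            r.build = in.getShort();
        }
        if (in.remaining() >= 2) {
            r.year = in.getShort();
        }
        if (in.remaining() >= 4) {
            r.history = in.getInt();
        }
        if (in.remaining() >= 4) {
            r.rversion = in.getInt();
        }
        return r;
    }

    public static void main(String[] args) {
        // A 6-byte body: version, type and build only.
        byte[] shortBody = {0x00, 0x06, 0x10, 0x00, 0x01, 0x00};
        LenientBof bof = parse(shortBody);
        System.out.println(bof.version + " " + bof.type + " " + bof.build);
    }
}
```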

That worked somewhat for BOF, DIMENSIONS and HEADER. However, for the
FOOTER record the changes are a bit more complicated, because there
seems to be no Unicode flag in the file. The code contains:

            field_2_reserved = in.readByte();
            field_3_unicode_flag = in.readByte();

These two fields do not exist in the Excel file. My assumption is that
the unicode_flag is only there if the reserved field is zero. If it is
non-zero, it is already the first character of the string.

At that point I stopped digging deeper in order to discuss a few things
before potentially wasting time.
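
The FOOTER assumption described above could be sketched roughly as
follows. This is only an illustration of the heuristic, not tested
against the real file. The class FooterGuess is hypothetical, and
several details are assumptions of mine: that a one-byte character
count precedes the two fields, that bit 0 of the flag means 16-bit
characters, and that 8-bit text can be decoded as ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the heuristic; not the HSSF FooterRecord class.
class FooterGuess {
    /** Decodes a FOOTER body in either the long or the short layout. */
    static String readFooter(byte[] body) {
        if (body.length < 2) {
            return "";                      // nothing beyond the count byte
        }
        int charCount = body[0] & 0xFF;     // assumed: one-byte character count
        int reserved = body[1] & 0xFF;
        if (reserved == 0) {
            // Long layout: reserved byte, then the Unicode flag, then the text.
            boolean sixteenBit = (body[2] & 0x01) != 0;   // assumed flag meaning
            if (sixteenBit) {
                return new String(body, 3, charCount * 2, StandardCharsets.UTF_16LE);
            }
            return new String(body, 3, charCount, StandardCharsets.ISO_8859_1);
        }
        // Short layout: the "reserved" byte is already the first character.
        return new String(body, 1, charCount, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] shortForm = {3, 'a', 'b', 'c'};
        byte[] longForm = {3, 0, 0, 'a', 'b', 'c'};
        System.out.println(readFooter(shortForm) + " / " + readFooter(longForm));
    }
}
```

One weakness of this approach is that it guesses the layout from a
single byte, so it would misread a short-layout string whose first
character happens to be a zero byte.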

Regarding the file itself: would it be an option to file a bug report
and attach the file to it?

What do you (or others) think about this? Could there be a solution just
around the corner that I don't see? (Maybe someone is already working
on this kind of situation?)

Best wishes,
Rainer

Sascha Schäfer wrote:
> Hello again,
> 
> I saw that the attachment was deleted. Maybe I can send the file to your
> personal email, because I have no webspace. If not, I will look for a
> forum that allows me to post the file.
> 
> Thanks,
> Sascha
[...]
> -----Original Message-----
> From: Nick Burch [mailto:nick@torchbox.com] 
> Sent: Monday, June 4, 2007 12:34
> To: POI Users List
> Subject: Re: AW: Unable to read an excel file
> 
> On Fri, 1 Jun 2007, Sascha Schäfer wrote:
>> I also tried to save this file with the "Save As" option. I selected the 
>> current Excel format, which was also preselected. After that, I was able 
>> to read the file with Jakarta POI.
> 
> If you post the file somewhere, I'll take a look and see if I can spot 
> anything obviously wrong with it.
> 
> Nick
