Posted to user@poi.apache.org by Chris Bamford <cb...@mimecast.com> on 2014/08/01 13:42:26 UTC

POI RecordFormatException when reading XLS file

Hi folks

I have recently received a number of xls files which generate the following exception when trying to extract text from them:

   org.apache.poi.hssf.record.RecordFormatException: Not enough data (0) to read requested (2) bytes

My code is based on the HSSF eventusermodel API; I extend POIOLE2TextExtractor.
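For context, the eventusermodel API processes the Workbook stream as a flat sequence of BIFF8 records. Each record starts with a 4-byte header: a 2-byte little-endian record type (the "sid") and a 2-byte little-endian body length, followed by the body itself. The sketch below walks those headers in plain Java with no POI dependency. It is only an illustration: `BiffRecordWalker` and `listRecords` are hypothetical names, it assumes you already have the raw Workbook stream bytes, and it does not interpret CONTINUE records or any record bodies.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of POI): lists the BIFF8 record headers in a
// raw Workbook stream without decoding any record bodies.
final class BiffRecordWalker {
    static final class Header {
        final int sid;        // record type, e.g. 0x00FC = SST, 0x00FF = ExtSST
        final int length;     // declared body length in bytes
        final int bodyOffset; // where the body starts within the stream

        Header(int sid, int length, int bodyOffset) {
            this.sid = sid;
            this.length = length;
            this.bodyOffset = bodyOffset;
        }

        @Override
        public String toString() {
            return String.format("sid=0x%04X len=%d body@%d", sid, length, bodyOffset);
        }
    }

    static List<Header> listRecords(byte[] stream) {
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        List<Header> headers = new ArrayList<>();
        while (buf.remaining() >= 4) {            // a full 4-byte header is available
            int sid = buf.getShort() & 0xFFFF;    // unsigned 16-bit record type
            int length = buf.getShort() & 0xFFFF; // unsigned 16-bit body length
            int bodyOffset = buf.position();
            if (length > buf.remaining()) {
                throw new IllegalStateException(String.format(
                    "record 0x%04X at offset %d declares %d bytes but only %d remain",
                    sid, bodyOffset - 4, length, buf.remaining()));
            }
            headers.add(new Header(sid, length, bodyOffset));
            buf.position(bodyOffset + length);    // skip over the body
        }
        return headers;
    }
}
```

Listing headers this way makes it easy to spot a record whose declared length looks suspicious, for example a very short ExtSST (sid 0x00FF) immediately after the SST (sid 0x00FC).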

Can anyone shed any light?  I can provide a sample xls file privately.

Thanks


- Chris
           
Chris Bamford
m: +44 7860 405292
www.mimecast.com
 
Mimecast
CityPoint
One Ropemaker Street, London, EC2Y 9AW
+44 (0) 207 847 8700
             

         
        
Disclaimer

cbamford@mimecast.com sent at 2014-08-01 12:42:30 is confidential and may be legally privileged. It is intended solely for use by user@poi.apache.org and others authorized to receive it. If you are not user@poi.apache.org
you are hereby notified that any disclosure, copying, distribution or taking action in reliance of the contents of this information is strictly prohibited and may be unlawful.

Mimecast Ltd. is a company registered in England and Wales with the company number 4698693 VAT No. GB 123 4197 34
Registered Office: CityPoint, One Ropemaker Street, Moorgate, London, EC2Y 9AW




Re: POI RecordFormatException when reading XLS file

Posted by Chris Bamford <cb...@mimecast.com>.
Sorry Nick, forgot to attach the screen output which talks about a ‘FilePointer’ issue.

BFFValidator: "z:\Downloads\file.xls" FAILED at 08/01/14 16:58:33
Log at: z:\Downloads\ESUP-2034.xls.bffvalidator.08-01-14_16-58-33.xml
See: http://msdn.microsoft.com/en-us/library/dd904963(v=office.12).aspx for more information

What could be causing this?

Thanks,

- Chris

On 1 Aug 2014, at 15:36, Nick Burch <ap...@gagravarr.org> wrote:
> [snip]

Chris Bamford
Senior Developer
m: +44 7860 405292
p: +44 207 847 8700
w: www.mimecast.com
Address click here: www.mimecast.com/About-us/Contact-us/







Re: POI RecordFormatException when reading XLS file

Posted by Chris Bamford <cb...@mimecast.com>.
Nick

If you like, I could create a Jira case and attach the file. Would you prefer that?

Thanks

- Chris


On 8 Sep 2014, at 12:28, Chris Bamford <cb...@mimecast.com> wrote:
> [snip]





Re: POI RecordFormatException when reading XLS file

Posted by Chris Bamford <cb...@mimecast.com>.
Hi Nick

Just picked this up again.  I have had a go with BiffViewer and it tells me that the offending record is the first instance of record type ExtSSTRecord, immediately after the SST table.
The SST table all dumps correctly and visually matches the data in Excel.
At this point I don’t know how to proceed: the exception below is thrown, and I don’t know how to interpret the values in the debugger.
Can you guide me?  I have permission to make the xls file public if that would help.  I could also send the output of BiffViewer and POIFSViewer...

Exception message:


   org.apache.poi.hssf.record.RecordFormatException: Not enough data (0) to read requested (2) bytes
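Reading the message literally: `(2)` is the number of bytes the parser asked for (one 16-bit field) and `(0)` is how many bytes were left in the record being decoded. The hedged sketch below reproduces that situation with a hypothetical `BoundedRecordReader` whose reads are limited to a single record body; it mirrors the shape of the message in this report but is not POI's implementation.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical reader (not POI's class): every read is limited to one record body.
final class BoundedRecordReader {
    private final ByteBuffer body;

    BoundedRecordReader(byte[] recordBody) {
        this.body = ByteBuffer.wrap(recordBody).order(ByteOrder.LITTLE_ENDIAN);
    }

    int remaining() {
        return body.remaining();
    }

    // Refuse to read past the end of the record body.
    private void require(int n) {
        if (body.remaining() < n) {
            throw new IllegalStateException("Not enough data (" + body.remaining()
                + ") to read requested (" + n + ") bytes");
        }
    }

    int readUShort() {
        require(2);
        return body.getShort() & 0xFFFF; // unsigned 16-bit field
    }

    int readInt() {
        require(4);
        return body.getInt(); // signed 32-bit field
    }
}
```

Note that an empty body and a body that runs out after a 4-byte and a 2-byte field both end in exactly the same message, which is why knowing which record failed (and its declared length) matters.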

Thanks,

- Chris


On 4 Aug 2014, at 10:24, Nick Burch <ap...@gagravarr.org> wrote:
> [snip]





Re: POI RecordFormatException when reading XLS file

Posted by Nick Burch <ap...@gagravarr.org>.
On Mon, 4 Aug 2014, Chris Bamford wrote:
> Good idea.  I ran the validator on the xls file and it fails.  Not sure 
> I understand all the details, though!  Here’s the output:
>
> <BFFValidation path="z:\Downloads\file.xls" datetime="08/01/14 16:58:33" 
> result="FAILED">

This is the main bit - result=FAILED. Your file isn't a valid Excel file, 
according to the spec.

If Excel loads the file, then it probably isn't that far off. With the 
file, we might be able to add a suitable workaround too. Without it, 
you'll need to write the workaround yourself and submit a patch! Start by 
identifying the problem record (maybe with BiffViewer), then look at the 
hex dump of the stream (POIFSViewer) and compare that with the file format 
docs ([MS-XLS].pdf) to see where it differs.
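As a rough illustration of that last comparison step for the record named in this thread (ExtSSTRecord): [MS-XLS] describes the ExtSST record (sid 0x00FF) body as a 2-byte `dsst` field (strings per bucket) followed by zero or more 8-byte ISSTInf entries (a 4-byte stream position `ib`, a 2-byte `cbOffset`, and 2 reserved bytes), so a well-formed body is 2 + 8n bytes long. The checker below is a hypothetical, length-only sketch in plain Java; it assumes you have read the record's declared length from a BiffViewer or POIFSViewer dump, and it ignores any CONTINUE records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of POI): checks an ExtSST (sid 0x00FF) body
// length against the [MS-XLS] layout: dsst (2 bytes) + n * ISSTInf (8 bytes each).
final class ExtSstCheck {
    static final int DSST_SIZE = 2;    // dsst: number of strings per bucket
    static final int ISSTINF_SIZE = 8; // ib (4) + cbOffset (2) + reserved (2)

    // Returns human-readable problems; an empty list means the length fits the layout.
    static List<String> check(int bodyLength) {
        List<String> problems = new ArrayList<>();
        if (bodyLength < DSST_SIZE) {
            problems.add("body is " + bodyLength + " bytes, too short for the 2-byte dsst field");
            return problems;
        }
        int entries = (bodyLength - DSST_SIZE) / ISSTINF_SIZE;
        int leftover = (bodyLength - DSST_SIZE) % ISSTINF_SIZE;
        if (leftover != 0) {
            problems.add("after dsst and " + entries + " complete ISSTInf entries, "
                + leftover + " of 8 bytes remain (truncated entry)");
        }
        return problems;
    }
}
```

If the body is empty, or the last entry has only 4 or 6 of its 8 bytes, a reader that decodes the fields in spec order would run out of data on a 2-byte field, which would be consistent with the reported error; the thread does not establish that this is what happened in this particular file.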

Nick

Re: POI RecordFormatException when reading XLS file

Posted by Chris Bamford <cb...@mimecast.com>.
Hi Nick,

Good idea.  I ran the validator on the xls file and it fails.  Not sure I understand all the details, though!  Here’s the output:


<BFFValidation path="z:\Downloads\file.xls" datetime="08/01/14 16:58:33" result="FAILED">
  <ParseStack>
    <Type builtinType="Docfile" docName="MS-XLS" sectionTitle="Compound File" msdnLink="http://msdn.microsoft.com/en-us/library/b91df1c9-6bfa-4ab4-8218-7bb0d73624ca">
      <Info>Built-in type "Docfile": The root storage object of an OLE compound file. For more information, see http://msdn.microsoft.com/en-us/library/dd942138.aspx.</Info>
    </Type>
    <Type builtinType="Stream" docName="MS-XLS" sectionTitle="Stream" msdnLink="http://msdn.microsoft.com/en-us/library/f67ac5ed-b0a7-4b2c-9b7a-28933eeaac7e" streamName="#SummaryInformation" streamOffset="0" hexStreamOffset="0x0">
      <Info>Built-in type "Stream": Any stream object for OLE compound files. The entire file contents for other files.</Info>
    </Type>
    <Type docName="MS-XLS" sectionTitle="Summary Information Stream (#SummaryInformation)" msdnLink="http://msdn.microsoft.com/en-us/library/d604544b-a580-44ad-99d6-ca20855a9036" streamName="#SummaryInformation" streamOffset="0" hexStreamOffset="0x0"/>
    <Type docName="MS-OSHARED" sectionTitle="FilePointer" msdnLink="http://msdn.microsoft.com/en-us/library/dd904963(v=office.12).aspx" streamName="#SummaryInformation" streamOffset="44" hexStreamOffset="0x2c"/>
    <Type docName="MS-OSHARED" sectionTitle="FilePointer" msdnLink="http://msdn.microsoft.com/en-us/library/dd904963(v=office.12).aspx" streamName="#SummaryInformation" streamOffset="68" hexStreamOffset="0x44"/>
  </ParseStack>
  <LastData><![CDATA[
]]></LastData>
</BFFValidation>
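One way to make a log like this easier to read is to pull out the root `result` attribute and the last `Type` element in the `ParseStack`. The helper below is a hedged sketch using the JDK's built-in DOM parser (`javax.xml.parsers`); `ValidatorSummary` is a hypothetical name, and treating the last `ParseStack` entry as the point where validation stopped is an assumption about the log format rather than documented behaviour.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: condenses a BFFValidator XML log to one line.
final class ValidatorSummary {
    static String summarize(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse validator log", e);
        }
        Element root = doc.getDocumentElement();
        String result = root.getAttribute("result");
        // Assumption: the last Type element is the innermost structure being parsed.
        NodeList types = root.getElementsByTagName("Type");
        if (types.getLength() == 0) {
            return result;
        }
        Element last = (Element) types.item(types.getLength() - 1);
        return result + " in " + last.getAttribute("streamName")
            + " at " + last.getAttribute("hexStreamOffset")
            + " (" + last.getAttribute("docName") + ": " + last.getAttribute("sectionTitle") + ")";
    }
}
```

Applied to the log above, it should return `FAILED in #SummaryInformation at 0x44 (MS-OSHARED: FilePointer)`, i.e. the validator's complaint is located in the `#SummaryInformation` stream.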

Thanks

- Chris

On 1 Aug 2014, at 15:36, Nick Burch <ap...@gagravarr.org> wrote:
> [snip]








Re: POI RecordFormatException when reading XLS file

Posted by Nick Burch <ap...@gagravarr.org>.
On Fri, 1 Aug 2014, Chris Bamford wrote:
> The record it is trying to create is 
> org.apache.poi.hssf.record.ExtSSTRecord

That's normally quite a standard record

Can you try running the file format validator against it? See 
http://poi.apache.org/faq.html#faq-N10152 for details

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: POI RecordFormatException when reading XLS file

Posted by Chris Bamford <cb...@mimecast.com>.
Hi Nick,

The record it is trying to create is org.apache.poi.hssf.record.ExtSSTRecord

Does this help?

I will watch for 3.11 beta 1. Thanks,

- Chris


On 1 Aug 2014, at 13:36, Nick Burch <ap...@gagravarr.org> wrote:
> [snip]





Re: POI RecordFormatException when reading XLS file

Posted by Nick Burch <ap...@gagravarr.org>.
On Fri, 1 Aug 2014, Chris Bamford wrote:
> I have recently received a number of xls files which generate the 
> following exception when trying to extract text from them:
>
>   org.apache.poi.hssf.record.RecordFormatException: Not enough data (0) 
> to read requested (2) bytes

The interesting thing will be to know what Record it was trying to create 
when it hit that. Should be further down in the stacktrace.
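Finding that record in a long trace can be automated with a small heuristic. This hedged sketch walks the exception's stack frames, then its causes, and returns the first class in the `org.apache.poi.hssf.record` package whose name ends in `Record` (so frames from classes with other suffixes, such as readers or factories, are skipped). It uses only standard `Throwable` and `StackTraceElement` methods; `FailingRecordFinder` is a hypothetical name and the naming heuristic is an assumption, so reading the trace by eye remains the reliable fallback.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical helper: returns the first class on the stack (including causes)
// that lives in org.apache.poi.hssf.record and whose name ends in "Record".
final class FailingRecordFinder {
    private static final String PREFIX = "org.apache.poi.hssf.record.";

    static String find(Throwable t) {
        // Guard against (unusual) cycles in the cause chain.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            for (StackTraceElement frame : cur.getStackTrace()) {
                String cls = frame.getClassName();
                if (cls.startsWith(PREFIX) && cls.endsWith("Record")) {
                    return cls;
                }
            }
        }
        return null; // nothing matched; read the trace manually
    }
}
```

Calling it in the `catch` block around the extraction code and logging the result alongside the message gives the record name without scrolling through the full trace.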

Also, make sure you're using the latest version of Apache POI (3.11 beta 1 
should be out next week, all being well, see dev@ for the release 
candidate details)

Nick
