Posted to user@poi.apache.org by Chris Bamford <cb...@mimecast.com> on 2014/08/01 13:42:26 UTC
POI RecordFormatException when reading XLS file
Hi folks
I have recently received a number of xls files which generate the following exception when trying to extract text from them:
org.apache.poi.hssf.record.RecordFormatException: Not enough data (0) to read requested (2) bytes
My code is based on the HSSF eventusermodel API; I extend POIOLE2TextExtractor.
Can anyone shed any light? I can provide a sample xls file privately.
Thanks
- Chris
Chris Bamford
m: +44 7860 405292
www.mimecast.com
Mimecast
CityPoint
One Ropemaker Street, London, EC2Y 9AW
+44 (0) 207 847 8700
Re: POI RecordFormatException when reading XLS file
Posted by Chris Bamford <cb...@mimecast.com>.
Sorry Nick, forgot to attach the screen output which talks about a ‘FilePointer’ issue.
BFFValidator: "z:\Downloads\file.xls" FAILED at 08/01/14 16:58:33
Log at: z:\Downloads\ESUP-2034.xls.bffvalidator.08-01-14_16-58-33.xml
See: http://msdn.microsoft.com/en-us/library/dd904963(v=office.12).aspx for more information
What could be causing this?
Thanks,
- Chris
On 1 Aug 2014, at 15:36, Nick Burch <ap...@gagravarr.org> wrote:
On Fri, 1 Aug 2014, Chris Bamford wrote:
> The record it is trying to create is
> org.apache.poi.hssf.record.ExtSSTRecord
That's normally quite a standard record
Can you try running the file format validator against it? See
http://poi.apache.org/faq.html#faq-N10152 for details
Nick
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org
Chris Bamford
Senior Developer
m: +44 7860 405292
p: +44 207 847 8700
w: www.mimecast.com
Address click here: www.mimecast.com/About-us/Contact-us/
Re: POI RecordFormatException when reading XLS file
Posted by Chris Bamford <cb...@mimecast.com>.
Nick
If you like I could create a Jira case and attach the file - would you prefer that?
Thanks
- Chris
Re: POI RecordFormatException when reading XLS file
Posted by Chris Bamford <cb...@mimecast.com>.
Hi Nick
Just picked this up again. I have had a go with BiffViewer and it tells me that the offending record is the first instance of record type ExtSSTRecord, immediately after the SST table.
The SST table all dumps correctly and visually matches the data in Excel.
At this point I don’t know how to proceed as the exception is thrown (below) and I don’t know how to interpret the values in the debugger.
Can you guide me? I have permission to make the xls file public if that would help. I could also send the output of BiffViewer and POIFSViewer.
Exception message:
org.apache.poi.hssf.record.RecordFormatException: Not enough data (0) to read requested (2) bytes
Thanks,
- Chris
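For interpreting the raw values of an ExtSST record outside the debugger, the following is a minimal JDK-only sketch, not POI code. It assumes the body layout that [MS-XLS] appears to describe: a 2-byte dsst (strings per bucket) followed by 8-byte ISSTInf entries (4-byte ib, 2-byte cbOffset, 2-byte reserved), all little-endian. The class name ExtSstBody and its fields are hypothetical, and the input is assumed to be the record body with the 4-byte record header already stripped. The layout should be checked against the spec before relying on it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of POI): decodes the body of one ExtSST record. */
final class ExtSstBody {
    /** Assumed ISSTInf size: ib (4 bytes) + cbOffset (2 bytes) + reserved (2 bytes). */
    static final int ENTRY_SIZE = 8;

    final int stringsPerBucket;                      // dsst
    final List<long[]> entries = new ArrayList<>();  // each element is {ib, cbOffset}
    final int leftoverBytes;                         // trailing bytes that do not form a whole entry

    private ExtSstBody(int stringsPerBucket, int leftoverBytes) {
        this.stringsPerBucket = stringsPerBucket;
        this.leftoverBytes = leftoverBytes;
    }

    /** Decodes the body; rejects bodies too short to hold dsst. */
    static ExtSstBody parse(byte[] body) {
        if (body.length < 2) {
            throw new IllegalArgumentException(
                    "ExtSST body has " + body.length + " byte(s); dsst needs 2");
        }
        int count = (body.length - 2) / ENTRY_SIZE;
        ExtSstBody parsed = new ExtSstBody(u16(body, 0), (body.length - 2) % ENTRY_SIZE);
        for (int i = 0; i < count; i++) {
            int pos = 2 + i * ENTRY_SIZE;
            // ib is a 4-byte little-endian value, assembled from two 16-bit halves
            long ib = u16(body, pos) | ((long) u16(body, pos + 2) << 16);
            parsed.entries.add(new long[] { ib, u16(body, pos + 4) });
        }
        return parsed;
    }

    /** Reads an unsigned 16-bit little-endian value. */
    static int u16(byte[] b, int i) {
        return (b[i] & 0xFF) | ((b[i + 1] & 0xFF) << 8);
    }
}
```

A non-zero leftoverBytes, or a body shorter than 2 bytes, would be consistent with a reader running out of data part-way through the record. Whether that is what happens with this particular file is not established in the thread.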
Re: POI RecordFormatException when reading XLS file
Posted by Nick Burch <ap...@gagravarr.org>.
On Mon, 4 Aug 2014, Chris Bamford wrote:
> Good idea. I ran the validator on the xls file and it fails. Not sure
> I understand all the details, though! Here’s the output:
>
> <BFFValidation path="z:\Downloads\file.xls" datetime="08/01/14 16:58:33"
> result="FAILED">
This is the main bit - result=FAILED. Your file isn't a valid Excel file,
based on the spec
If Excel loads the file, then it probably isn't that far off. With the
file, we might be able to add a suitable workaround too. Without it,
you'll need to do that, and submit a patch! Start by identifying the
problem record (maybe with BiffViewer), then look at the hex dump of the
stream (POIFSViewer) and compare that with the file format docs
([MS-XLS].pdf) to see where it differs
Nick
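The first step described above, identifying the problem record, can be approximated without parsing any record bodies. The following is a minimal JDK-only sketch, not BiffViewer or POI code: it walks the BIFF record headers (a 2-byte little-endian record id followed by a 2-byte little-endian body length) and notes where each record starts. The class and member names are hypothetical, and the input is assumed to be the bytes of the Workbook stream already extracted from the OLE2 container, not the .xls file itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of POI): lists BIFF record headers in a stream. */
final class BiffHeaderScan {
    static final int SID_SST = 0x00FC;     // shared string table
    static final int SID_EXTSST = 0x00FF;  // extended SST index

    /** One record header: start offset, record id and declared body length. */
    static final class Header {
        final int offset;
        final int sid;
        final int length;
        final boolean truncated; // declared body runs past the end of the data

        Header(int offset, int sid, int length, boolean truncated) {
            this.offset = offset;
            this.sid = sid;
            this.length = length;
            this.truncated = truncated;
        }
    }

    /** Walks the headers in order; stops at the first record whose body is cut short. */
    static List<Header> scan(byte[] stream) {
        List<Header> headers = new ArrayList<>();
        int pos = 0;
        // Fewer than 4 trailing bytes cannot hold a header and are ignored.
        while (pos + 4 <= stream.length) {
            int sid = u16(stream, pos);
            int len = u16(stream, pos + 2);
            boolean truncated = pos + 4 + len > stream.length;
            headers.add(new Header(pos, sid, len, truncated));
            if (truncated) {
                break;
            }
            pos += 4 + len;
        }
        return headers;
    }

    /** Reads an unsigned 16-bit little-endian value. */
    static int u16(byte[] b, int i) {
        return (b[i] & 0xFF) | ((b[i + 1] & 0xFF) << 8);
    }
}
```

The offset and length reported for the ExtSST header give the exact span to inspect in the hex dump and to compare with the record definition in [MS-XLS].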
Re: POI RecordFormatException when reading XLS file
Posted by Chris Bamford <cb...@mimecast.com>.
Hi Nick,
Good idea. I ran the validator on the xls file and it fails. Not sure I understand all the details, though! Here’s the output:
<BFFValidation path="z:\Downloads\file.xls" datetime="08/01/14 16:58:33" result="FAILED">
<ParseStack>
<Type builtinType="Docfile" docName="MS-XLS" sectionTitle="Compound File" msdnLink="http://msdn.microsoft.com/en-us/library/b91df1c9-6bfa-4ab4-8218-7bb0d73624ca">
<Info>Built-in type "Docfile": The root storage object of an OLE compound file. For more information, see http://msdn.microsoft.com/en-us/library/dd942138.aspx.</Info>
</Type>
<Type builtinType="Stream" docName="MS-XLS" sectionTitle="Stream" msdnLink="http://msdn.microsoft.com/en-us/library/f67ac5ed-b0a7-4b2c-9b7a-28933eeaac7e" streamName="#SummaryInformation" streamOffset="0" hexStreamOffset="0x0">
<Info>Built-in type "Stream": Any stream object for OLE compound files. The entire file contents for other files.</Info>
</Type>
<Type docName="MS-XLS" sectionTitle="Summary Information Stream (#SummaryInformation)" msdnLink="http://msdn.microsoft.com/en-us/library/d604544b-a580-44ad-99d6-ca20855a9036" streamName="#SummaryInformation" streamOffset="0" hexStreamOffset="0x0"/>
<Type docName="MS-OSHARED" sectionTitle="FilePointer" msdnLink="http://msdn.microsoft.com/en-us/library/dd904963(v=office.12).aspx" streamName="#SummaryInformation" streamOffset="44" hexStreamOffset="0x2c"/>
<Type docName="MS-OSHARED" sectionTitle="FilePointer" msdnLink="http://msdn.microsoft.com/en-us/library/dd904963(v=office.12).aspx" streamName="#SummaryInformation" streamOffset="68" hexStreamOffset="0x44"/>
</ParseStack>
<LastData><![CDATA[
]]></LastData>
</BFFValidation>
Thanks
- Chris
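One way to read a report like the one above is to pull out the result attribute and the last Type entry in the ParseStack. The following is a minimal sketch using only the JDK's DOM parser. It assumes, without confirmation from the validator's documentation, that the last Type element marks where validation stopped; the class name ValidatorReport and the summary format are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: condenses a BFFValidation report into one line. */
final class ValidatorReport {

    /** Returns "result | streamName | sectionTitle | offset N" for the last Type entry. */
    static String summarize(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element root = doc.getDocumentElement();
            String result = root.getAttribute("result");
            NodeList types = root.getElementsByTagName("Type");
            if (types.getLength() == 0) {
                return result;
            }
            // Assumption: the last Type in document order is the innermost parse position.
            Element last = (Element) types.item(types.getLength() - 1);
            return result
                    + " | " + last.getAttribute("streamName")
                    + " | " + last.getAttribute("sectionTitle")
                    + " | offset " + last.getAttribute("streamOffset");
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse validator report", e);
        }
    }
}
```

Applied to the output above, this would report a FilePointer entry in the #SummaryInformation stream at offset 68. If the assumption about the last entry holds, the validator stopped in the summary property stream rather than in the workbook stream that contains the ExtSST record, so the two problems may be unrelated.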
Re: POI RecordFormatException when reading XLS file
Posted by Nick Burch <ap...@gagravarr.org>.
On Fri, 1 Aug 2014, Chris Bamford wrote:
> The record it is trying to create is
> org.apache.poi.hssf.record.ExtSSTRecord
That's normally quite a standard record
Can you try running the file format validator against it? See
http://poi.apache.org/faq.html#faq-N10152 for details
Nick
Re: POI RecordFormatException when reading XLS file
Posted by Chris Bamford <cb...@mimecast.com>.
Hi Nick,
The record it is trying to create is org.apache.poi.hssf.record.ExtSSTRecord
Does this help?
I will watch for the 3.11 beta 1, thanks
- Chris
Re: POI RecordFormatException when reading XLS file
Posted by Nick Burch <ap...@gagravarr.org>.
On Fri, 1 Aug 2014, Chris Bamford wrote:
> I have recently received a number of xls files which generate the
> following exception when trying to extract text from them:
>
> org.apache.poi.hssf.record.RecordFormatException: Not enough data (0)
> to read requested (2) bytes
The interesting thing will be to know what Record it was trying to create
when it hit that. Should be further down in the stacktrace.
Also, make sure you're using the latest version of Apache POI (3.11 beta 1
should be out next week, all being well, see dev@ for the release
candidate details)
Nick
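Finding the record in the stack trace can also be done programmatically when many files need checking. The following is a heuristic sketch using only the JDK: it scans the frames of an exception and its causes for the first class in the org.apache.poi.hssf.record package whose name ends in "Record". The class name FailingRecordFinder and the name-matching rule are assumptions made for illustration, not part of POI.

```java
/** Hypothetical helper: guesses which HSSF record class was being built when parsing failed. */
final class FailingRecordFinder {
    static final String RECORD_PACKAGE = "org.apache.poi.hssf.record.";

    /**
     * Returns the first matching record class name, searching innermost frames first
     * and then each cause in turn, or null if no frame matches.
     */
    static String find(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            for (StackTraceElement frame : cur.getStackTrace()) {
                String cls = frame.getClassName();
                // Report nested classes under their outer class name.
                int dollar = cls.indexOf('$');
                if (dollar >= 0) {
                    cls = cls.substring(0, dollar);
                }
                // Helper classes such as RecordInputStream do not end in "Record" and are skipped.
                if (cls.startsWith(RECORD_PACKAGE) && cls.endsWith("Record")) {
                    return cls;
                }
            }
        }
        return null;
    }
}
```

Calling FailingRecordFinder.find(e) from the catch block around the extraction call gives a class name that can be logged next to the file name.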