Posted to user@poi.apache.org by Zoran Avtarovski <zo...@sparecreative.com> on 2009/12/02 06:33:49 UTC

Determine version of excel file

We're currently successfully using POI in a webapp to read from uploaded
Excel files.

At present we do a check on the filename extension. If xls we use HSSF and
if xlsx we use XSSF and this is working reasonably well.

The problem we have is that some of our users are uploading files with no
extension. Is there a way to determine the type of excel file that we are
dealing with programmatically?

I'd appreciate any pointers as I haven't had a lot of luck finding anything
on the web.

Z.

Re: Determine version of excel file

Posted by MSB <ma...@tiscali.co.uk>.
Thanks for that Chris. I realised what I had said just as we pulled onto site
this morning and have spent hours today calling myself all of the names
under the sun and bewailing (good word) the fact that I could not get to a
PC!! Of course you are correct, the xml is zipped so there will be no xml
header at the start of the file. That will teach me to show off before I
have had two cups of tea in the morning; my apologies to all.

Yours

Mark B



-- 
View this message in context: http://old.nabble.com/Determine-version-of-excel-file-tp26603831p26611823.html
Sent from the POI - User mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Determine version of excel file

Posted by Chris Lott <ma...@invest-faq.com>.
MSB wrote:
> ..
> Finally, you could open an InputStream onto the file and examine the first
> few bytes - I think it is safe to assume that the xml header would be the
> first thing you read from an OpenXML based file.

Goodness, no!  :-)  An xlsx file (like a docx file and I suppose a pptx 
file) is actually a zip archive.  Try opening it with winzip or your 
favorite zip-file reader and you'll see ("zip -T sheet.xlsx").  Inside 
you'll see XML files, each of which should have a nice XML header.  I 
suppose you could reimplement the magic-number check for a zip file done 
by a unix/linux machine's "file" program, maybe that's what POI's 
WorkbookFactory does under the covers.
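
A minimal sketch of that magic-number check in Java, using only the standard
library. The class name ExcelFormatSniffer and its Format enum are made up for
illustration and are not part of POI, and nothing here confirms how
WorkbookFactory works internally. The two signatures are the zip local file
header ("PK\003\004") and the OLE2 compound document header that binary .xls
files start with.

```java
import java.io.IOException;
import java.io.InputStream;

final class ExcelFormatSniffer {

    /** Container formats that can be recognised from the leading bytes. */
    enum Format { OLE2_BINARY, ZIP_OOXML, UNKNOWN }

    // OLE2 compound document signature (binary .xls, but also .doc and .ppt).
    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    // Zip local file header signature, "PK\003\004" (.xlsx, .docx, any zip).
    private static final int[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    /**
     * Peeks at up to eight bytes and then rewinds, so the same stream can
     * still be handed to a parser afterwards. The stream must support
     * mark/reset; wrap it in a BufferedInputStream if it does not.
     */
    static Format sniff(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(8);
        byte[] header = new byte[8];
        int total = 0;
        try {
            // read() may return fewer bytes than requested, so loop.
            while (total < header.length) {
                int n = in.read(header, total, header.length - total);
                if (n < 0) {
                    break; // file shorter than eight bytes
                }
                total += n;
            }
        } finally {
            in.reset();
        }
        if (startsWith(header, total, OLE2_MAGIC)) {
            return Format.OLE2_BINARY;
        }
        if (startsWith(header, total, ZIP_MAGIC)) {
            return Format.ZIP_OOXML;
        }
        return Format.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, int length, int[] magic) {
        if (length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A zip signature only says the file is a zip archive (it could equally be a
docx or any other zip), and the OLE2 signature is shared with .doc and .ppt,
so this narrows the choice between HSSF and XSSF rather than proving that the
upload is a spreadsheet.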

chris...



Re: Determine version of excel file

Posted by MSB <ma...@tiscali.co.uk>.
Hello again,

If you code to the 'SS' model then you will not need to check the type
returned by the WorkbookFactory at all. By this I mean that you will use
objects from the org.apache.poi.ss.usermodel package such as Workbook, Sheet
and Cell in your program. That way, you need have no concerns at all about
the actual type - and therefore the file format - that you are dealing with.
Before deciding to move your code in this direction, however, it would be
wise to look closely at the methods defined on the various interfaces just
to make sure that you can accomplish everything you require. Just as an
example, on HSSFSheet you can add a data validation whilst this method has
not yet been declared within the Sheet interface. Should you need the
extended functionality then of course you can just test the type returned to
you by the WorkbookFactory and then 'direct' program flow accordingly.

Yours

Mark B

PS Sorry again for that 'just open the file and look at the header' bit this
morning. I was over-excited about the work we had on today and seriously
deprived of tea at that moment. Hope you did not waste any time pursuing
that fruitless/pointless direction.



-- 
View this message in context: http://old.nabble.com/Determine-version-of-excel-file-tp26603831p26611959.html
Sent from the POI - User mailing list archive at Nabble.com.




Re: Determine version of excel file

Posted by Zoran Avtarovski <zo...@sparecreative.com>.
Thanks Mark,

That’s exactly what I was after. And, just so I’m clear, with the Workbook
factory, I just check if the Workbook is an instance of either HSSF or XSSF
and process accordingly.

On a related topic, it would be great if the specifics for the two event
based readers (for HSSF and XSSF) could be abstracted away so that we could
use one set of methods regardless of the document type. Much like the user
model.

Z.


Re: Determine version of excel file

Posted by MSB <ma...@tiscali.co.uk>.
So, to be clear, all you want to do is identify which files use the OpenXML
file format and which are binary?

If so, then take a look at the org.apache.poi.ss.usermodel.WorkbookFactory
(http://poi.eu.apache.org/apidocs/org/apache/poi/ss/usermodel/WorkbookFactory.html)
class. All you need to do is call the static create method passing an
InputStream and you will receive back an instance of either the HSSFWorkbook
or XSSFWorkbook class depending upon the type of the file. Myself, I have never
tried using it with files that lack extensions but it ought to work and I
would certainly suggest giving it a try. Alternatively, you can simply catch
exceptions; i.e. try to open the file as an HSSFWorkbook, catch the
exception if the format is not correct and try to open it as an XSSFWorkbook
then catch and handle the exception thrown if the format is again invalid.
Finally, you could open an InputStream onto the file and examine the first
few bytes - I think it is safe to assume that the xml header would be the
first thing you read from an OpenXML based file.
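
One thing to watch with the try-one-then-the-other approach is that the first
attempt may consume part of the InputStream before it fails, so the second
attempt needs a fresh stream. A rough sketch of one way to handle that, using
only the standard library: the FallbackOpener class and its Opener interface
are hypothetical names invented here, standing in for whatever format-specific
constructors you call, and are not part of POI.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class FallbackOpener {

    /** Hypothetical stand-in for a format-specific constructor or factory. */
    interface Opener<T> {
        T open(InputStream in) throws Exception;
    }

    /**
     * Buffers the upload once, then gives each opener its own fresh stream.
     * Returns the first successful result; if every opener fails, the last
     * failure is rethrown.
     */
    @SafeVarargs
    static <T> T openWithFallback(InputStream source, Opener<? extends T>... openers)
            throws Exception {
        byte[] bytes = readFully(source);
        Exception last = new IOException("no openers supplied");
        for (Opener<? extends T> opener : openers) {
            try {
                return opener.open(new ByteArrayInputStream(bytes));
            } catch (Exception e) {
                last = e; // remember why this format was rejected, try the next
            }
        }
        throw last;
    }

    /** Reads the whole stream into memory; assumes uploads are small enough. */
    private static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }
}
```

With POI on the classpath, the call might look something like
openWithFallback(upload, HSSFWorkbook::new, XSSFWorkbook::new), assigned to a
Workbook variable. Catching every Exception is deliberately broad for the
sketch; in real code you would probably want to catch only the exceptions that
signal a wrong format.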

Yours

Mark B



-- 
View this message in context: http://old.nabble.com/Determine-version-of-excel-file-tp26603831p26604556.html
Sent from the POI - User mailing list archive at Nabble.com.

