Posted to user@poi.apache.org by Eric Peters <Er...@Peters.org> on 2013/07/15 18:01:27 UTC

Sniffing File Type from InputStream?

If you have an InputStream, what's the best practice for determining
whether it's an HSSF or an XSSF file?  I'm assuming it involves wrapping the
InputStream in a BufferedInputStream and marking/resetting before actually
doing the processing.

Other than just looking at the file extension, does anyone have an idea how I
can unify my Excel file reader so that it handles either file type properly?

Thanks,

Eric

Re: Sniffing File Type from InputStream?

Posted by Darren Roberts <ro...@yahoo.com>.
You could try reading the InputStream with a ZipInputStream: if it throws an exception or finds no entries, then it's probably not an xlsx (or at least it's a corrupt one). If it does open, you could also be more thorough and check for the presence of certain XML files and/or folders in the zip to confirm that it is an xlsx and not some other OpenDocument or OpenXML (docx/pptx) file.
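
A minimal sketch of the ZIP check described above, using only the JDK. The class and method names are made up for illustration, and the "xl/" prefix is an assumption: spreadsheet parts conventionally live under that folder, but the package format does not strictly require it, so treat a match as a strong hint rather than proof. Note that ZipInputStream does not throw on non-ZIP data; getNextEntry() simply returns null.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class XlsxZipCheck {
    /**
     * Returns true if the stream is a ZIP with at least one entry under "xl/".
     * Consumes and closes the stream, so the caller needs a fresh or buffered copy afterwards.
     */
    static boolean looksLikeXlsx(InputStream in) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            // getNextEntry() returns null immediately when the data is not a ZIP
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().startsWith("xl/")) {
                    return true;
                }
            }
        } catch (ZipException e) {
            return false; // a ZIP signature was found, but the archive is damaged
        }
        return false;
    }

    /** Builds a tiny in-memory ZIP with the given entry names (demo helper only). */
    static byte[] zipWith(String... names) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : names) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write('x');
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] fakeXlsx = zipWith("[Content_Types].xml", "xl/workbook.xml");
        byte[] fakeDocx = zipWith("[Content_Types].xml", "word/document.xml");
        System.out.println(looksLikeXlsx(new ByteArrayInputStream(fakeXlsx)));
        System.out.println(looksLikeXlsx(new ByteArrayInputStream(fakeDocx)));
    }
}
```

Because the check walks the archive entry by entry, it can be slow on large files and leaves the stream used up; it is probably best kept as a second-stage confirmation.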

I'm not sure how to confirm xls; presumably the format has some identifier bytes in the header you could look for? Or simply try opening the file with the HSSF usermodel if the above has failed/dropped through (this may be prohibitively expensive time/memory-wise, depending on what you're actually doing).
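
For the identifier bytes: xls files are OLE2 compound documents, which begin with the signature D0 CF 11 E0 A1 B1 1A E1, while xlsx files are ZIP archives and normally begin with "PK" followed by 03 04. Below is a rough sketch of peeking at those bytes with mark/reset so the stream can still be processed afterwards. ExcelSniffer, Kind and sniff are hypothetical names, not POI API. The result only identifies the container: doc/ppt are OLE2 too, and docx/pptx are ZIP too.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class ExcelSniffer {
    enum Kind { OLE2, ZIP, UNKNOWN }

    // OLE2 compound document signature (the container used by .xls)
    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    // ZIP local file header signature (the container used by .xlsx)
    private static final int[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    /** Peeks at the first 8 bytes and rewinds, leaving the stream positioned at the start. */
    static Kind sniff(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("Wrap the stream in a BufferedInputStream first");
        }
        byte[] header = new byte[8];
        int read = 0;
        in.mark(header.length);
        try {
            // read() may return fewer bytes than requested, so loop until full or EOF
            while (read < header.length) {
                int n = in.read(header, read, header.length - read);
                if (n < 0) break;
                read += n;
            }
        } finally {
            in.reset();
        }
        if (startsWith(header, read, OLE2_MAGIC)) return Kind.OLE2;
        if (startsWith(header, read, ZIP_MAGIC)) return Kind.ZIP;
        return Kind.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, int length, int[] magic) {
        if (length < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] zipStart = {0x50, 0x4B, 0x03, 0x04, 0, 0, 0, 0, 0, 0};
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(zipStart));
        System.out.println(sniff(in));
    }
}
```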

Probably not a lot of help... 





Re: Sniffing File Type from InputStream?

Posted by Eric Peters <Er...@Peters.org>.
Oh snap that source has exactly what I need.  Thanks!

Eric


On Mon, Jul 15, 2013 at 2:32 PM, Nick Burch <ap...@gagravarr.org> wrote:

> On Mon, 15 Jul 2013, Eric Peters wrote:
>
>> I should have clarified: I'm using the event model for both HSSF and XSSF,
>> since I'm processing some fairly large files. Is there a WorkbookFactory
>> equivalent for the event model? I couldn't find anything even remotely
>> close in the javadocs, but I've also been known to be wrong many, many, many
>> times :)
>>
>
> Ah, not quite. I'd suggest you look at the code that drives
> WorkbookFactory:
> http://svn.apache.org/repos/asf/poi/trunk/src/ooxml/java/org/apache/poi/ss/usermodel/WorkbookFactory.java
>
> There's actually not a lot to it, and you can hopefully see how to use the
> helper methods on the underlying POIFS/OPCPackage classes to do the
> detection and call your custom classes that way.
>
>
> Nick
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
> For additional commands, e-mail: user-help@poi.apache.org
>
>

Re: Sniffing File Type from InputStream?

Posted by Nick Burch <ap...@gagravarr.org>.
On Mon, 15 Jul 2013, Eric Peters wrote:
> I should have clarified: I'm using the event model for both HSSF and XSSF,
> since I'm processing some fairly large files. Is there a WorkbookFactory
> equivalent for the event model? I couldn't find anything even remotely
> close in the javadocs, but I've also been known to be wrong many, many,
> many times :)

Ah, not quite. I'd suggest you look at the code that drives 
WorkbookFactory:
http://svn.apache.org/repos/asf/poi/trunk/src/ooxml/java/org/apache/poi/ss/usermodel/WorkbookFactory.java

There's actually not a lot to it, and you can hopefully see how to use the
helper methods on the underlying POIFS/OPCPackage classes to do the
detection and call your custom classes that way.
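
The detect-then-dispatch idea can be sketched with the JDK alone. This is not POI's own code and does not use its helper methods: Handler and dispatch are hypothetical stand-ins for your event-model readers, and the signature check is hand-rolled. A PushbackInputStream is used so the sniffed bytes can be put back even when the source stream does not support mark/reset.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.Arrays;

class EventReaderDispatch {
    /** Hypothetical callback standing in for your own event-model reader. */
    interface Handler {
        String handle(InputStream in) throws IOException;
    }

    private static final byte[] OLE2 =
            {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};
    private static final byte[] ZIP = {'P', 'K', 3, 4};

    /** Sniffs the header, pushes it back, then hands the intact stream to the matching handler. */
    static String dispatch(InputStream raw, Handler xls, Handler xlsx) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, OLE2.length);
        byte[] header = new byte[OLE2.length];
        int read = 0;
        while (read < header.length) {
            int n = in.read(header, read, header.length - read);
            if (n < 0) break;
            read += n;
        }
        if (read > 0) {
            in.unread(header, 0, read); // the handler sees the stream from byte 0 again
        }
        if (hasPrefix(header, read, OLE2)) return xls.handle(in);
        if (hasPrefix(header, read, ZIP)) return xlsx.handle(in);
        throw new IllegalArgumentException("Stream is neither an OLE2 nor a ZIP container");
    }

    private static boolean hasPrefix(byte[] data, int length, byte[] magic) {
        return length >= magic.length
                && Arrays.equals(Arrays.copyOf(data, magic.length), magic);
    }

    public static void main(String[] args) throws IOException {
        byte[] zipStart = {'P', 'K', 3, 4, 0, 0, 0, 0, 0};
        String result = dispatch(new ByteArrayInputStream(zipStart),
                in -> "xls handler, first byte " + in.read(),
                in -> "xlsx handler, first byte " + in.read());
        System.out.println(result);
    }
}
```

In real use the two handlers would presumably wrap whatever HSSF and XSSF event-model code you already have; only the routing is shown here.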

Nick



Re: Sniffing File Type from InputStream?

Posted by Eric Peters <Er...@Peters.org>.
I should have clarified: I'm using the event model for both HSSF and XSSF,
since I'm processing some fairly large files. Is there a WorkbookFactory
equivalent for the event model? I couldn't find anything even remotely
close in the javadocs, but I've also been known to be wrong many, many, many
times :)


On Mon, Jul 15, 2013 at 11:13 AM, Nick Burch <ap...@gagravarr.org> wrote:

> On Mon, 15 Jul 2013, Eric Peters wrote:
>
>> If you have an InputStream, what's the best practice for determining
>> whether it's an HSSF or an XSSF file?  I'm assuming it involves wrapping the
>> InputStream in a BufferedInputStream and marking/resetting before actually
>> doing the processing.
>>
>
> WorkbookFactory exists for this very reason!
>
> Nick
>
>
>

Re: Sniffing File Type from InputStream?

Posted by Nick Burch <ap...@gagravarr.org>.
On Mon, 15 Jul 2013, Eric Peters wrote:
> If you have an InputStream, what's the best practice for determining
> whether it's an HSSF or an XSSF file?  I'm assuming it involves wrapping the
> InputStream in a BufferedInputStream and marking/resetting before
> actually doing the processing.

WorkbookFactory exists for this very reason!

Nick
