Posted to dev@poi.apache.org by James Geroge <ja...@gmail.com> on 2010/04/19 13:49:25 UTC

How to check for valid excel files using POI without checking the file extension

Hi Friends,
Is there a way to tell whether a file is an Excel file without checking the
file extension? Users can send Excel files with names like those below.
Test
Test.xls
Test.xlsx
Test.xlsxxlsx (by renaming the file using Windows Explorer)
Test.xlsabcd (by renaming

Thanks,
James George.
-- 
View this message in context: http://old.nabble.com/How-to-check-for-valid-excel-files-using-POI-without-checking-the-file-extension-tp28287650p28287650.html
Sent from the POI - Dev mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


Re: How to check for valid excel files using POI without checking the file extension

Posted by Paul Spencer <pa...@apache.org>.
James,
You can open the file using the SS UserModel. See http://poi.markmail.org/message/ejihftiztkrifcvq?q=from:%22Paul+Spencer%22

Paul Spencer




Re: How to check for valid excel files using POI without checking the file extension

Posted by Antoni Mylka <an...@gmail.com>.
You can use a MIME type identifier. There are good ones in the Aperture
Framework (disclaimer: I'm Aperture's maintainer :) ) and in Apache Tika.

AFAIK Apache Tika's identifier is currently better at recognizing xlsx files
when the extension is missing. We're working on that in Aperture too.

Antoni Mylka
antoni.mylka@gmail.com




Re: How to check for valid excel files using POI without checking the file extension

Posted by James Geroge <ja...@gmail.com>.
Hello Mark,
I appreciate your help.

Regards,
JG



Re: How to check for valid excel files using POI without checking the file extension

Posted by MSB <ma...@tiscali.co.uk>.
You're welcome, James. All the best with your project and, if you need any
further help, just drop a message onto the list.

Yours

Mark B




Re: How to check for valid excel files using POI without checking the file extension

Posted by James Geroge <ja...@gmail.com>.
Hello Mark,
Thanks for your suggestions.
I tried to provoke a NullPointerException with the WorkbookFactory class, but
that did not work, so I used a try/catch instead and was able to meet the
requirement I had.

And thanks for the other suggestions too.

The code is below:
try {
    // WorkbookFactory (org.apache.poi.ss.usermodel) throws an exception
    // for unreadable input rather than returning null, so reaching the
    // log call below means the file opened as a workbook.
    WorkbookFactory.create(input);
    log("GOOD FILE");
} catch (Exception e1) {
    //e1.printStackTrace();
    log("Invalid input file Or Not a valid Excel file");
    return; // no need to process if it is not an Excel file
}

Thanks,
James George




Re: How to check for valid excel files using POI without checking the file extension

Posted by MSB <ma...@tiscali.co.uk>.
Hello James,

The most obvious answer is the WorkbookFactory class -
http://poi.apache.org/apidocs/org/apache/poi/ss/usermodel/WorkbookFactory.html
- if you have a valid Excel workbook then it will return an instance of
either the XSSFWorkbook or the HSSFWorkbook class. If the file is not a
workbook, the create() call throws an exception instead. That does impose
some overhead, of course, as the Excel file will effectively be opened,
which could take a few moments and tie up some memory.

The other option would be to look at the file header, the first few bytes of
the file. There is a website - filext.com - that provides this sort of
information. For example, here is the information for the .xls file format
http://filext.com/file-extension/XLS and this for the .xlsx
http://filext.com/file-extension/xlsx. In essence, you would open a stream
onto the file, read the first few bytes and see if they match either
pattern; but I do not know whether this is an entirely fail-safe option.
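
The header check described above can be sketched in plain Java, with no POI
classes involved. The class and method names (ExcelSniffer, sniff,
zipLooksLikeXlsx) are hypothetical and chosen only for illustration. The two
signatures are the OLE2 compound document header and the ZIP local file
header; both are shared with other formats (.doc and .ppt for OLE2, .docx and
.jar for ZIP), so a match is only a hint. For ZIP files the sketch also looks
for an entry named xl/workbook.xml, the name Excel conventionally gives the
main workbook part; that is a heuristic too, since the packaging format
allows other names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper, not part of POI: guesses the container format from leading bytes. */
final class ExcelSniffer {

    enum Kind { OLE2, ZIP, UNKNOWN }

    // OLE2 compound document signature: used by .xls, but also by .doc, .ppt and others.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature "PK\3\4": used by .xlsx, but also by .docx, .jar and others.
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    private ExcelSniffer() {}

    /** Classifies a file by its first bytes; the array may be shorter than 8 bytes. */
    static Kind sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return Kind.OLE2;
        if (startsWith(header, ZIP_MAGIC)) return Kind.ZIP;
        return Kind.UNKNOWN;
    }

    /** Reads up to 8 bytes from the stream and classifies them. */
    static Kind sniff(InputStream in) throws IOException {
        byte[] buf = new byte[OLE2_MAGIC.length];
        int n = 0;
        while (n < buf.length) {
            int r = in.read(buf, n, buf.length - n);
            if (r < 0) break;               // end of stream before 8 bytes
            n += r;
        }
        return sniff(Arrays.copyOf(buf, n));
    }

    /** Heuristic: true if the ZIP stream contains an entry named xl/workbook.xml. */
    static boolean zipLooksLikeXlsx(InputStream in) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if (e.getName().equals("xl/workbook.xml")) return true;
            }
        }
        return false;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

A check like this is cheap, so it could be used to reject obviously wrong
files before handing the rest to WorkbookFactory, which remains the more
reliable test because it actually parses the workbook.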

Yours

Mark B

PS. You have posted this onto the dev list when it really ought to be
posted onto the user list. The dev list is where you would post if you were
experiencing problems with the API - for example a particular file provoking
exceptions - or if you wanted to ask for an enhancement. Furthermore, fewer
people view the dev list, so you are reducing your chances of receiving a
response to your question.


