Posted to user@poi.apache.org by "Cope, Christopher" <Ch...@logicacmg.com> on 2003/11/12 17:50:48 UTC

Using HSSF to parse Excel

I am working on a system that automatically extracts data from .xls files,
performs manipulation of the data and then inserts the manipulated data into
an Oracle database.
There are numerous sets of data that we need to extract from different .xls
files, and the Excel spreadsheets themselves come in a number of different
formats - single worksheets, multiple worksheets, some containing macros,
formulae etc. The data items that we need to extract can therefore be in
various different places within a spreadsheet.
The data extraction process is written in Java and to handle the complexity
of where to find each data item we are using a Java rules engine.

Currently we do not access the .xls files themselves with Java; instead we
use the Runtime object to kick off an external VB process. The VB process
uses the Excel 2002 XML support to save the .xls files in Microsoft's XML
format. The Java code then resumes, using JAXP to read the XML files.

We have encountered various problems with VB processes failing to terminate
and are also keen to streamline things by keeping it all as one Java
process. We thus want to refactor the .xls file reading process to use Java.
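If the external VB step has to stay in place while the move to Java is being tested, a hung converter can at least be stopped from stalling the whole batch. The sketch below assumes a Java 8 or later JVM (`Process.waitFor(long, TimeUnit)` and `destroyForcibly()` did not exist in 2003) and uses `ProcessBuilder` rather than `Runtime.exec`; the class name `ExternalStepGuard` and the command in `main` are illustrative placeholders, not the actual VB invocation.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class ExternalStepGuard {

    /**
     * Runs a command and waits at most timeoutSeconds for it to finish.
     * Returns the exit code, or -1 if the process had to be killed
     * (assumes the command itself never exits with -1).
     */
    static int runWithTimeout(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Merge stderr into stdout and hand both to this JVM's console, so the
        // child can never block writing to a pipe that nobody is reading.
        pb.redirectErrorStream(true);
        pb.redirectOutput(ProcessBuilder.Redirect.INHERIT);
        Process p = pb.start();
        if (p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            return p.exitValue();
        }
        // Still running after the deadline: kill it instead of hanging the batch.
        p.destroyForcibly();
        p.waitFor();
        return -1;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder command; the real one would launch the VB converter.
        String java = System.getProperty("java.home") + "/bin/java";
        int code = runWithTimeout(Arrays.asList(java, "-version"), 30);
        System.out.println("exit code: " + code);
    }
}
```

This only contains the symptom; it does not remove the separate process, which is what the rest of the thread is about.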

So my question(s):

Is HSSF's event model the best API to achieve this?
If so, will the fact that the spreadsheets typically have lots of formatting
cause problems? (see http://jakarta.apache.org/poi/faq.html Q.14)

If not, what else could be used? Would it be possible to access the .xls
files using StarOffice's Universal Network Objects (UNO)?

Any comments gratefully received.

Thanks

Chris

This e-mail and any attachment is for authorised use by the intended recipient(s) only. It may contain proprietary material, confidential information and/or be subject to legal privilege. It should not be copied, disclosed to, retained or used by, any other party. If you are not an intended recipient then please promptly delete this e-mail and any attachment and all copies and inform the sender. Thank you.

---------------------------------------------------------------------
To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: poi-user-help@jakarta.apache.org


Re: Using HSSF to parse Excel

Posted by Karl-Heinz Zengerle <ka...@sawag.com>.
Hello Dietmar.

You can see what others think of VB (here, compared with a Java library
for Office). The problems described remind me of your problems with the
Office integration.

Regards, Karl-Heinz.


-----Original Message-----
From: Avik Sengupta [mailto:Avik.Sengupta@itellix.com]
Sent: Wednesday, 12 November 2003 17:53
To: POI Users List
Subject: Re: Using HSSF to parse Excel


Re: Using HSSF to parse Excel

Posted by Avik Sengupta <Av...@itellix.com>.
You will undoubtedly achieve tremendous speed and stability improvements
moving from VBA to POI. 

Whether to use the event model or not depends on how low-level you want
to go. It'll provide memory benefits, but the usermodel is an order of
magnitude easier to code.
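The trade-off can be illustrated without tying the example to a particular POI release. In the sketch below, `CellEvent` and `CellListener` are hypothetical stand-ins, not POI classes: the event style pushes each record to a callback and keeps nothing, roughly the role `HSSFListener`, `HSSFRequest` and `HSSFEventFactory` play in HSSF, while the model style loads every cell into memory first, as `HSSFWorkbook` does.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EventVsModelSketch {

    /** Hypothetical stand-in for one cell record decoded from a file (not a POI class). */
    static final class CellEvent {
        final int row;
        final int col;
        final String value;

        CellEvent(int row, int col, String value) {
            this.row = row;
            this.col = col;
            this.value = value;
        }
    }

    /** Hypothetical callback, playing the role HSSFListener plays in POI. */
    interface CellListener {
        void onCell(CellEvent e);
    }

    /** Event style: records are pushed to the listener one at a time and never stored. */
    static void streamCells(Iterable<CellEvent> source, CellListener listener) {
        for (CellEvent e : source) {
            listener.onCell(e);
        }
    }

    /** Model style: everything is loaded into memory first, then looked up by position. */
    static Map<String, String> loadAll(Iterable<CellEvent> source) {
        Map<String, String> cells = new HashMap<>();
        for (CellEvent e : source) {
            cells.put(e.row + ":" + e.col, e.value);
        }
        return cells;
    }

    public static void main(String[] args) {
        List<CellEvent> fakeFile = Arrays.asList(
                new CellEvent(0, 0, "Region"), new CellEvent(0, 1, "Total"),
                new CellEvent(1, 0, "North"), new CellEvent(1, 1, "42"));

        // Event style: the listener has to keep whatever state it needs itself.
        final String[] total = new String[1];
        streamCells(fakeFile, e -> {
            if (e.row == 1 && e.col == 1) {
                total[0] = e.value;
            }
        });

        // Model style: simpler to code, but the whole sheet sits in memory.
        Map<String, String> sheet = loadAll(fakeFile);

        System.out.println(total[0] + " " + sheet.get("1:1")); // prints "42 42"
    }
}
```

The event version only holds one record at a time, but the listener must track any context it needs (which sheet it is in, which row it is looking for) on its own; that bookkeeping is where the extra coding effort goes.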

The important question is whether POI can handle all the Excel features
that you need. That is unfortunately impossible for us to answer. Take
rich text formats, for example: while POI's support for rich text has
improved a lot since that FAQ was written, it's logically impossible to
verify that it supports every bit of rich text in every Excel file out
there. Such are the travails of working without written specs.

So what we usually suggest is: "if it passes your tests, it's good enough
for you". If your requirement is to be able to flawlessly process every
possible Excel sheet that you can throw at it, then POI is not for you.
If, however, you can create a reasonable subset that you can test, it's
perfect for your job. So get POI to open your spreadsheets one by one,
and see how it goes.
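One way to follow that advice is a small harness that tries every file in a sample directory and reports the ones that fail. The parsing step is passed in as a hypothetical `Opener` callback so the harness itself does not depend on a POI version; the placeholder in `main` only rejects empty files. With POI on the classpath, the callback would instead construct an `HSSFWorkbook` from the file's `InputStream`.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

public class OpenEachWorkbook {

    /** Hypothetical callback: parses one file and throws if it cannot be handled. */
    interface Opener {
        void open(Path file) throws Exception;
    }

    /**
     * Tries every *.xls file directly inside dir and returns
     * file name -> failure message. An empty map means no file threw.
     */
    static Map<String, String> findFailures(Path dir, Opener opener) throws IOException {
        Map<String, String> failures = new LinkedHashMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.xls")) {
            for (Path file : files) {
                try {
                    opener.open(file);
                } catch (Exception e) {
                    // Record the failure and carry on with the next file.
                    failures.put(file.getFileName().toString(), e.toString());
                }
            }
        }
        return failures;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        // Placeholder opener that only rejects empty files. With POI on the
        // classpath it would build an HSSFWorkbook from Files.newInputStream(file)
        // and let any exception propagate.
        Opener opener = file -> {
            if (Files.size(file) == 0) {
                throw new IOException("empty file");
            }
        };
        Map<String, String> failures = findFailures(dir, opener);
        failures.forEach((name, why) -> System.out.println("FAILED " + name + ": " + why));
        System.out.println(failures.size() + " file(s) failed");
    }
}
```

A file that opens without an exception is not proof that every value was read correctly, so the extracted values should still be compared with what Excel shows for a few representative sheets.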

As for OpenOffice/StarOffice, it's theoretically possible to use UNO etc.,
but from what I have heard, it's not very easy to set up. It would of
course also have the same drawback: there can be no theoretical
guarantee that it will be able to process every Excel file out there.
Only Excel can guarantee that it will process ALL Excel files properly
(well... almost :)

HTH
-
Avik


