Posted to user@poi.apache.org by Jeff G <je...@gmail.com> on 2011/05/26 23:44:57 UTC

Reducing the size of POI

I'm using POI strictly for *reading Excel xls & xlsx* documents.  I'm using
this as part of a Java Web Start app with somewhat low bandwidth.  POI is by
far my biggest size hog.  Is there any way I can reduce the size of this?
Are all these libraries needed?  This is what I have...

dom4j, poi, poi-contrib, poi-ooxml, poi-ooxml-schemas, poi-scratchpad,
xmlbeans

Is all this necessary?  Over 10MB of stuff... yikes.

- Jeff

Re: Reducing the size of POI

Posted by Nick Burch <ni...@alfresco.com>.
On Fri, 27 May 2011, Jeff G wrote:
> I'm sure there are probably technical reasons for the structure, but to
> someone who's green to the Java world, fewer jars would make sense to me.

You always need the main POI jar. If you just want excel .xls, stop there. 
If you want the other binary file formats, add scratchpad.

If you want the ooxml formats, you add the poi-ooxml jar, a schemas jar, 
and all the xml dependencies.

> What is more common - developer only wanting pre-2003 office support or 
> current support but for a particular application?

No idea, sorry. I think people tend to either want to write one format, or 
read from all of them.

> The current structure seems to break it up into core, xml core, and xml
> schemas.  Is the xml core used without the xml schema?

No, but you have a choice of two schemas jars. You can either use the full 
one, or the smaller "common parts" poi-ooxml-schemas one. That's one of 
the main reasons for keeping it separate.

> If I were to only need pre-2003 support, it would probably be simpler to 
> remove the folder for xml classes than what we'd have to do now to try 
> and break up the applications.

If you want to only do binary formats, you need the main poi jar, and 
scratchpad for the non excel formats. You don't need any of the xml jars 
(POI or dependencies) if you want to only do the older formats.

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Reducing the size of POI

Posted by Jeff G <je...@gmail.com>.
I'm sure there are probably technical reasons for the structure, but to
someone who's green to the Java world, fewer jars would make sense to me.
Perhaps have a few options based on application, not file type: poi-common.jar,
poi-excel.jar, poi-word.jar, poi-powerpoint.jar.  If you want all of Office,
you have all four files; if you just need Excel, you have two.

What is more common - developer only wanting pre-2003 office support or
current support but for a particular application?

The current structure seems to break it up into core, xml core, and xml
schemas.  Is the xml core used without the xml schema?  If I were to only
need pre-2003 support, it would probably be simpler to remove the folder for
xml classes than what we'd have to do now to try and break up the
applications.

- Jeff

On Fri, May 27, 2011 at 7:12 AM, Nick Burch <ni...@alfresco.com> wrote:

> On Thu, 26 May 2011, Mark Fortner wrote:
>
>> This kinda begs the question "is POI modular enough". I've seen a number of
>> questions arising from people not having the right set of dependent
>> libraries. But having a lighter weight set of libraries would also be
>> useful. Perhaps as the original poster suggested, having a separate library
>> for each type of document would make things easier.
>>
>
> Given the ratio of questions to the list for "I'm missing a bit of POI
> because I've forgotten a jar" to "I don't want all of POI", I think the push
> would possibly be towards a single monolithic jar!
>
> There's quite a bit of code that's common between all the components, so
> we'd end up with something like:
> * poi-core
> * poi-hssf
> * poi-hslf
> * poi-hwpf
> * poi-all-other-scratchpad
> * poi-ooxml-core
> * poi-ooxml-xssf
> * poi-ooxml-xwpf
> * poi-ooxml-xslf
> * poi-ooxml-schemas-core
> * poi-ooxml-schemas-xssf
> * poi-ooxml-schemas-xwpf
> * poi-ooxml-schemas-xslf
> and possibly something else... The risk of people missing something or
> getting one from the wrong version seems much too high to me!
>
> Also, people interested in getting a cut down version of POI are likely to
> all have different requirements. If you want only excel, but also low
> memory, then you can exclude much of the hssf usermodel and keep just the
> low level parts. It all depends. I think it's probably better for people
> with specific requirements to slice and dice it how they need.
>
>
>> Since I don't tend to build POI I was wondering if it would be difficult
>> to modify the build to produce separate jars and to perhaps zip up the
>> dependencies that people keep neglecting to download?
>>
>
> If you download the binary release, then it has all the dependencies in it,
> along with the POI jars and the documentation. If you use maven, it handles
> fetching the dependencies for you. They're all already there...
>
>
> Nick
>

Re: Reducing the size of POI

Posted by Nick Burch <ni...@alfresco.com>.
On Thu, 26 May 2011, Mark Fortner wrote:
> This kinda begs the question "is POI modular enough". I've seen a number of
> questions arising from people not having the right set of dependent
> libraries. But having a lighter weight set of libraries would also be
> useful. Perhaps as the original poster suggested, having a separate library
> for each type of document would make things easier.

Given the ratio of questions to the list for "I'm missing a bit of POI 
because I've forgotten a jar" to "I don't want all of POI", I think the 
push would possibly be towards a single monolithic jar!

There's quite a bit of code that's common between all the components, so 
we'd end up with something like:
* poi-core
* poi-hssf
* poi-hslf
* poi-hwpf
* poi-all-other-scratchpad
* poi-ooxml-core
* poi-ooxml-xssf
* poi-ooxml-xwpf
* poi-ooxml-xslf
* poi-ooxml-schemas-core
* poi-ooxml-schemas-xssf
* poi-ooxml-schemas-xwpf
* poi-ooxml-schemas-xslf
and possibly something else... The risk of people missing something or 
getting one from the wrong version seems much too high to me!
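The "wrong version" risk Nick describes can at least be checked mechanically. A minimal sketch, assuming the jars follow the usual artifact-version.jar naming (for example poi-ooxml-3.7-20101029.jar); the class name PoiJarVersionCheck and the sample file names are hypothetical, and non-POI jars such as dom4j or xmlbeans are simply ignored:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PoiJarVersionCheck {

    // Assumed naming convention: "poi" plus optional lowercase suffixes, a dash,
    // then a version starting with a digit, e.g. "poi-ooxml-schemas-3.7-20101029.jar".
    private static final Pattern POI_JAR = Pattern.compile("^(poi(?:-[a-z]+)*)-(\\d\\S*)\\.jar$");

    /** Maps each POI artifact name to the version parsed from its jar file name. */
    public static Map<String, String> poiVersions(List<String> jarNames) {
        Map<String, String> versions = new LinkedHashMap<>();
        for (String name : jarNames) {
            Matcher m = POI_JAR.matcher(name);
            if (m.matches()) {            // dom4j, xmlbeans, ooxml-schemas etc. don't match
                versions.put(m.group(1), m.group(2));
            }
        }
        return versions;
    }

    /** True when every POI jar in the list carries the same version string. */
    public static boolean consistent(List<String> jarNames) {
        return poiVersions(jarNames).values().stream().distinct().count() <= 1;
    }

    public static void main(String[] args) {
        // Illustrative file names: the schemas jar is deliberately from an older build.
        List<String> libDir = List.of(
                "poi-3.8-beta3-20110606.jar",
                "poi-ooxml-3.8-beta3-20110606.jar",
                "poi-ooxml-schemas-3.7-20101029.jar",
                "xmlbeans-2.3.0.jar");
        System.out.println(poiVersions(libDir));
        System.out.println("consistent: " + consistent(libDir)); // prints consistent: false
    }
}
```

Pointed at the real contents of a lib directory, a false result means at least two POI jars came from different builds.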

Also, people interested in getting a cut down version of POI are likely to 
all have different requirements. If you want only excel, but also low 
memory, then you can exclude much of the hssf usermodel and keep just the 
low level parts. It all depends. I think it's probably better for people 
with specific requirements to slice and dice it how they need.

> Since I don't tend to build POI I was wondering if it would be difficult 
> to modify the build to produce separate jars and to perhaps zip up the 
> dependencies that people keep neglecting to download?

If you download the binary release, then it has all the dependencies in 
it, along with the POI jars and the documentation. If you use maven, it 
handles fetching the dependencies for you. They're all already there...

Nick



Re: Reducing the size of POI

Posted by Mark Fortner <ph...@gmail.com>.
This kinda begs the question "is POI modular enough". I've seen a number of
questions arising from people not having the right set of dependent
libraries. But having a lighter weight set of libraries would also be
useful. Perhaps as the original poster suggested, having a separate library
for each type of document would make things easier.

Since I don't tend to build POI I was wondering if it would be difficult to
modify the build to produce separate jars and to perhaps zip up the
dependencies that people keep neglecting to download?

Mark

On May 26, 2011 7:24 PM, "Dave Fisher" <da...@comcast.net> wrote:

The poi-ooxml-schemas jar is built from the unit test coverage; you reduce
it by giving up unit tests. You can delete them from the directory tree.

You'll need a source distro and then you'll need to delete the parts of the
directory tree you don't need. It should be clear what is what; you'll focus
on keeping XSSF, HSSF, SS, POIFS, and the OOXML base classes...

You'll then need to do your own build with ant.

http://poi.apache.org/howtobuild.html

Regards,
Dave


On May 26, 2011, at 4:58 PM, Jeff G wrote:

> Nick, Great tips - thanks for insight. The xml files...

Re: Reducing the size of POI

Posted by Dave Fisher <da...@comcast.net>.
The poi-ooxml-schemas jar is built from the unit test coverage; you reduce it by giving up unit tests. You can delete them from the directory tree.

You'll need a source distro and then you'll need to delete the parts of the directory tree you don't need. It should be clear what is what; you'll focus on keeping XSSF, HSSF, SS, POIFS, and the OOXML base classes...

You'll then need to do your own build with ant.

http://poi.apache.org/howtobuild.html

Regards,
Dave
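Dave's route is to delete source directories and rebuild with ant. A rougher alternative, sketched here as an assumption and not something POI provides, is to copy an already-built jar while keeping only chosen package prefixes. JarTrimmer and any prefixes you pass are hypothetical. Nothing here analyses class dependencies, so dropping a class that is still referenced only fails at runtime (NoClassDefFoundError); test the trimmed jar against real files, or use a shrinker such as ProGuard that follows references:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class JarTrimmer {

    /**
     * Copies the entries of {@code in} whose path starts with one of {@code keepPrefixes}
     * (plus everything under META-INF/) into a new jar {@code out}.
     * Returns the number of entries copied. Class dependencies are NOT checked.
     */
    public static int trim(Path in, Path out, List<String> keepPrefixes) throws IOException {
        int copied = 0;
        try (ZipInputStream zin = new ZipInputStream(Files.newInputStream(in));
             ZipOutputStream zout = new ZipOutputStream(Files.newOutputStream(out))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                String name = entry.getName();
                boolean keep = name.startsWith("META-INF/")
                        || keepPrefixes.stream().anyMatch(name::startsWith);
                if (!keep || entry.isDirectory()) {
                    continue;                    // directory entries are optional in a jar
                }
                zout.putNextEntry(new ZipEntry(name));
                zin.transferTo(zout);            // copies only the current entry's data
                zout.closeEntry();
                copied++;
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            System.err.println("usage: java JarTrimmer in.jar out.jar prefix...");
            return;
        }
        List<String> prefixes = Arrays.asList(args).subList(2, args.length);
        int n = trim(Path.of(args[0]), Path.of(args[1]), prefixes);
        System.out.println("copied " + n + " entries to " + args[1]);
    }
}
```

Prefixes use jar path syntax, e.g. org/apache/poi/xssf/ rather than org.apache.poi.xssf.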

On May 26, 2011, at 4:58 PM, Jeff G wrote:

> Nick, Great tips - thanks for insight.  The xml files are the largest, so
> I'm very interested in how to trim them.  I opened them up, but I can't tell
> by looking what folders are for word, powerpoint, xwpf, and xslf.
> 
> - Jeff
> 
> On Thu, May 26, 2011 at 6:34 PM, Nick Burch <ni...@alfresco.com> wrote:
> 
>> On Thu, 26 May 2011, Jeff G wrote:
>> 
>>> I'm using POI strictly for *reading Excel xls & xlsx* documents.  I'm
>>> using this as part of a Java Web Start app with somewhat low bandwidth. POI
>>> is by far my biggest size hog.  Is there any way I can reduce the size of
>>> this? Are all these libraries needed?  This is what I have...
>>> 
>>> dom4j, poi, poi-contrib, poi-ooxml, poi-ooxml-schemas, poi-scratchpad,
>>> xmlbeans
>>> 
>> 
>> If you're just doing excel files, you can ditch poi-scratchpad and
>> poi-contrib. If you're happy to just work with .xls (not .xlsx), then you
>> can cut it back to only the main poi jar. If you need to work with .xlsx
>> files, then you need the xml related jars, the poi-ooxml jar, and the cut
>> down schemas (poi-ooxml-schemas). You might be able to shrink the
>> ooxml-schemas file by excluding the word and powerpoint related bits, ditto
>> cutting out the xwpf and xslf parts of poi-ooxml, not sure how much that'd
>> save.
>> 
>> Nick
>> 




Re: Reducing the size of POI

Posted by Jochen Wiedmann <jo...@gmail.com>.
> I'm using POI strictly for *reading Excel xls & xlsx* documents.  I'm
> using this as part of a Java Web Start app with somewhat low bandwidth. POI
> is by far my biggest size hog.  Is there any way I can reduce the size of
> this? Are all these libraries needed?  This is what I have...

A rather simple and, to me, very recommendable solution would be to
create a servlet that gets called by the applet and creates the excel
file. That way, you'd need absolutely no additional jar files in the
applet.

Jochen

-- 
I Am What I Am And That's All What I Yam (Popeye)
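Jochen's suggestion moves the parsing off the client entirely, so the Web Start app downloads no POI or XML jars and only talks HTTP. A minimal sketch of the server half, using the JDK's built-in com.sun.net.httpserver in place of a servlet container (a substitution, not what Jochen described). The /parse path, the class name and summarize() are hypothetical; summarize() only counts bytes where a real service would call POI:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class SpreadsheetService {

    /** Hypothetical placeholder for the POI work; this sketch only reports the upload size. */
    public static String summarize(byte[] upload) {
        // A real service would open the workbook here with POI on the server
        // and return just the cell data the client needs (e.g. as CSV).
        return "received " + upload.length + " bytes";
    }

    /** Starts a local server; POST a spreadsheet to /parse and read back the summary. */
    public static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/parse", exchange -> {
            byte[] reply;
            try (InputStream body = exchange.getRequestBody()) {
                reply = summarize(body.readAllBytes()).getBytes(StandardCharsets.UTF_8);
            }
            exchange.sendResponseHeaders(200, reply.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(reply);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = start(8080); // port is an arbitrary choice
        System.out.println("listening on http://127.0.0.1:"
                + server.getAddress().getPort() + "/parse");
    }
}
```

On the client, plain java.net.HttpURLConnection is enough to POST the file and read the reply, so nothing beyond the JDK has to be shipped.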



Re: Reducing the size of POI

Posted by Jeff G <je...@gmail.com>.
Nick, Great tips - thanks for insight.  The xml files are the largest, so
I'm very interested in how to trim them.  I opened them up, but I can't tell
by looking what folders are for word, powerpoint, xwpf, and xslf.

- Jeff

On Thu, May 26, 2011 at 6:34 PM, Nick Burch <ni...@alfresco.com> wrote:

> On Thu, 26 May 2011, Jeff G wrote:
>
>> I'm using POI strictly for *reading Excel xls & xlsx* documents.  I'm
>> using this as part of a Java Web Start app with somewhat low bandwidth. POI
>> is by far my biggest size hog.  Is there any way I can reduce the size of
>> this? Are all these libraries needed?  This is what I have...
>>
>> dom4j, poi, poi-contrib, poi-ooxml, poi-ooxml-schemas, poi-scratchpad,
>> xmlbeans
>>
>
> If you're just doing excel files, you can ditch poi-scratchpad and
> poi-contrib. If you're happy to just work with .xls (not .xlsx), then you
> can cut it back to only the main poi jar. If you need to work with .xlsx
> files, then you need the xml related jars, the poi-ooxml jar, and the cut
> down schemas (poi-ooxml-schemas). You might be able to shrink the
> ooxml-schemas file by excluding the word and powerpoint related bits, ditto
> cutting out the xwpf and xslf parts of poi-ooxml, not sure how much that'd
> save.
>
> Nick
>
>
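To see which folders inside a jar belong to which format, it can help to total the entry sizes per package. A stdlib-only sketch (JarPackageSizes is a hypothetical name). As a rough guide, the Word parts live under packages named xwpf (poi-ooxml) and wordprocessingml (schemas), PowerPoint under xslf and presentationml, and Excel under xssf and spreadsheetml; shared packages such as drawingml serve more than one format:

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipFile;

public class JarPackageSizes {

    /**
     * Sums the uncompressed size of the files in a jar, grouped by their first
     * {@code depth} directory names, e.g. depth 4 maps
     * "org/apache/poi/xssf/usermodel/XSSFCell.class" to "org/apache/poi/xssf".
     */
    public static Map<String, Long> sizesByPackage(String jarPath, int depth) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (ZipFile jar = new ZipFile(jarPath)) {
            jar.stream()
               .filter(e -> !e.isDirectory())
               .forEach(e -> totals.merge(prefix(e.getName(), depth),
                                          Math.max(e.getSize(), 0L), // getSize() is -1 if unknown
                                          Long::sum));
        }
        return totals;
    }

    /** Keeps at most {@code depth} leading directory names of an entry path. */
    public static String prefix(String entryName, int depth) {
        String[] parts = entryName.split("/");
        int keep = Math.min(depth, parts.length - 1); // never keep the file name itself
        return keep == 0 ? "(root)" : String.join("/", Arrays.copyOf(parts, keep));
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JarPackageSizes some.jar [depth]
        int depth = args.length > 1 ? Integer.parseInt(args[1]) : 4;
        sizesByPackage(args[0], depth)
                .forEach((pkg, bytes) -> System.out.printf("%12d  %s%n", bytes, pkg));
    }
}
```

Sizes are uncompressed, so they overstate the download, but the ranking between packages is what matters for deciding what to cut.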

Re: Reducing the size of POI

Posted by Jeff G <je...@gmail.com>.
Has anyone tried ProGuard on POI?
http://proguard.sourceforge.net/

- Jeff

On Fri, Jun 3, 2011 at 5:40 PM, Nick Burch <ni...@alfresco.com> wrote:

> On Fri, 3 Jun 2011, Jeff G wrote:
>
>> So what about dom4j & xmlbeans?  Are these required for xlsx?
>>
>
> Yup, for the xml formats (xlsx, docx and pptx) you need:
> * poi
> * poi-scratchpad if working with .docx and .pptx
> * poi-ooxml
> * one of poi-ooxml-schemas or ooxml-schemas
> * xmlbeans + its dependencies (e.g. dom4j + stax)
>
>
> Nick
>

Re: Reducing the size of POI

Posted by Nick Burch <ni...@alfresco.com>.
On Fri, 3 Jun 2011, Jeff G wrote:
> So what about dom4j & xmlbeans?  Are these required for xlsx?

Yup, for the xml formats (xlsx, docx and pptx) you need:
* poi
* poi-scratchpad if working with .docx and .pptx
* poi-ooxml
* one of poi-ooxml-schemas or ooxml-schemas
* xmlbeans + its dependencies (e.g. dom4j + stax)

Nick
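Nick's list can be read as a small lookup from formats to jars. A sketch that encodes exactly that list and nothing more; the class and enum names are hypothetical, the jar names are the ones used in the thread (without version numbers), and poi-ooxml-schemas is picked over the full ooxml-schemas jar:

```java
import java.util.EnumSet;
import java.util.LinkedHashSet;
import java.util.Set;

public class PoiJarChooser {

    /** The file formats discussed in the thread. */
    public enum Format { XLS, DOC, PPT, XLSX, DOCX, PPTX }

    /** Jars needed for the given formats, following Nick's list (artifact names, no versions). */
    public static Set<String> requiredJars(Set<Format> formats) {
        Set<String> jars = new LinkedHashSet<>();
        jars.add("poi");                              // always needed
        for (Format f : formats) {
            switch (f) {
                case XLS:                             // the main jar is enough
                    break;
                case DOC:
                case PPT:
                    jars.add("poi-scratchpad");       // the other binary formats
                    break;
                case DOCX:
                case PPTX:
                    jars.add("poi-scratchpad");       // per the thread, also for .docx/.pptx
                    addOoxmlJars(jars);
                    break;
                case XLSX:
                    addOoxmlJars(jars);
                    break;
            }
        }
        return jars;
    }

    private static void addOoxmlJars(Set<String> jars) {
        jars.add("poi-ooxml");
        jars.add("poi-ooxml-schemas"); // or the full ooxml-schemas jar instead
        jars.add("xmlbeans");
        jars.add("dom4j");             // dependencies as named in the thread
        jars.add("stax");
    }

    public static void main(String[] args) {
        System.out.println(requiredJars(EnumSet.of(Format.XLS)));
        // prints [poi]
        System.out.println(requiredJars(EnumSet.of(Format.XLS, Format.XLSX)));
        // prints [poi, poi-ooxml, poi-ooxml-schemas, xmlbeans, dom4j, stax]
    }
}
```

For Jeff's case (reading .xls and .xlsx only), poi-scratchpad and poi-contrib drop out, which matches Nick's first reply.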



Re: Reducing the size of POI

Posted by Jeff G <je...@gmail.com>.
So what about dom4j & xmlbeans?  Are these required for xlsx?

On Thu, May 26, 2011 at 6:34 PM, Nick Burch <ni...@alfresco.com> wrote:

> On Thu, 26 May 2011, Jeff G wrote:
>
>> I'm using POI strictly for *reading Excel xls & xlsx* documents.  I'm
>> using this as part of a Java Web Start app with somewhat low bandwidth. POI
>> is by far my biggest size hog.  Is there any way I can reduce the size of
>> this? Are all these libraries needed?  This is what I have...
>>
>> dom4j, poi, poi-contrib, poi-ooxml, poi-ooxml-schemas, poi-scratchpad,
>> xmlbeans
>>
>
> If you're just doing excel files, you can ditch poi-scratchpad and
> poi-contrib. If you're happy to just work with .xls (not .xlsx), then you
> can cut it back to only the main poi jar. If you need to work with .xlsx
> files, then you need the xml related jars, the poi-ooxml jar, and the cut
> down schemas (poi-ooxml-schemas). You might be able to shrink the
> ooxml-schemas file by excluding the word and powerpoint related bits, ditto
> cutting out the xwpf and xslf parts of poi-ooxml, not sure how much that'd
> save.
>
> Nick
>

Re: Reducing the size of POI

Posted by Nick Burch <ni...@alfresco.com>.
On Thu, 26 May 2011, Jeff G wrote:
> I'm using POI strictly for *reading Excel xls & xlsx* documents.  I'm 
> using this as part of a Java Web Start app with somewhat low bandwidth. 
> POI is by far my biggest size hog.  Is there any way I can reduce the 
> size of this? Are all these libraries needed?  This is what I have...
>
> dom4j, poi, poi-contrib, poi-ooxml, poi-ooxml-schemas, poi-scratchpad,
> xmlbeans

If you're just doing excel files, you can ditch poi-scratchpad and 
poi-contrib. If you're happy to just work with .xls (not .xlsx), then you 
can cut it back to only the main poi jar. If you need to work with .xlsx 
files, then you need the xml related jars, the poi-ooxml jar, and the cut 
down schemas (poi-ooxml-schemas). You might be able to shrink the 
ooxml-schemas file by excluding the word and powerpoint related bits, 
ditto cutting out the xwpf and xslf parts of poi-ooxml, not sure how much 
that'd save.

Nick
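Nick's answer hinges on whether the files are .xls or .xlsx, and the two are different containers: .xls is an OLE2 compound file, .xlsx is a ZIP package, so the first bytes tell them apart. A stdlib-only sketch, not POI's own detection code; ExcelFormatSniffer is a hypothetical name, and a ZIP signature only shows the file is a ZIP, not that it is a valid .xlsx:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class ExcelFormatSniffer {

    /** OLE2 compound-file signature: .xls (also .doc and .ppt). */
    private static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1 };

    /** ZIP local-file-header signature "PK\3\4": .xlsx (also .docx and .pptx). */
    private static final byte[] ZIP_SIGNATURE = { 0x50, 0x4B, 0x03, 0x04 };

    public enum Container { OLE2, ZIP, UNKNOWN }

    /** Classifies data from its leading bytes. */
    public static Container sniff(byte[] header) {
        if (startsWith(header, OLE2_SIGNATURE)) {
            return Container.OLE2;
        }
        if (startsWith(header, ZIP_SIGNATURE)) {
            return Container.ZIP;   // any ZIP; not proof of a valid .xlsx
        }
        return Container.UNKNOWN;
    }

    /** Reads just the first 8 bytes of a file and classifies them. */
    public static Container sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sniff(in.readNBytes(OLE2_SIGNATURE.length));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            String advice = switch (sniff(Path.of(arg))) {
                case OLE2 -> "binary format: the main poi jar covers .xls";
                case ZIP -> "OOXML: needs poi-ooxml, a schemas jar and the xml jars";
                case UNKNOWN -> "not recognised";
            };
            System.out.println(arg + ": " + advice);
        }
    }
}
```

Run over a sample of the files the application actually receives, this shows whether the .xlsx support (and its XML jars) is needed at all.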
