Posted to dev@poi.apache.org by Jukka Zitting <ju...@gmail.com> on 2009/12/02 19:28:31 UTC

Re: a 'lite' version of ooxml-schemas jar

Hi,

On Tue, Nov 24, 2009 at 11:02 AM, Yegor Kozlov <ye...@dinom.ru> wrote:
> For Maven this change is transparent - POM for the poi-ooxml module depends
> on poi-ooxml-schemas instead of ooxml-schemas, this means Maven users will
> only need to update the version of POI from 3.5-FINAL to 3.6, the rest will
> be handled by Maven automatically.

I just had a chance to test this with Tika, and it works beautifully.
After upgrading to a POI 3.6-beta1-20091202 snapshot the size of the
tika-app jar dropped from 25MB to 15MB. That's a major improvement,
thanks! I can't wait for the next POI release.

The only odd thing about the upgrade was that I needed to comment out
a piece of Tika extraction code that uses the
org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBookmark
class as returned from XWPFParagraph.getCTP().getBookmarkStartArray().
It looks like that class is not included in the poi-ooxml-schemas jar
even though the CTP class with the getBookmarkStartArray() method is
there.
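
For anyone who wants to check their own classpath for this kind of gap, here is a minimal sketch. It is not code from POI or Tika: the SchemaClassCheck class and its isOnClasspath helper are hypothetical names made up for illustration, and only the two schema class names come from the message above. It uses nothing beyond the standard Class.forName call, so it also runs (and reports both classes as missing) when no schemas jar is present at all.

```java
// Hypothetical helper for illustration only; not part of POI or Tika.
class SchemaClassCheck {

    /** Returns true if the named class can be located by this class's class loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate and load the class, do not run its static initializers
            Class.forName(className, false, SchemaClassCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not found, or found but not linkable (e.g. one of its own dependencies is missing)
            return false;
        }
    }

    public static void main(String[] args) {
        // Class names taken from the message above
        String[] names = {
            "org.openxmlformats.schemas.wordprocessingml.x2006.main.CTP",
            "org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBookmark"
        };
        for (String name : names) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "present" : "MISSING"));
        }
    }
}
```

Run with the poi-ooxml-schemas jar on the classpath, this should show which of the two classes the jar actually contains.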

BR,

Jukka Zitting

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


Fwd: a 'lite' version of ooxml-schemas jar

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

Some great news from POI, see below!

BR,

Jukka Zitting

---------- Forwarded message ----------
From: Jukka Zitting <ju...@gmail.com>
Date: Wed, Dec 2, 2009 at 7:28 PM
Subject: Re: a 'lite' version of ooxml-schemas jar
To: POI Developers List <de...@poi.apache.org>


Hi,

On Tue, Nov 24, 2009 at 11:02 AM, Yegor Kozlov <ye...@dinom.ru> wrote:
> For Maven this change is transparent - POM for the poi-ooxml module depends
> on poi-ooxml-schemas instead of ooxml-schemas, this means Maven users will
> only need to update the version of POI from 3.5-FINAL to 3.6, the rest will
> be handled by Maven automatically.

I just had a chance to test this with Tika, and it works beautifully.
After upgrading to a POI 3.6-beta1-20091202 snapshot the size of the
tika-app jar dropped from 25MB to 15MB. That's a major improvement,
thanks! I can't wait for the next POI release.

The only odd thing about the upgrade was that I needed to comment out
a piece of Tika extraction code that uses the
org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBookmark
class as returned from XWPFParagraph.getCTP().getBookmarkStartArray().
It looks like that class is not included in the poi-ooxml-schemas jar
even though the CTP class with the getBookmarkStartArray() method is
there.

BR,

Jukka Zitting

Re: a 'lite' version of ooxml-schemas jar

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Thu, Dec 3, 2009 at 5:58 PM, Yegor Kozlov <ye...@dinom.ru> wrote:
> the problem should be fixed in r886733.
> At least, Tika trunk compiles OK against poi-ooxml-schemas produced from POI
> trunk. JUnits run OK too.

Excellent, thanks!

BR,

Jukka Zitting



Re: a 'lite' version of ooxml-schemas jar

Posted by Yegor Kozlov <ye...@dinom.ru>.
the problem should be fixed in r886733.
At least, Tika trunk compiles OK against poi-ooxml-schemas produced from POI trunk. JUnits run OK too.

Regards,
Yegor

> Hi,
> 
> On Wed, Dec 2, 2009 at 7:58 PM, Yegor Kozlov <ye...@dinom.ru> wrote:
>> Can you point me at the place in Tika where getBookmarkStartArray() is used?
> 
> See line 78 of o.a.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator [1].
> 
> [1] http://svn.apache.org/viewvc/lucene/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/XWPFWordExtractorDecorator.java?revision=820962&view=markup
> 
> BR,
> 
> Jukka Zitting
> 




Re: a 'lite' version of ooxml-schemas jar

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Wed, Dec 2, 2009 at 7:58 PM, Yegor Kozlov <ye...@dinom.ru> wrote:
> Can you point me at the place in Tika where getBookmarkStartArray() is used?

See line 78 of o.a.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator [1].

[1] http://svn.apache.org/viewvc/lucene/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/XWPFWordExtractorDecorator.java?revision=820962&view=markup

BR,

Jukka Zitting



Re: a 'lite' version of ooxml-schemas jar

Posted by Yegor Kozlov <ye...@dinom.ru>.
> thanks! I can't wait for the next POI release.
> 
> The only odd thing about the upgrade was that I needed to comment out
> a piece of Tika extraction code that uses the
> org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBookmark
> class as returned from XWPFParagraph.getCTP().getBookmarkStartArray().
> It looks like that class is not included in the poi-ooxml-schemas jar
> even though the CTP class with the getBookmarkStartArray() method is
> there.
> 

Can you point me at the place in Tika where getBookmarkStartArray() is used?
ooxml-lite only includes classes that are loaded while the JUnit tests run. getBookmarkStartArray() is not covered by
the tests, which explains why the CTBookmark class is missing.
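
As a rough illustration of that selection rule (a sketch only, not the actual POI build code): the LiteJarSketch class, its method names, and the example inputs are all hypothetical. The idea is that the lite jar is the intersection of "classes in the full schemas jar" and "classes seen while the tests ran", so anything the tests never touch gets dropped.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the selection rule; the real POI build may work differently.
class LiteJarSketch {

    /** Classes kept in the lite jar: those in the full jar that were also loaded during the tests. */
    static Set<String> selectLiteClasses(Set<String> fullJarClasses, Set<String> loadedDuringTests) {
        Set<String> lite = new TreeSet<>(fullJarClasses);
        lite.retainAll(loadedDuringTests); // set intersection
        return lite;
    }

    /** Classes left out of the lite jar: those in the full jar that the tests never loaded. */
    static Set<String> droppedClasses(Set<String> fullJarClasses, Set<String> loadedDuringTests) {
        Set<String> dropped = new TreeSet<>(fullJarClasses);
        dropped.removeAll(loadedDuringTests); // set difference
        return dropped;
    }

    public static void main(String[] args) {
        String pkg = "org.openxmlformats.schemas.wordprocessingml.x2006.main.";
        // Assumed inputs for illustration: the full jar has both classes, the tests only touch CTP
        Set<String> full = new TreeSet<>(Arrays.asList(pkg + "CTP", pkg + "CTBookmark"));
        Set<String> loaded = new TreeSet<>(Arrays.asList(pkg + "CTP"));

        System.out.println("kept:    " + selectLiteClasses(full, loaded));
        System.out.println("dropped: " + droppedClasses(full, loaded));
    }
}
```

With these inputs CTBookmark lands in the dropped set, which matches the situation described above: a test that calls getBookmarkStartArray() would be needed for the class to be kept.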

Yegor

> BR,
> 
> Jukka Zitting
> 

