Posted to dev@tika.apache.org by Jukka Zitting <ju...@gmail.com> on 2008/02/17 13:31:01 UTC

Working with unreleased POI code

Hi,

I looked at enhancing the structured parsing abilities of the MS
Office parsers, but except for Excel I don't think it makes sense to
add much new stuff there until the relevant POI libraries are more
feature-rich. I've just contacted the POI team about getting some of
their scratchpad code released so we could leverage it in Tika.

I don't think it's a good idea to introduce unreleased dependencies to
Tika, but how about if I started a sandbox area in SVN for Parser
components based on unreleased or otherwise experimental code? That
would help us work with external projects and give them better feedback
before they make new releases.

BR,

Jukka Zitting

Re: Working with unreleased POI code

Posted by Bertrand Delacretaz <bd...@apache.org>.
On Feb 17, 2008 1:31 PM, Jukka Zitting <ju...@gmail.com> wrote:

> ...I don't think it's a good idea to introduce unreleased dependencies to
> Tika, but how about if I started a sandbox area in SVN for Parser
> components based on unreleased or otherwise experimental code?..

+1

-Bertrand

Re: Working with unreleased POI code

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Feb 17, 2008 2:31 PM, Jukka Zitting <ju...@gmail.com> wrote:
> I looked at enhancing the structured parsing abilities of the MS
> Office parsers, but except for Excel I don't think it makes sense to
> add much new stuff there until the relevant POI libraries are more
> feature-rich. I've just contacted the POI team about getting some of
> their scratchpad code released so we could leverage it in Tika.

It turns out that they already release the scratchpad code as a
separate Maven artifact. For now I've simply added it as a regular
dependency and replaced our custom Word and PowerPoint parsing code
with text extractors from POI. Next I'll look at adding more
fine-grained parsing based on existing POI features.

BR,

Jukka Zitting

Re: Working with unreleased POI code

Posted by Sami Siren <ss...@gmail.com>.
2008/2/17, Jukka Zitting <ju...@gmail.com>:

> I don't think it's a good idea to introduce unreleased dependencies to
> Tika,

+1

> but how about if I started a sandbox area in SVN for Parser
> components based on unreleased or otherwise experimental code?

+1

--
 Sami Siren