Posted to user@tika.apache.org by Jukka Zitting <ju...@gmail.com> on 2010/07/06 15:01:42 UTC

Re: How can I help Tika to choose the right extractor for my data?

Hi,

Sorry for the late response.

On Tue, Jun 29, 2010 at 2:53 AM, zabrane Mikael <za...@gmail.com> wrote:
> I learned that it's possible to help (or advise) Tika to choose the right
> extractor for a document if I have, for example, its MimeType.
> I am in exactly this case: for each document in my collection, I know its
> MimeType.
> How can one apply this idea, guys (code snippet please)?

You'll want to pass the media type as a part of the input metadata you
pass to the parsing process, like this:

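    // knownType holds the media type you already know for the document
    Metadata metadata = new Metadata();
    metadata.set(Metadata.CONTENT_TYPE, knownType);

    // The parser sees the type through the metadata passed to parse()
    Parser parser = new AutoDetectParser();
    parser.parse(..., metadata, ...);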
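
For reference, here is a fuller sketch of the same idea. It assumes a reasonably recent Tika release (with the parser modules on the classpath) and Java 7 or later. The class name KnownTypeExtractor and the method extractText are hypothetical names chosen for illustration, not part of Tika. The four-argument parse() call and BodyContentHandler are one common way to fill in the elided arguments, not the only one.

```java
import java.io.InputStream;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical helper class, for illustration only.
final class KnownTypeExtractor {

    // Extracts plain text from the stream, passing the already-known
    // media type to Tika through the input metadata.
    static String extractText(InputStream stream, String knownType) throws Exception {
        Metadata metadata = new Metadata();
        metadata.set(Metadata.CONTENT_TYPE, knownType);

        Parser parser = new AutoDetectParser();

        // Collects the body text; the default constructor caps the
        // amount of text it will hold.
        BodyContentHandler handler = new BodyContentHandler();

        parser.parse(stream, handler, metadata, new ParseContext());
        return handler.toString();
    }
}
```

You would call it with something like KnownTypeExtractor.extractText(inputStream, "application/pdf"), where the second argument is whatever type you have recorded for that document. After parsing, the same Metadata object also carries whatever metadata the parser extracted, so keep a reference to it if you need those values.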

> Finally, does someone know when Tika-0.8 will be released?

At the current pace, I expect it to be out sometime this quarter.

BR,

Jukka Zitting

Re: How can I help Tika to choose the right extractor for my data?

Posted by zabrane Mikael <za...@gmail.com>.
Thanks, Jukka!


-- 
Regards
Zabrane