Posted to user@tika.apache.org by Jukka Zitting <ju...@gmail.com> on 2010/07/06 15:01:42 UTC
Re: How can I help Tika to choose the right extractor for my data?
Hi,
Sorry for the late response.
On Tue, Jun 29, 2010 at 2:53 AM, zabrane Mikael <za...@gmail.com> wrote:
> I learned that it's possible to help (or advise) Tika to choose the right
> extractor for a document if I have, for example, its MimeType.
> I'm in exactly this case. For each document in my collection, I know its
> MimeType.
> How can one apply this idea (code snippet, please)?
You'll want to pass the media type as a part of the input metadata you
pass to the parsing process, like this:
Metadata metadata = new Metadata();
metadata.set(Metadata.CONTENT_TYPE, knownType);
Parser parser = new AutoDetectParser();
parser.parse(..., metadata, ...);
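Filled out a bit, that could look like the sketch below. Treat it as an
illustration rather than a reference: the class name KnownTypeExample, the
parseWithKnownType helper and the command-line arguments are made up for
this example, and it assumes a Tika release whose Parser interface has the
four-argument parse method taking a ParseContext, with the parser
implementations on the classpath. As far as I know the declared type is
used as a hint by the type detection, so the detector may still refine it.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;

public class KnownTypeExample {

    // Hypothetical helper: extract plain text from a stream whose media
    // type the caller already knows (for example "application/pdf").
    static String parseWithKnownType(InputStream in, String knownType)
            throws Exception {
        // Pass the known media type in as part of the input metadata.
        Metadata metadata = new Metadata();
        metadata.set(Metadata.CONTENT_TYPE, knownType);

        // Collects the body text of the document as a plain string.
        BodyContentHandler handler = new BodyContentHandler();

        Parser parser = new AutoDetectParser();
        parser.parse(in, handler, metadata, new ParseContext());
        return handler.toString();
    }

    // Usage (arguments are assumptions): <file path> <media type>
    public static void main(String[] args) throws Exception {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(parseWithKnownType(in, args[1]));
        }
    }
}
```

The metadata object is also updated during parsing, so after the call you
can read back the content type that was actually used with
metadata.get(Metadata.CONTENT_TYPE).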
> Finally, does someone know when Tika-0.8 will be released?
At current pace I expect it to be out sometime in this quarter.
BR,
Jukka Zitting
Re: How can I help Tika to choose the right extractor for my data?
Posted by zabrane Mikael <za...@gmail.com>.
Thanks Jukka !
2010/7/6 Jukka Zitting <ju...@gmail.com>
> Hi,
>
> Sorry for the late response.
>
> On Tue, Jun 29, 2010 at 2:53 AM, zabrane Mikael <za...@gmail.com>
> wrote:
> > I learned that it's possible to help (or advise) Tika to choose the right
> > extractor for a document if I have, for example, its MimeType.
> > I'm in exactly this case. For each document in my collection, I know its
> > MimeType.
> > How can one apply this idea (code snippet, please)?
>
> You'll want to pass the media type as a part of the input metadata you
> pass to the parsing process, like this:
>
> Metadata metadata = new Metadata();
> metadata.set(Metadata.CONTENT_TYPE, knownType);
>
> Parser parser = new AutoDetectParser();
> parser.parse(..., metadata, ...);
>
> > Finally, does someone know when Tika-0.8 will be released?
>
> At current pace I expect it to be out sometime in this quarter.
>
> BR,
>
> Jukka Zitting
>
--
Regards
Zabrane