Posted to solr-user@lucene.apache.org by Tod <li...@gmail.com> on 2011/06/22 15:00:39 UTC
Tika Jax-RS and DIH
> Mattmann, Chris A (388J) <chris.a.mattmann <at> jpl.nasa.gov> writes:
>
>> Hi Jo,
>>
>> You may consider checking out Tika trunk, where we recently have a Tika JAX-RS
>> web service [1] committed as part of the tika-server module. You could probably
>> wire DIH into it and accomplish the same thing.
>>
>> Cheers,
>> Chris
>>
>> [1] https://issues.apache.org/jira/browse/TIKA-593
Chris - could you elaborate on using Tika Jax-RS and DIH? How
production ready is it? Could you summarize the steps necessary to get
it to work? Any examples yet?
I'd be happy to work with you to get something out to the group.
Thanks - Tod
Re: Tika Jax-RS and DIH
Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hi Tod,
On Jun 22, 2011, at 6:00 AM, Tod wrote:
> Chris - could you elaborate on using Tika Jax-RS and DIH? How
> production ready is it?
Sure. I know that Maxim Valyanskiy has done a bunch of work with the Tika JAX-RS layer. It simply exposes Tika's metadata extraction and unpacking capabilities via the JSR 311 (JAX-RS) spec. So you get REST services like:
/meta
HTTP PUT a document to the /meta service and you get back the metadata as "text/csv".
/tika
HTTP PUT a document to the /tika service and you get back the extracted text.
HTTP GET returns a greeting stating that the server is up.
/unpacker
HTTP PUT a document that contains embedded documents to the /unpacker service and you get back a zip of the extracted text for each resource filename in the original document.
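For illustration, here is a rough Python sketch of a client for the PUT services described above. It is only a sketch under assumptions: the base URL (host, port and context path) is hypothetical and depends on where the WAR is deployed, the helper names are made up for this example, and the response is assumed to decode as UTF-8.

```python
import urllib.request

# Assumption: hypothetical location of the deployed tika-server WAR.
TIKA_BASE_URL = "http://localhost:8080/tika-server"


def build_put(endpoint, payload, base_url=TIKA_BASE_URL):
    """Build (but do not send) an HTTP PUT carrying the raw document bytes."""
    url = base_url.rstrip("/") + "/" + endpoint.lstrip("/")
    # Send the bytes as opaque binary so urllib does not label them as form data.
    headers = {"Content-Type": "application/octet-stream"}
    return urllib.request.Request(url, data=payload, headers=headers, method="PUT")


def put_document(endpoint, payload, base_url=TIKA_BASE_URL, timeout=30):
    """PUT the document to the given service and return the response body as bytes."""
    request = build_put(endpoint, payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def extract_text(path, base_url=TIKA_BASE_URL):
    """PUT a local file to /tika and return the extracted text."""
    with open(path, "rb") as handle:
        body = put_document("/tika", handle.read(), base_url)
    # Assumption: the server answers in UTF-8; undecodable bytes are replaced.
    return body.decode("utf-8", errors="replace")


def extract_metadata_csv(path, base_url=TIKA_BASE_URL):
    """PUT a local file to /meta and return the "text/csv" metadata as a string."""
    with open(path, "rb") as handle:
        body = put_document("/meta", handle.read(), base_url)
    return body.decode("utf-8", errors="replace")
```

Something like extract_text("report.pdf") would then return the extracted text, provided a server is actually listening at the assumed URL. The /unpacker service could be called the same way through put_document, with the returned bytes written out as a zip file.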
> Could you summarize the steps necessary to get
> it to work? Any examples yet?
Basically you just build the tika-server WAR file, drop it onto a Servlet App Server (Tomcat, Jetty, etc.) and then you've got a Tika JAX-RS server.
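Once the WAR is deployed, the GET greeting on /tika mentioned above can serve as a quick check that the server is up. A minimal Python sketch, assuming a hypothetical URL (adjust host, port and context path to your container); the function name and the injectable opener parameter are made up for this example:

```python
import urllib.error
import urllib.request

# Assumption: hypothetical URL of the /tika service on the deployed server.
TIKA_URL = "http://localhost:8080/tika-server/tika"


def server_is_up(url=TIKA_URL, timeout=5, opener=urllib.request.urlopen):
    """Return True if an HTTP GET on the /tika service answers with status 200."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: treat as down.
        return False
```

Calling server_is_up() before sending documents avoids pushing a large PUT at a server that has not finished deploying.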
>
> I'd be happy to work with you to get something out to the group.
Awesome! I've created a Tika Wiki page here:
http://wiki.apache.org/tika/TikaJAXRS
Since this is really also Tika related, please feel free to join user@tika.apache.org or dev@tika.apache.org by sending emails to:
user-subscribe@tika.apache.org
dev-subscribe@tika.apache.org
Then you can move the Tika portions of the conversation there. For the Solr/DIH side, this is the right list.
Cheers,
Chris
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++