Posted to dev@tika.apache.org by Lewis John Mcgibbney <le...@gmail.com> on 2014/04/09 03:18:34 UTC

Tika VM Service

Hi Folks,
I would like to propose that we get a Tika service up and running on a VM.
Tika users could do ad hoc parsing, etc., based either on possibly
stable nightly SNAPSHOTs or on the most recent stable
release.
Preferably, the service should provide a list of parsers and also the
MediaTypes supported.
The service should, however, be documented.
We have a sample service running Any23, which will give you an
example of what the Tika service would be like.
http://any23.org
Does anyone have an objection to me logging a ticket with Infra to get a VM
set up for this purpose?
Thanks
Lewis

-- 
*Lewis*

Re: Tika VM Service

Posted by David Meikle <lo...@gmail.com>.
+1 from me too.

I was actually starting to do a similar thing here in OpenShift:
https://github.com/Categorize/openshift-tika-cartridge

This started as a quick lightning talk at the end of an OpenShift session at my local JBoss Users Group, but I was planning to extend it to take a nightly build, following a little tweak, and then keep it hosted online.

Cheers,
Dave


Re: Tika VM Service

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
+1


Re: Tika VM Service

Posted by Nick Burch <ap...@gagravarr.org>.
On Wed, 9 Apr 2014, Konstantin Gribov wrote:
> I can recommend packer.io to generate images for major virtualization
> systems. Virtual appliances are useful for learning some software.

With Tika, you can download either the Tika App single jar or the Tika 
Server single jar, and both let you get up and running with Tika very 
quickly. How would a virtual appliance work, and how might it be better?
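
To illustrate how little a client needs once the Tika Server jar is running, here is a minimal sketch using only the Python standard library. It assumes a server listening on localhost:9998 that accepts a document via PUT on /tika and returns plain text. The host, port and the function names build_parse_request and extract_text are illustrative assumptions, not something taken from this thread.

```python
import urllib.request

# Assumption: a Tika Server started locally and listening on port 9998.
TIKA_SERVER = "http://localhost:9998"


def build_parse_request(document: bytes, base_url: str = TIKA_SERVER) -> urllib.request.Request:
    """Build (but do not send) a PUT request asking the server for plain text."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/tika",  # assumed text-extraction resource
        data=document,
        method="PUT",
        headers={"Accept": "text/plain"},
    )


def extract_text(document: bytes, base_url: str = TIKA_SERVER, timeout: float = 30.0) -> str:
    """Send the document to the server and return whatever text it extracted."""
    request = build_parse_request(document, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

With a server running, calling extract_text(open("sample.pdf", "rb").read()) would return the extracted text. A hosted service or virtual appliance would mainly change the base_url, not the client.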

Nick

Re: Tika VM Service

Posted by Konstantin Gribov <gr...@gmail.com>.
I can recommend packer.io for generating images for the major virtualization
systems. Virtual appliances are useful for learning a piece of software.

-- 
Best regards,
Konstantin Gribov.
On 09.04.2014 at 18:04, "Nick Burch" <ap...@gagravarr.org> wrote:

> On Wed, 9 Apr 2014, Nick Burch wrote:
>
>> My vision of how this would work would be to use the Tika Server, with
>> some extensions so that it self hosted some basic documentation. We're
>> thinking of trying to start that tomorrow in the hackathon, any help /
>> ideas / projects to crib off gratefully received!
>>
>
> Turns out that there are two CXF talks today here at ApacheCon:
> http://apacheconnorthamerica2014.sched.org/event/263569b2db434a06020d9405e9dce03b#.U0VSsqaxuSQ
> http://apacheconnorthamerica2014.sched.org/event/b419b6eb39da92d10053b8a067f70c71#.U0VSvaaxuSQ
>
> At least one of them mentions documentation, so we may need to send
> someone out of the hackathon room to go and learn!
>
> Nick
>

Re: Tika VM Service

Posted by Nick Burch <ap...@gagravarr.org>.
On Wed, 9 Apr 2014, Nick Burch wrote:
> My vision of how this would work would be to use the Tika Server, with 
> some extensions so that it self hosted some basic documentation. We're 
> thinking of trying to start that tomorrow in the hackathon, any help / 
> ideas / projects to crib off gratefully received!

Turns out that there are two CXF talks today here at ApacheCon:
http://apacheconnorthamerica2014.sched.org/event/263569b2db434a06020d9405e9dce03b#.U0VSsqaxuSQ
http://apacheconnorthamerica2014.sched.org/event/b419b6eb39da92d10053b8a067f70c71#.U0VSvaaxuSQ

At least one of them mentions documentation, so we may need to send 
someone out of the hackathon room to go and learn!

Nick

RE: Tika VM Service

Posted by Hong-Thai Nguyen <Ho...@polyspot.com>.
Hi Tika members,

Thanks for this great initiative. I guess there are several possible use cases for such a service:
1. Tika exploitation
We could create a freely accessible Tika Server to parse documents coming from public requests, a kind of demo or free-trial document parser for checking how well Tika handles a user's particular documents. That would make sense because a naive user would not have to download and install the latest build from a snapshot version. We should add some checks on incoming requests to refuse abusive/spam requests. This case provides a service similar to the any23.org site.

2. Tika parser development
"Tika users can do adhoc parsing" is a great idea. I think we could have an "online IDE" for Tika parser development. For this case, there are 2 sub-scenarios:
2.1: Using existing parsers and adding new features (such as adding missing parsed metadata or fixing bugs in the XHTML handler). This case doesn't need any new library; the user can extend the Parser of interest and try it with test documents. Using Groovy is one idea, because it's a simple, Java-like language.
2.2: Creating a new parser. From parser development experience, however, creating a new parser usually requires 3rd-party libraries, so to build/run with this online service we would need to extend the classloader dynamically. If we really want to support this use case, we could eventually wrap the client's jars & classes as an OSGi plugin, then load/execute it on the server side. I don't know whether this scenario makes much sense when users can always check out/build/develop a new parser locally.

3. Tika parser library store
For some reasons (incompatibility of libraries, license constraints ...), official Tika cannot integrate some contributed parsers. This kind of service would store these parsers so that anyone can download them and apply them in their own context.

Anyway, this service requires resources and human effort to create and maintain.
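
The checks on incoming requests mentioned under use case 1 could take many forms. One minimal sketch, assuming a per-client sliding-window rate limit plus an upload size cap, is shown below. The class name RequestGate, its parameters and the default limits are hypothetical choices for illustration, not an existing Tika feature.

```python
import time
from collections import defaultdict, deque


class RequestGate:
    """Hypothetical admission check: size cap plus per-client rate limit."""

    def __init__(self, max_bytes=10 * 1024 * 1024, max_requests=20,
                 window_seconds=60.0, clock=time.monotonic):
        self.max_bytes = max_bytes            # largest accepted upload
        self.max_requests = max_requests      # accepted requests per window
        self.window_seconds = window_seconds  # length of the sliding window
        self._clock = clock                   # injectable for testing
        self._history = defaultdict(deque)    # client id -> accepted timestamps

    def allow(self, client_id: str, size_bytes: int) -> bool:
        """Return True if the request may be parsed, False if it is refused."""
        # Oversized uploads are refused outright and not counted.
        if size_bytes > self.max_bytes:
            return False
        now = self._clock()
        recent = self._history[client_id]
        # Forget accepted requests that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False
        recent.append(now)
        return True
```

A public demo server would call allow() with, for example, the client's IP address and the Content-Length before handing the document to a parser. The limits themselves would need tuning against real traffic.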

Hong-Thai

-----Original Message-----
From: Nick Burch [mailto:apache@gagravarr.org]
Sent: Wednesday, 9 April 2014 06:32
To: dev@tika.apache.org
Subject: Re: Tika VM Service

On Tue, 8 Apr 2014, Lewis John Mcgibbney wrote:
> I would like to propose that we get a Tika service up and running on a VM.
> Tika users can do adhoc parsing, etc and can do this based on possibly 
> stable nightly SNAPSHOT's or alternatively based on the most recent 
> stable release.
> Preferably, the service should provide a list of parsers and also 
> MediaType's supported.

My vision of how this would work would be to use the Tika Server, with some extensions so that it self hosted some basic documentation. We're thinking of trying to start that tomorrow in the hackathon, any help / ideas / projects to crib off gratefully received!

Nick

Re: Tika VM Service

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
Awesomeness Nick. Be nice to Annie and have fun (/me jealous)

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-5th floor
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++






-----Original Message-----
From: Nick Burch <ap...@gagravarr.org>
Reply-To: "dev@tika.apache.org" <de...@tika.apache.org>
Date: Tuesday, April 8, 2014 9:32 PM
To: "dev@tika.apache.org" <de...@tika.apache.org>
Subject: Re: Tika VM Service

>On Tue, 8 Apr 2014, Lewis John Mcgibbney wrote:
>> I would like to propose that we get a Tika service up and running on a VM.
>> Tika users can do adhoc parsing, etc and can do this based on possibly
>> stable nightly SNAPSHOT's or alternatively based on the most recent stable
>> release.
>> Preferably, the service should provide a list of parsers and also
>> MediaType's supported.
>
>My vision of how this would work would be to use the Tika Server, with
>some extensions so that it self hosted some basic documentation. We're
>thinking of trying to start that tomorrow in the hackathon, any help /
>ideas / projects to crib off gratefully received!
>
>Nick


Re: Tika VM Service

Posted by Nick Burch <ap...@gagravarr.org>.
On Tue, 8 Apr 2014, Lewis John Mcgibbney wrote:
> I would like to propose that we get a Tika service up and running on a VM.
> Tika users can do adhoc parsing, etc and can do this based on possibly
> stable nightly SNAPSHOT's or alternatively based on the most recent stable
> release.
> Preferably, the service should provide a list of parsers and also
> MediaType's supported.

My vision of how this would work is to use the Tika Server, with
some extensions so that it self-hosts some basic documentation. We're
thinking of trying to start that tomorrow in the hackathon; any help /
ideas / projects to crib off gratefully received!

Nick