Posted to dev@tika.apache.org by Lewis John Mcgibbney <le...@gmail.com> on 2014/05/20 16:59:04 UTC

[IMPORTANT] INFRA-7751 - Create a VM for Apache Tika

Hi dev@,

JanI from Infra has nearly provisioned us with a brand spanking new Ubuntu
VM which we can use for the Tika service !!!! YAY

Some things he requires first though...


* - what is the external name used by users? tika-vm.a.o is solely for
ssh, not for public use *

This is entirely up to you guys. Over in Any23 we were lucky enough to have
someone on the project team own any23.org... what about
service.tika.apache.org? Or something else maybe?



*- do you intend to use https? If yes, the traffic will go through a proxy
server *
I don't think that this is required but you guys may think differently. In
the setup of the Any23 VM, I provisioned the mod_proxy_ajp module to proxy
between incoming traffic via Apache HTTPD and Tomcat, where the Any23 Web
Application was running. Tika Server is a standalone server though,
right? It's not packaged as a war.


*- does your software use login? Then you must use https. *

AFAIK the Tika service does not have a security layer. Can someone confirm?

Which users should have access?

It's up to you whether you want to put your name(s) here (PMC only) and I
will transfer on to INFRA-7751 or else you can add them there yourself.


*Which (if any) of the above users need sudo? Note that we are very
restrictive with sudo. *

The final one really comes down to anyone who is willing to log in and
occasionally maintain the service, e.g. if Apache HTTPD needs to be
restarted or something.

I've fully documented the Any23 service... documentation can be found at
the link below. These docs can be more or less copied to cover the
configuration of the Tika server and service... they are essential for a
complete server rebuild should anything go catastrophically wrong and we
were to lose the server and everything running on it.

https://svn.apache.org/repos/infra/infrastructure/trunk/docs/services/any23-vm.txt

I'll be keeping a close eye on the ticket and will try to drive it on. Part
of this involves getting information to JanI in a timely fashion. The info
does not need to be 100% but we at least need some people to volunteer to
maintain the service.

BTW I also have a script which we run over on Any23 as a cron job which
uses jsawk [0] to consume nightly stable SNAPSHOTs of the Any23 server...
these are then loaded into Tomcat and replace the previous stable
SNAPSHOT. Users and devs alike can use the same development SNAPSHOT code
for testing. This should also allow Tika to better test new features as it
permits more users to try out new functionality, esp. for parsers.
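
The "consume the nightly stable SNAPSHOT" step can be sketched roughly as
below. This is not the actual Any23 cron script (which uses jsawk from the
shell); it is a Python illustration that assumes the build server returns a
JSON description of the last stable build. The field names (url, artifacts,
fileName, relativePath), the ci.example.org host and the artifact names are
all assumptions made for the example.

```python
import json
from typing import Optional


def pick_snapshot_url(build_json: str, suffix: str = ".war") -> Optional[str]:
    """Return the download URL of the first artifact whose file name ends with suffix.

    The JSON layout is an assumption for this sketch, not a documented API.
    """
    build = json.loads(build_json)
    for artifact in build.get("artifacts", []):
        if artifact["fileName"].endswith(suffix):
            # Hypothetical convention: artifacts live under <build url>artifact/<path>.
            return build["url"] + "artifact/" + artifact["relativePath"]
    return None


# Made-up build description, used only to exercise the function.
SAMPLE = json.dumps({
    "url": "https://ci.example.org/job/demo/42/",
    "artifacts": [
        {"fileName": "demo-core-1.1-SNAPSHOT.jar",
         "relativePath": "core/target/demo-core-1.1-SNAPSHOT.jar"},
        {"fileName": "demo-service-1.1-SNAPSHOT.war",
         "relativePath": "service/target/demo-service-1.1-SNAPSHOT.war"},
    ],
})

print(pick_snapshot_url(SAMPLE))
```

For a standalone server that ships as a jar rather than a war, the same
function could be called with suffix=".jar".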

Thanks
Lewis

[0] https://github.com/micha/jsawk#readme



-- 
*Lewis*

Re: [IMPORTANT] INFRA-7751 - Create a VM for Apache Tika

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
The app host name is great, and I would like access/sudo.

Sent from my iPhone

> On May 21, 2014, at 2:39 PM, "Lewis John Mcgibbney" <le...@gmail.com> wrote:
> 
> Hi Folks,
> I've updated the ticket with the feedback given on and off list.
> Please comment on the final aspects of the ticket. These include:
> * defining who wants/should have SUDO access to Tika Server
> * confirming that tika-demo.apache.org is OK for app host name
> 
> Thanks
> Lewis

Re: [IMPORTANT] INFRA-7751 - Create a VM for Apache Tika

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Folks,
I've updated the ticket with the feedback given on and off list.
Please comment on the final aspects of the ticket. These include:
 * defining who wants/should have SUDO access to Tika Server
 * confirming that tika-demo.apache.org is OK for app host name

Thanks
Lewis



Re: [IMPORTANT] INFRA-7751 - Create a VM for Apache Tika

Posted by Nick Burch <ap...@gagravarr.org>.
On Tue, 20 May 2014, Lewis John Mcgibbney wrote:
> * - what is the external name used by users? tika-vm.a.o is solely for
> ssh, not for public use *
>
> This is entirely up to you guys. Over in Any23 we were lucky enough to have
> someone on the project team own any23.org... what about
> service.tika.apache.org? Or something else maybe?

How about demo.tika.apache.org?

> *- do you intend to use https? If yes, the traffic will go through a proxy
> server *
> I don't think that this is required but you guys may think differently. In
> the setup of the Any23 VM, I provisioned the mod_proxy_ajp module to proxy
> between incoming traffic via Apache HTTPD and Tomcat, where the Any23 Web
> Application was running. Tika Server is a standalone server though,
> right? It's not packaged as a war.

I think there's an open request for a war version, if you fancy tackling 
that...!

SSL might be nice to have, to allow people to try Tika out with moderately 
confidential data, but it might cause problems for the *.apache.org 
wildcard cert if we use demo.tika.apache.org. I guess tika-demo.apache.org 
might work and fix that. What do others think?
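
To illustrate the wildcard concern: under the usual matching rule, the "*"
in a certificate name stands for exactly one DNS label, so *.apache.org
would cover tika-demo.apache.org but not demo.tika.apache.org. The function
below is a simplified sketch of that rule only, not how any particular TLS
library implements hostname checking.

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Check a hostname against a certificate name such as *.apache.org.

    Simplification for this sketch: a leading "*" matches exactly one
    non-empty DNS label; partial wildcards like "f*.example" are not handled.
    """
    pattern_labels = pattern.lower().rstrip(".").split(".")
    host_labels = hostname.lower().rstrip(".").split(".")
    # A single-label wildcard can never change the number of labels.
    if len(pattern_labels) != len(host_labels):
        return False
    if pattern_labels[0] == "*":
        return bool(host_labels[0]) and pattern_labels[1:] == host_labels[1:]
    return pattern_labels == host_labels


for host in ("tika-demo.apache.org", "demo.tika.apache.org"):
    print(host, wildcard_matches("*.apache.org", host))
# prints:
# tika-demo.apache.org True
# demo.tika.apache.org False
```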

> *- does your software use login? Then you must use https. *
>
> AFAIK the Tika service does not have a security layer. Can someone confirm?

It is open to everyone; no authentication is possible.

> Which user should have access ?
>
> It's up to you whether you want to put your name(s) here (PMC only) and I
> will transfer on to INFRA-7751 or else you can add them there yourself.

I'm happy to put myself forward to help with it

> *Which (if any) of the above users need sudo? Note that we are very
> restrictive with sudo. *

As long as a normal user can bounce the Tika Server process, I don't think 
we need sudo, as the rest shouldn't normally change.

Nick