Posted to dev@joshua.apache.org by Tom Barber <to...@analytical-labs.com> on 2016/05/19 22:31:01 UTC

Joshua Deployment with Juju

Hi guys

I figured this was worth sharing as it's what I was working on whilst sitting
with Lewis and Kellen at ApacheCon.

I'm looking at creating a Juju deployment for Joshua which people can
instantly attach to Hadoop to train models. Instead of using Hadoop in
standalone mode, I want to be able to simply deploy the same code in the
cloud and scale up my training if required (I'm not a translation guy, so I
don't know how that would play out performance-wise in real life, but to the
sysadmin in me it makes sense).

Anyway, I figured I'd put together a sped-up, cut-down demo that shows the
deployment on AWS:

https://www.youtube.com/watch?v=dnOQEVSMB-4&feature=youtu.be

This deploys Joshua 6.0.5 on its own compute node, plus a multi-node
Hadoop cluster (which you can scale with one command), and associates the
two. I still need to finalise the Hadoop client plumbing, but that should be
done early next week.
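
For the scaling bit it really is one command, something along these lines (the
application name depends on what the Hadoop bundle calls its worker nodes, so
treat "hadoop-slave" as a placeholder):

juju add-unit hadoop-slave -n 2   # add two more worker nodes to the training cluster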

Anyway, if there is an appetite for this alongside whatever Docker stuff
people are working on, I'll happily commit the charms (the code that runs
it) back to the Joshua git repo and we can maintain it in a more "official"
manner.

Tom
--------------

Director Meteorite.bi - Saiku Analytics Founder
Tel: +44(0)5603641316

(Thanks to the Saiku community we reached our Kickstarter
<http://kickstarter.com/projects/2117053714/saiku-reporting-interactive-report-designer/>
goal, but you can always help by sponsoring the project
<http://www.meteorite.bi/products/saiku/sponsorship>)

Re: Joshua Deployment with Juju

Posted by Tom Barber <to...@analytical-labs.com>.
Oh, also, on b): both Lewis and I were saying that support for multiple
language packs is pretty key.

So a user could do:

curl http://localhost/en/es/My%20English%20Phrase

but then on the same box do:

curl http://localhost/fr/en/Mon%20expression%20fran%C3%A7aise

That would be very useful!
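
Just to sketch how that could hang together (the /opt/packs layout and the
HTTP front end here are made up; -server-port is the Joshua 6 decoder's TCP
server mode):

# one decoder per language pack, each listening on its own port
$JOSHUA/bin/joshua-decoder -c /opt/packs/en-es/joshua.config -server-port 5001 &
$JOSHUA/bin/joshua-decoder -c /opt/packs/fr-en/joshua.config -server-port 5002 &
# a thin front end then maps /en/es/... to :5001 and /fr/en/... to :5002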

--------------

Director Meteorite.bi - Saiku Analytics Founder
Tel: +44(0)5603641316

(Thanks to the Saiku community we reached our Kickstarter
<http://kickstarter.com/projects/2117053714/saiku-reporting-interactive-report-designer/>
goal, but you can always help by sponsoring the project
<http://www.meteorite.bi/products/saiku/sponsorship>)

Re: Joshua Deployment with Juju

Posted by Tom Barber <to...@analytical-labs.com>.
The idea is twofold, really.

a) From the docs:

"If you have a Hadoop installation, make sure you’ve set $HADOOP to point
to it. For example, if the hadoop command is in /usr/bin, you should type

export HADOOP=/usr

Joshua will find the binary and use it to submit to your hadoop cluster. If
you don’t have one, just make sure that HADOOP is unset, and Joshua will
roll one out for you and run it in standalone mode."

So Joe User wants to train a model but doesn't want to sink their laptop
doing so, and similarly doesn't know how to deploy, or doesn't want to go
through the effort of deploying, a multi-node Hadoop cluster. My
understanding, having gone through the docs and had a chat with Lewis, is
that Thrax will pass the job off to Hadoop. So a setup like the one the video
depicts would remove the need for Joshua to roll out a standalone Hadoop
setup. Of course, I don't know how Thrax works under the hood; if it doesn't
leverage a cluster, this is clearly not required, but as the docs mention the
word "cluster", I worked under the assumption that it does.
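
In charm terms that just means making sure the environment looks right before
the pipeline kicks off; a minimal sketch, assuming the usual layout where the
hadoop binary sits under the install prefix's bin directory:

if command -v hadoop >/dev/null 2>&1; then
    # point Joshua at the cluster's install prefix (e.g. /usr when hadoop is /usr/bin/hadoop)
    export HADOOP="$(dirname "$(dirname "$(command -v hadoop)")")"
else
    # no cluster attached: leave HADOOP unset so Joshua rolls out its own standalone Hadoop
    unset HADOOP
fi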

b) If we ignore all you language geeks, consumers should be able to use
Joshua in a variety of situations. I have the runtime version set up in
another charm that allows users to spin it up, define a language pack to
install, and configure it, and they can then chuck translations at it, again
in about three lines of code for the end user. This is like Google Translate
in a box, but without going through the compilation rigmarole, which again is
something we should be aiming for with end users. That said, after discussing
use cases with Lewis and seeing the talk of APIs and the like, one thing I
will be working on in the coming months is a web UI for Joshua, so when it's
spun up users can just dump stuff into a box or use curl (I know there is
some support there already). Similarly, being able to dump Joshua into a
Hadoop cluster for processing of data should be something we can do (we may
be able to already, I've not looked, although the C stuff makes me wonder).
Also, being able to distribute the Joshua runtime over your cluster would be
cool as well.
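
To give a feel for the "three lines" bit, the end-user experience is roughly
this (the charm name and the config option are placeholders until the charms
land somewhere official):

juju deploy joshua-runtime                                 # spin up the runtime charm
juju config joshua-runtime language-pack=en-es             # pick the language pack to install
curl http://<joshua-unit-ip>/en/es/My%20English%20Phrase   # then chuck translations at it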

Tom

--------------

Director Meteorite.bi - Saiku Analytics Founder
Tel: +44(0)5603641316

(Thanks to the Saiku community we reached our Kickstarter
<http://kickstarter.com/projects/2117053714/saiku-reporting-interactive-report-designer/>
goal, but you can always help by sponsoring the project
<http://www.meteorite.bi/products/saiku/sponsorship>)

Re: Joshua Deployment with Juju

Posted by kellen sunderland <ke...@gmail.com>.
Hey Tom, nice work.  I'll take a closer look soon, but I just had a question
about the use case.  Would the idea be that you could use Joshua to
translate text in a map task during a Hadoop job?

-Kellen
