Posted to user@mahout.apache.org by Ankit Goel <an...@gmail.com> on 2015/07/23 02:53:04 UTC

Mahout on the cloud

Hi,
After my runs on my laptop, I'm ready to port my work to the cloud. I'm
planning to use Amazon. One thing I noticed when I started with Mahout is
that a lot of things were left unsaid on the site/wiki, and they took me a
lot of time to figure out. Pitfalls, if I may call them that. I will
primarily be using clustering on the cloud, so the code to accept new data
and run it is what I have for now.

So before I port to the cloud, are there any things I should beware of or
look out for? For example, is AWS fine with Mahout? Are there any
configurations I should remember? Any advice on implementation to ease my
transition and run Mahout 24hrs? Thanks

-- 
Regards,
Ankit Goel
http://about.me/ankitgoel

Re: Mahout on the cloud

Posted by Jay Vyas <ja...@gmail.com>.

An aside: You can also deploy Mahout via ASF Bigtop on EMR or OpenStack if you are interested in building your own distribution or patching but still want the convenience of automated deployment and rpm/deb packages.

Boston University is currently going this route.


Re: Mahout on the cloud

Posted by Andrew Musselman <an...@gmail.com>.

I'd recommend using EMR since you can create a cluster with one command and
shut it down when you're finished.

See
https://blogs.aws.amazon.com/bigdata/post/Tx1TDK3HHBD4EZL/Building-a-Recommender-with-Apache-Mahout-on-Amazon-Elastic-MapReduce-EMR

That shows how to start up your cluster and run a recommender, which you can
tailor to what you're doing. It's worth looking for the most recent AMI
version, which may already come with Mahout 0.10.
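
As a rough sketch of the "one command" workflow, the Python below only assembles AWS CLI command lines for creating and terminating an EMR cluster; it does not run them. The cluster name, release label, instance type, instance count and cluster id in the usage section are placeholder assumptions, not values from this thread or the linked post, so check the current EMR documentation for which releases actually bundle Mahout.

```python
import shlex


def emr_create_cluster_cmd(name, release_label, instance_type, instance_count,
                           applications=("Mahout",)):
    """Build (but do not run) an `aws emr create-cluster` command line.

    All argument values are supplied by the caller; nothing here is taken
    from the thread itself.
    """
    return [
        "aws", "emr", "create-cluster",
        "--name", name,
        "--release-label", release_label,
        # Each application is passed to the CLI as Name=<app>.
        "--applications", *[f"Name={app}" for app in applications],
        "--instance-type", instance_type,
        "--instance-count", str(instance_count),
        "--use-default-roles",
    ]


def emr_terminate_cmd(cluster_id):
    """Build the command that shuts the cluster down when you are finished."""
    return ["aws", "emr", "terminate-clusters", "--cluster-ids", cluster_id]


if __name__ == "__main__":
    # Placeholder values for illustration only (assumptions).
    create = emr_create_cluster_cmd("mahout-clustering", "emr-4.0.0", "m3.xlarge", 3)
    print(shlex.join(create))
    print(shlex.join(emr_terminate_cmd("j-EXAMPLE")))
```

Printing the commands rather than executing them lets you review the flags (and the cost implications of the instance settings) before pasting them into a shell.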


Re: Mahout on the cloud

Posted by Ankit Goel <an...@gmail.com>.

Very true. Java has a longer history and enough resources and IDEs. Thanks
for this bit of information, and of course, as Pat mentions, even if Mahout
is a Scala project, we will still find Java APIs to work with. I'm just
reiterating this point for anyone else who might come across this thread
while looking for similar answers.


Re: Mahout on the cloud

Posted by Pat Ferrel <pa...@occamsmachete.com>.

For the foreseeable future we are a Scala project but, as with Spark itself, Java APIs can often be created for Scala code given the right API design. If someone wants to contribute in this area, it would be seen favorably, I think. Java knowledge is still far easier to find than Scala knowledge.


Re: Mahout on the cloud

Posted by Dmitriy Lyubimov <dl...@gmail.com>.

PPS. One of the "better" backends, if any comparison really is
appropriate, is expected to be Apache Flink.


Re: Mahout on the cloud

Posted by Dmitriy Lyubimov <dl...@gmail.com>.

I guess I was a bit vague. By quasi-agnostic I mean that some code, the
smaller part of it, may unfortunately include dependencies on a specific
backend engine. It should be easily portable, though.


Re: Mahout on the cloud

Posted by Dmitriy Lyubimov <dl...@gmail.com>.

Mahout is moving toward being backend-agnostic. It supports the same code
on Spark or H2O.

(Disclaimer: some code is quasi-agnostic, such as the Spark shell, and I
think some co-occurrence drivers also like Spark more than anything else.
I may be wrong.)



Re: Mahout on the cloud

Posted by Ankit Goel <an...@gmail.com>.

Thanks a lot, guys.
@Pat Is Mahout only going to support Scala in the near future? And will all
the ML libraries only be from Spark? I did read somewhere that Mahout was
heading in a direction where it's more of a framework that supports
multiple ML libraries. Am I right in my understanding?


Re: Mahout on the cloud

Posted by Pat Ferrel <pa...@occamsmachete.com>.

Just to be clear, Mahout runs on AWS just fine. Dmitriy is talking about support and continuance of “MapReduce”, which means Hadoop MapReduce. We have been accepting only more modern engine code for more than a year, so most of the modern Mahout is in Scala and runs on Spark. The MapReduce paradigm is certainly supported there, but it runs on Spark, so any EMR instances you create should have Spark installed.

Amazon now supports Spark on EMR: https://aws.amazon.com/blogs/aws/new-apache-spark-on-amazon-emr/

Make sure you use the correct version of Spark with Mahout. Mahout 0.10.0 supports Spark 1.1.1 or earlier, Mahout 0.10.1 supports Spark 1.2.1 or earlier, and the current master snapshot supports Spark 1.3 and runs on Spark 1.4.
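
To make the version pairing concrete, here is a minimal Python sketch of a pre-flight check you could run before launching a job. The table only encodes the two release limits stated in this message; the function and variable names are illustrative assumptions and are not part of Mahout or Spark. The master snapshot is left out because "supports 1.3 and runs on 1.4" is not a simple upper bound.

```python
# Highest Spark version stated as supported for each Mahout release
# (taken from the message above; anything else is unknown to this table).
MAX_SPARK = {
    "0.10.0": (1, 1, 1),
    "0.10.1": (1, 2, 1),
}


def parse_version(text):
    """Turn a dotted version string such as '1.2.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def spark_ok(mahout_version, spark_version):
    """Return True if spark_version is at or below the stated limit."""
    limit = MAX_SPARK.get(mahout_version)
    if limit is None:
        raise KeyError(f"no compatibility data for Mahout {mahout_version}")
    # Tuples compare element by element, so (1, 2, 0) <= (1, 2, 1) is True.
    return parse_version(spark_version) <= limit


if __name__ == "__main__":
    for mahout, spark in [("0.10.0", "1.1.1"), ("0.10.1", "1.3.0")]:
        status = "ok" if spark_ok(mahout, spark) else "too new"
        print(f"Mahout {mahout} with Spark {spark}: {status}")
```

Raising on an unknown Mahout version, rather than guessing, keeps the check from silently approving a combination nobody has confirmed.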


Re: Mahout on the cloud

Posted by Ankit Goel <an...@gmail.com>.

Thanks for the heads up, Dmitriy. That's exactly the kind of warning I was
looking for. I don't have any experience implementing MR yet (I understand
the algorithm perfectly), so this is a great heads up. Any advice or warnings
on Hadoop installations and versions?


Re: Mahout on the cloud

Posted by Dmitriy Lyubimov <dl...@gmail.com>.

The MapReduce code is entering de facto end-of-life. It's not that we
specifically don't want to support it; de facto, nobody bothers to support
it. Risks are especially high with new versions of Hadoop and EMR.

That said, we'd be grateful for any guide about doing this on EMR.
