Posted to dev@spark.apache.org by "Ganelin, Ilya" <Il...@capitalone.com> on 2014/12/12 22:41:51 UTC

Newest ML-Lib on Spark 1.1

Hi all – we’re running CDH 5.2 and would like to have the latest MLlib version on our cluster (with YARN). Could anyone help me figure out which build profiles to use to get this to play well? Will I be able to update MLlib independently of updating the rest of Spark to 1.2 and beyond? I ran into numerous issues trying to build 1.2 against CDH’s Hadoop deployment. Alternatively, if anyone has managed to get trunk built and tested against Cloudera’s YARN and Hadoop for 5.2, I would love some help. Thanks!
________________________________________________________

The information contained in this e-mail is confidential and/or proprietary to Capital One and/or its affiliates. The information transmitted herewith is intended only for use by the individual or entity to which it is addressed.  If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer.

Re: Newest ML-Lib on Spark 1.1

Posted by Debasish Das <de...@gmail.com>.
For CDH this works well for me (tested up to 5.1):

./make-distribution.sh -Dhadoop.version=2.3.0-cdh5.1.0 -Phadoop-2.3 -Pyarn -Phive -DskipTests

To build with hive thriftserver support for spark-sql

Re: Newest ML-Lib on Spark 1.1

Posted by Debasish Das <de...@gmail.com>.
The protobuf errors come from a missing -Phadoop-2.3 profile.

Re: Newest ML-Lib on Spark 1.1

Posted by Sean Owen <so...@cloudera.com>.
What errors do you see? Protobuf errors usually mean you didn't build
for the right version of Hadoop, but if you are using -Phadoop-2.3 or,
better, -Phadoop-2.4, that should be fine. Yes, a stack trace would be
good. I'm still not sure what error you are seeing.
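
One way to check for this kind of mismatch is to compare the Hadoop version reported on a cluster node with the -Dhadoop.version value the build used. The sketch below is illustrative only: the function name hadoop_version_from and the sample line are assumptions, not taken from this thread, and it relies on `hadoop version` printing "Hadoop <version>" as its first line.

```shell
# Extract the version from the first line of `hadoop version` output,
# which is expected to look like "Hadoop <version>".
# hadoop_version_from is a hypothetical helper name.
hadoop_version_from() {
  printf '%s\n' "$1" | awk 'NR == 1 && $1 == "Hadoop" { print $2 }'
}

# Illustrative sample line, not captured from a real cluster.
sample='Hadoop 2.5.0-cdh5.2.1'

# The value that was passed to the build as -Dhadoop.version.
built_for='2.5.0-cdh5.2.1'

cluster="$(hadoop_version_from "$sample")"
if [ "$cluster" = "$built_for" ]; then
  echo "match: $cluster"
else
  echo "mismatch: cluster=$cluster build=$built_for"
fi
```

On a cluster node, the real output can be passed instead of the sample: hadoop_version_from "$(hadoop version)".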

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


RE: Newest ML-Lib on Spark 1.1

Posted by "Ganelin, Ilya" <Il...@capitalone.com>.
Hi Sean - I should clarify: I was able to build master, but when running it I hit really random-looking protobuf errors (just starting up a Spark shell). I can try doing a build later today and give the exact stack trace.

I know that 5.2 is running 1.1, but I believe the latest MLlib is much fresher than the one in 1.1 and specifically includes fixes for ALS to help it scale better.

I had built with the exact flags you suggested below. After doing so I tried to run the test suite and run a Spark shell, without success. Might you have any other suggestions? Thanks!

Re: Newest ML-Lib on Spark 1.1

Posted by Sean Owen <so...@cloudera.com>.
Could you specify what problems you're seeing? There is nothing
special about the CDH distribution at all.

The latest and greatest is 1.1, and that is what is in CDH 5.2. You
can certainly compile even master for CDH and get it to work though.

The safest build flags should be "-Phadoop-2.4 -Dhadoop.version=2.5.0-cdh5.2.1".

5.3 is just around the corner, and includes 1.2, which is also just
around the corner.
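
As a rough sketch, those flags could be combined into a full Maven invocation like the one printed below. The -Pyarn and -DskipTests options are borrowed from the make-distribution command elsewhere in this thread, the spark_build_cmd helper name is hypothetical, and the function only prints the command so it can be reviewed before being run from the root of a Spark source checkout.

```shell
# Assemble (but do not run) a Maven command for building Spark against a
# given Hadoop profile and version. spark_build_cmd is a hypothetical helper.
spark_build_cmd() {
  hadoop_profile="$1"   # e.g. hadoop-2.4
  hadoop_version="$2"   # e.g. 2.5.0-cdh5.2.1
  echo "mvn -Pyarn -P${hadoop_profile} -Dhadoop.version=${hadoop_version} -DskipTests clean package"
}

# Print the command for the CDH 5.2.1 flags suggested above.
spark_build_cmd hadoop-2.4 2.5.0-cdh5.2.1
```

Printing the command rather than executing it keeps the sketch safe to run anywhere; whether the resulting build works on a given cluster still has to be verified there.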
