Posted to user@spark.apache.org by Ryan <ry...@gmail.com> on 2017/06/08 03:17:35 UTC

Re: Question about mllib.recommendation.ALS

1. Could you share the job, stage, and task status from the Spark UI? I have
found it extremely useful for performance tuning.

2. Use model.transform for predictions. Usually we have a pipeline for
preparing the training data; running the data you want predictions for
through the same pipeline gives you the prediction column.
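Here is a minimal sketch of point 2. It assumes Spark 2.x and the DataFrame-based org.apache.spark.ml.recommendation.ALS. The toy ratings, the column names user/item/rating, and the object name are hypothetical. With a real pipeline you would run the prediction data through the same preprocessing stages before calling transform.

```scala
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.sql.{DataFrame, SparkSession}

object AlsTransformSketch {
  // Fits an ML-side ALS model, then scores (user, item) pairs with transform.
  def fitAndPredict(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Hypothetical toy ratings; column names are assumptions.
    val training = Seq(
      (0, 0, 4.0f), (0, 1, 2.0f), (1, 0, 5.0f),
      (1, 2, 1.0f), (2, 1, 3.0f), (2, 2, 4.0f)
    ).toDF("user", "item", "rating")

    // Pairs we want predictions for; every id here also appears in training.
    val toScore = Seq((0, 2), (1, 1), (2, 0)).toDF("user", "item")

    val model = new ALS()
      .setRank(2) // small rank for toy data; the thread used rank 20
      .setMaxIter(5)
      .setRegParam(0.1)
      .setUserCol("user")
      .setItemCol("item")
      .setRatingCol("rating")
      .fit(training)

    // transform appends a "prediction" column (the default predictionCol).
    model.transform(toScore)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("als-transform-sketch")
      .master("local[*]")
      .getOrCreate()
    fitAndPredict(spark).show()
    spark.stop()
  }
}
```

The pairs being scored carry no rating column. transform only needs the user and item columns.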

On Thu, Jun 1, 2017 at 7:48 AM, Sahib Aulakh [Search] <
sahibaulakh@coupang.com> wrote:

> Hello:
>
> I am training the ALS model for recommendations. I have about 200M ratings
> from about 10M users and 3M products. I have a small cluster with 48 cores
> and 120 GB of memory cluster-wide.
>
> My code is very similar to the example code in
> spark/examples/src/main/scala/org/apache/spark/examples/mllib/MovieLensALS.scala.
>
> I have a couple of questions:
>
>
>    1. All steps up to model training run reasonably fast. Model training
>    takes under 10 minutes for rank 20. However, the
>    model.recommendProductsForUsers step is either slow or does not work at
>    all; the code just seems to hang at this point. I have tried numbers of
>    user and product blocks of -1, 20, 40, etc., and played with the
>    executor memory size. Can someone shed some light on what could be wrong?
>    2. Also, is there any example code for the ml.recommendation.ALS
>    algorithm? I can figure out how to train the model, but I don't
>    understand (from the documentation) how to perform predictions.
>
> Thanks for any information you can provide.
> Sahib Aulakh.
>
>
> --
> Sahib Aulakh
> Sr. Principal Engineer
>
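To get a sense of scale, here is a back-of-the-envelope count of the work behind recommending for all users, using the figures in the question (10M users, 3M products, rank 20). This is a hypothetical illustration, not a measurement. It assumes every user is scored against every product with one multiply-add per latent factor, and it ignores blocking, shuffles, and top-k bookkeeping.

```scala
object RecommendAllCost {
  // Figures quoted in the question.
  val numUsers: Long = 10000000L   // ~10M users
  val numProducts: Long = 3000000L // ~3M products
  val rank: Long = 20L             // ALS rank used for training

  // Recommend-all scores every user against every product.
  val userProductPairs: Long = numUsers * numProducts

  // Each score is a dot product of two rank-length factor vectors.
  val multiplyAdds: Long = userProductPairs * rank

  def main(args: Array[String]): Unit = {
    println(f"user-product pairs: $userProductPairs%,d")
    println(f"multiply-adds:      $multiplyAdds%,d")
  }
}
```

That is 3 x 10^13 pairs and about 6 x 10^14 multiply-adds, which is a lot of work for 48 cores.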

Re: Question about mllib.recommendation.ALS

Posted by Sahib Aulakh [Search] <sa...@coupang.com>.
Many thanks. Will try it.
On Thu, Jun 8, 2017 at 8:41 AM Nick Pentreath <ni...@gmail.com>
wrote:

> Spark 2.2 will support the recommend-all methods in ML.
>
> Also, performance of the recommend-all methods has been greatly improved
> in both ML and MLlib.
>
> Perhaps you could check out the current RC of Spark 2.2 or the master
> branch to try it out?
>
> N

Re: Question about mllib.recommendation.ALS

Posted by Nick Pentreath <ni...@gmail.com>.
Spark 2.2 will support the recommend-all methods in ML.

Also, performance of the recommend-all methods has been greatly improved in
both ML and MLlib.

Perhaps you could check out the current RC of Spark 2.2 or the master branch
to try it out?

N
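In Spark 2.2, the ML-side entry points are ALSModel.recommendForAllUsers(numItems) and ALSModel.recommendForAllItems(numUsers). Both return DataFrames of top-N recommendations. The sketch below illustrates what a recommend-all call computes, using plain Scala on a single machine: it scores each user against every item and keeps the k best with a bounded min-heap. This is not Spark's implementation, which distributes the work across the cluster. The object and function names are hypothetical.

```scala
import scala.collection.mutable

object RecommendAllSketch {
  type Factors = Map[Int, Array[Double]]

  private def dot(a: Array[Double], b: Array[Double]): Double = {
    var s = 0.0
    var i = 0
    while (i < a.length) { s += a(i) * b(i); i += 1 }
    s
  }

  // For every user, score all items and keep the k highest-scoring ones.
  def recommendForAllUsers(users: Factors, items: Factors, k: Int): Map[Int, Seq[(Int, Double)]] =
    users.map { case (userId, u) =>
      // "Greater" means lower score, so the queue's head is the weakest of the current top k.
      val byLowestScore = Ordering.fromLessThan[(Int, Double)]((a, b) => a._2 > b._2)
      val heap = mutable.PriorityQueue.empty[(Int, Double)](byLowestScore)
      items.foreach { case (itemId, v) =>
        heap.enqueue((itemId, dot(u, v)))
        if (heap.size > k) heap.dequeue() // drop the lowest score
      }
      userId -> heap.toSeq.sortWith(_._2 > _._2)
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical toy factors (rank 2).
    val users = Map(1 -> Array(1.0, 0.0), 2 -> Array(0.0, 1.0))
    val items = Map(10 -> Array(3.0, 0.0), 11 -> Array(2.0, 5.0), 12 -> Array(1.0, 1.0))
    recommendForAllUsers(users, items, k = 2).foreach { case (u, recs) =>
      println(s"user $u -> $recs")
    }
  }
}
```

Keeping only k candidates bounds memory per user. However, the number of dot products is still users x items, which is why recommend-all gets expensive at the scale described in this thread.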

On Thu, 8 Jun 2017 at 17:18, Sahib Aulakh [Search] <
sahibaulakh@coupang.com> wrote:

> Many thanks for your response. I already figured out the details with some
> help from another forum.
>
>
>    1. I was trying to predict ratings for all users and all products.
>    This is inefficient, so now I am trying to reduce the number of
>    required predictions.
>    2. There is a nice example buried in the Spark source code that shows
>    how to use the ML-side ALS.
>
> Regards.
> Sahib Aulakh.

Re: Question about mllib.recommendation.ALS

Posted by Sahib Aulakh [Search] <sa...@coupang.com>.
Many thanks for your response. I already figured out the details with some
help from another forum.


   1. I was trying to predict ratings for all users and all products. This
   is inefficient, so now I am trying to reduce the number of required
   predictions.
   2. There is a nice example buried in the Spark source code that shows
   how to use the ML-side ALS.
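Reducing the number of predictions means scoring only the (user, product) pairs you actually need, rather than the full cross product. In MLlib, MatrixFactorizationModel.predict accepts an RDD[(Int, Int)] of such pairs. On the ML side, ALSModel.transform scores whatever (user, item) rows are in its input DataFrame. The plain-Scala sketch below shows the idea on in-memory factor maps. The names are hypothetical, and how the candidate pairs are chosen is application-specific and not shown.

```scala
object CandidateScoringSketch {
  type Factors = Map[Int, Array[Double]]

  private def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // Scores only the requested (user, item) pairs instead of the full cross product.
  // Pairs whose user or item has no factors (cold start) are skipped.
  def predictPairs(users: Factors, items: Factors, pairs: Seq[(Int, Int)]): Seq[((Int, Int), Double)] =
    pairs.flatMap { case (u, i) =>
      for {
        uf <- users.get(u)
        vf <- items.get(i)
      } yield ((u, i), dot(uf, vf))
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical toy factors (rank 2).
    val users = Map(1 -> Array(1.0, 2.0))
    val items = Map(10 -> Array(3.0, 4.0), 11 -> Array(0.5, 0.5))
    println(predictPairs(users, items, Seq((1, 10), (1, 11), (2, 10))))
  }
}
```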

Regards.
Sahib Aulakh.

On Wed, Jun 7, 2017 at 8:17 PM, Ryan <ry...@gmail.com> wrote:

> 1. Could you share the job, stage, and task status from the Spark UI? I
> have found it extremely useful for performance tuning.
>
> 2. Use model.transform for predictions. Usually we have a pipeline for
> preparing the training data; running the data you want predictions for
> through the same pipeline gives you the prediction column.