Posted to user@spark.apache.org by Roberto Pagliari <ro...@asos.com> on 2016/02/09 00:48:03 UTC

ALS rating caching

When using ALS from MLlib, is it better (or recommended) to cache the ratings RDD?

I'm asking because, when predicting products for users (for example), it is recommended to cache the product and user feature matrices.

Thank you,

Re: ALS rating caching

Posted by Roberto Pagliari <ro...@asos.com>.

Hi Nick,
From which Spark version does this apply? I'm using 1.5.2.

Thank you,

From: Nick Pentreath <ni...@gmail.com>
Date: Tuesday, 9 February 2016 07:02
To: "user@spark.apache.org" <us...@spark.apache.org>
Subject: Re: ALS rating caching

In the "new" ALS, intermediate RDDs (including the ratings input RDD after it is transformed into block-partitioned ratings) are cached using intermediateRDDStorageLevel, and you can select the storage level of the final RDDs (the user and item factors) using finalRDDStorageLevel.

The old MLlib API now calls the new ALS, so the same semantics apply.

So it should not be necessary to cache the raw input RDD.
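Here is a minimal sketch of the answer above. It assumes the RDD-based `org.apache.spark.mllib.recommendation.ALS` builder in Scala, which is the old API that, per the reply, delegates to the new ALS. Instead of calling `cache()` on the raw ratings, the sketch passes the two storage levels through `setIntermediateRDDStorageLevel` and `setFinalRDDStorageLevel`. Several details are illustrative assumptions: the object name `AlsStorageLevels`, the helper `trainWithStorageLevels`, the toy ratings, the `local[2]` master, and the rank, iteration and lambda values. Check the setter names against the API docs for your Spark version.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object AlsStorageLevels {
  // Hypothetical helper: trains ALS with explicit storage levels instead of
  // caching the raw ratings RDD in user code.
  def trainWithStorageLevels(ratings: RDD[Rating]): MatrixFactorizationModel =
    new ALS()
      .setRank(2)
      .setIterations(5)
      .setLambda(0.01)
      // Level used for ALS's own block-partitioned copy of the ratings and
      // the other intermediate RDDs created during training.
      .setIntermediateRDDStorageLevel(StorageLevel.MEMORY_AND_DISK)
      // Level used to persist the returned user and product factor RDDs.
      .setFinalRDDStorageLevel(StorageLevel.MEMORY_ONLY)
      .run(ratings)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("als-storage-levels").setMaster("local[2]"))
    try {
      // Toy explicit ratings (user, product, rating); note: no ratings.cache().
      val ratings = sc.parallelize(Seq(
        Rating(1, 10, 5.0), Rating(1, 20, 1.0),
        Rating(2, 10, 4.0), Rating(2, 30, 2.0),
        Rating(3, 20, 3.0), Rating(3, 30, 5.0)))
      val model = trainWithStorageLevels(ratings)
      println(s"userFeatures storage level: ${model.userFeatures.getStorageLevel}")
      println(s"productFeatures storage level: ${model.productFeatures.getStorageLevel}")
      println(s"predicted rating for user 1, product 30: ${model.predict(1, 30)}")
    } finally {
      sc.stop()
    }
  }
}
```

`MEMORY_ONLY` for the final level is only an example. The point is that a persisted `userFeatures`/`productFeatures` pair should let later calls such as `predict` or `recommendProducts` reuse the trained factors rather than recompute them from the training lineage.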

