Posted to user@mahout.apache.org by gj <ga...@gmail.com> on 2011/05/22 00:08:32 UTC

slow for RMSE

Hi,
I'm new to Mahout. I'm using mahout-0.3 with Eclipse and JDK 1.6.0_18 (no Hadoop).
I'm trying to find the RMSE for a dataset, but it seems very slow; so far I
have not been able to get an RMSE value for a single run. I was wondering if
anybody could look at my setup and tell me what I am doing wrong, or why it
is so slow.

Here's my code:
public static void main(String[] args) {
  RecommenderBuilder builder = new RecommenderBuilder() {
    public Recommender buildRecommender(DataModel model) throws TasteException {
      UserSimilarity userSimilarity = new PearsonCorrelationSimilarity(model);
      UserNeighborhood neighborhood =
          new NearestNUserNeighborhood(5, userSimilarity, model);
      Recommender recommender =
          new GenericUserBasedRecommender(model, neighborhood, userSimilarity);
      return new CachingRecommender(recommender);
    }
  };

  RecommenderEvaluator evaluator = new RMSRecommenderEvaluator();
  try {
    DataModel model = new FileDataModel(new File("lf_playhistory_step1_ratings.dat"));
    // evaluate(builder, null, model, trainingPercentage, evaluationPercentage)
    double score = evaluator.evaluate(builder, null, model, 0.9, 1.0);
    System.out.println(score);
  } catch (Exception e) {
    System.err.println("Evaluation failed: " + e.getMessage());
  }
}
} // closing brace of the enclosing class (declaration omitted above)

Dataset: 5,462,701 <userid,track,rating> tuples
number of tracks = 610,192
number of users = 2,330
ratings = 1 to 5

This is the output I got on the console:

21-May-2011 22:26:51 org.slf4j.impl.JCLLoggerAdapter info
INFO: Creating FileDataModel for file lf_playhistory_step1_ratings.dat
21-May-2011 22:26:51 org.slf4j.impl.JCLLoggerAdapter info
INFO: Beginning evaluation using 0.9 of
FileDataModel[dataFile:C:\eclipse_workspace\LastFM\lf_playhistory_step1_ratings.dat]
21-May-2011 22:26:51 org.slf4j.impl.JCLLoggerAdapter info
INFO: Reading file info...
21-May-2011 22:28:19 org.slf4j.impl.JCLLoggerAdapter info
INFO: Processed 1000000 lines
21-May-2011 22:29:53 org.slf4j.impl.JCLLoggerAdapter info
INFO: Processed 2000000 lines
21-May-2011 22:32:09 org.slf4j.impl.JCLLoggerAdapter info
INFO: Processed 3000000 lines
21-May-2011 22:34:03 org.slf4j.impl.JCLLoggerAdapter info
INFO: Processed 4000000 lines
21-May-2011 22:36:19 org.slf4j.impl.JCLLoggerAdapter info
INFO: Processed 5000000 lines
21-May-2011 22:37:08 org.slf4j.impl.JCLLoggerAdapter info
INFO: Read lines: 5462701
21-May-2011 22:37:08 org.slf4j.impl.JCLLoggerAdapter info
INFO: Reading file info...
21-May-2011 22:37:16 org.slf4j.impl.JCLLoggerAdapter info
INFO: Read lines: 100000
21-May-2011 22:37:21 org.slf4j.impl.JCLLoggerAdapter info
INFO: Processed 2330 users
21-May-2011 22:37:28 org.slf4j.impl.JCLLoggerAdapter info
INFO: Processed 2330 users
21-May-2011 22:37:29 org.slf4j.impl.JCLLoggerAdapter info
INFO: Beginning evaluation of 2323 users
21-May-2011 22:37:29 org.slf4j.impl.JCLLoggerAdapter info
INFO: Starting timing of 2323 tasks in 2 threads
21-May-2011 22:40:28 org.slf4j.impl.JCLLoggerAdapter info
INFO: Average time per recommendation: 178468ms
21-May-2011 22:40:28 org.slf4j.impl.JCLLoggerAdapter info
INFO: Approximate memory used: 585MB / 840MB

From there on, I just waited for two hours and got no output.
The "Average time per recommendation: 178468ms" figure seems very high; I'm
guessing that's about 178 s x 2,330 users = 4.8 days!
This is running on my laptop (Intel Core 2 Duo T7500 @ 2.2 GHz, 2 GB RAM).

Why is this taking so long? Is it too big a dataset? Is my laptop too slow?

Can anybody help?

Thanks,
Gawesh

Re: slow for RMSE

Posted by Lance Norskog <go...@gmail.com>.
Yes. This time increases non-linearly. Also, check the memory settings
of your Java VM; you might be spending all your time in GC.
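
For example, something like these VM arguments should give the JVM more heap
and show GC activity (in Eclipse: Run Configurations > Arguments > VM
arguments; the heap size here is just a suggestion, adjust it to what your
2 GB machine can spare):

  -Xmx1536m -verbose:gc -XX:+PrintGCDetails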


-- 
Lance Norskog
goksron@gmail.com

Re: slow for RMSE

Posted by Sean Owen <sr...@gmail.com>.
Wrap your UserSimilarity in a CachingUserSimilarity; I think you're spending
a lot of time re-computing the same similarities.
You don't need a CachingRecommender.

You can use a subset of the data for testing by turning down that "1.0"
parameter to something like 0.1.
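
Something along these lines (untested sketch; only the builder and the last
evaluate() argument change, the rest of your program stays the same):

RecommenderBuilder builder = new RecommenderBuilder() {
  public Recommender buildRecommender(DataModel model) throws TasteException {
    // Cache pairwise similarities so they are not recomputed for every estimate
    UserSimilarity similarity =
        new CachingUserSimilarity(new PearsonCorrelationSimilarity(model), model);
    UserNeighborhood neighborhood = new NearestNUserNeighborhood(5, similarity, model);
    // No CachingRecommender needed here
    return new GenericUserBasedRecommender(model, neighborhood, similarity);
  }
};

// 0.9 = training fraction per user, 0.1 = evaluate only ~10% of the users
double score = evaluator.evaluate(builder, null, model, 0.9, 0.1);

(CachingUserSimilarity lives in org.apache.mahout.cf.taste.impl.similarity.)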
