Posted to dev@mahout.apache.org by "Han Hui Wen (JIRA)" <ji...@apache.org> on 2010/08/13 07:09:18 UTC

[jira] Created: (MAHOUT-473) add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob

add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob
------------------------------------------------------------------------------------

                 Key: MAHOUT-473
                 URL: https://issues.apache.org/jira/browse/MAHOUT-473
             Project: Mahout
          Issue Type: Improvement
          Components: Collaborative Filtering
    Affects Versions: 0.4
            Reporter: Han Hui Wen 
             Fix For: 0.4


In RecommenderJob

{code:title=Bar.java|borderStyle=solid}
    int numberOfUsers = TasteHadoopUtils.readIntFromFile(getConf(), countUsersPath);

    if (shouldRunNextPhase(parsedArgs, currentPhase)) {
      /* Once DistributedRowMatrix uses the hadoop 0.20 API, we should refactor this call to something like
       * new DistributedRowMatrix(...).rowSimilarity(...) */
      try {
        RowSimilarityJob.main(new String[] { "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
            "-Dmapred.output.dir=" + similarityMatrixPath.toString(), "--numberOfColumns",
            String.valueOf(numberOfUsers), "--similarityClassname", similarityClassname, "--maxSimilaritiesPerRow",
            String.valueOf(maxSimilaritiesPerItemConsidered + 1), "--tempDir", tempDirPath.toString() });
      } catch (Exception e) {
        throw new IllegalStateException("item-item-similarity computation failed", e);
      }
    }
{code}

We do not pass the parameter -Dmapred.reduce.tasks when calling RowSimilarityJob.
As a result, all three RowSimilarityJob sub-jobs run with 1 map and 1 reduce task, so the sub-jobs cannot scale.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-473) add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob

Posted by "Sebastian Schelter (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898084#action_12898084 ] 

Sebastian Schelter commented on MAHOUT-473:
-------------------------------------------

Hui,

can you supply a .patch file?

This page explains in detail how to do this: https://cwiki.apache.org/confluence/display/MAHOUT/How+To+Contribute

> add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob
> ------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-473
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-473
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Han Hui Wen 
>             Fix For: 0.4
>
>
> In RecommenderJob
> {code:title=RecommenderJob.java|borderStyle=solid}
>     int numberOfUsers = TasteHadoopUtils.readIntFromFile(getConf(), countUsersPath);
>     if (shouldRunNextPhase(parsedArgs, currentPhase)) {
>       /* Once DistributedRowMatrix uses the hadoop 0.20 API, we should refactor this call to something like
>        * new DistributedRowMatrix(...).rowSimilarity(...) */
>       try {
>         RowSimilarityJob.main(new String[] { "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
>             "-Dmapred.output.dir=" + similarityMatrixPath.toString(), "--numberOfColumns",
>             String.valueOf(numberOfUsers), "--similarityClassname", similarityClassname, "--maxSimilaritiesPerRow",
>             String.valueOf(maxSimilaritiesPerItemConsidered + 1), "--tempDir", tempDirPath.toString() });
>       } catch (Exception e) {
>         throw new IllegalStateException("item-item-similarity computation failed", e);
>       }
>     }
> {code}
> We have not passed parameter -Dmapred.reduce.tasks when job RowSimilarityJob.
> It caused all three  RowSimilarityJob sub-jobs run using 1 map and 1 reduce, so the sub jobs can not be scalable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-473) add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob

Posted by "Han Hui Wen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898950#action_12898950 ] 

Han Hui Wen  commented on MAHOUT-473:
-------------------------------------

The patch is as follows.

In RecommenderJob:

{code}
+    Job job = new Job(new Configuration(getConf()));
+    int numReduceTasks = job.getNumReduceTasks();

       try {
         RowSimilarityJob.main(new String[] { "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
             "-Dmapred.output.dir=" + similarityMatrixPath.toString(),
+            "-Dmapred.reduce.tasks=" + numReduceTasks,
             "--numberOfColumns", String.valueOf(numberOfUsers),
             "--similarityClassname", similarityClassname,
             "--maxSimilaritiesPerRow", String.valueOf(maxSimilaritiesPerItemConsidered + 1),
             "--tempDir", tempDirPath.toString() });
       } catch (Exception e) {
         throw new IllegalStateException("item-item-similarity computation failed", e);
       }
{code}


> add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob
> ------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-473
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-473
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Han Hui Wen 
>            Assignee: Sean Owen
>         Attachments: screenshot-1.jpg
>
>
> In RecommenderJob
> {code:title=RecommenderJob.java|borderStyle=solid}
>     int numberOfUsers = TasteHadoopUtils.readIntFromFile(getConf(), countUsersPath);
>     if (shouldRunNextPhase(parsedArgs, currentPhase)) {
>       /* Once DistributedRowMatrix uses the hadoop 0.20 API, we should refactor this call to something like
>        * new DistributedRowMatrix(...).rowSimilarity(...) */
>       try {
>         RowSimilarityJob.main(new String[] { "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
>             "-Dmapred.output.dir=" + similarityMatrixPath.toString(), "--numberOfColumns",
>             String.valueOf(numberOfUsers), "--similarityClassname", similarityClassname, "--maxSimilaritiesPerRow",
>             String.valueOf(maxSimilaritiesPerItemConsidered + 1), "--tempDir", tempDirPath.toString() });
>       } catch (Exception e) {
>         throw new IllegalStateException("item-item-similarity computation failed", e);
>       }
>     }
> {code}
> We have not passed parameter -Dmapred.reduce.tasks when job RowSimilarityJob.
> It caused all three  RowSimilarityJob sub-jobs run using 1 map and 1 reduce, so the sub jobs can not be scalable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-473) add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899800#action_12899800 ] 

Hudson commented on MAHOUT-473:
-------------------------------

Integrated in Mahout-Quality #200 (See [https://hudson.apache.org/hudson/job/Mahout-Quality/200/])
    MAHOUT-473 add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob


> add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob
> ------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-473
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-473
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Han Hui Wen 
>            Assignee: Sean Owen
>         Attachments: MAHOUT-473.patch, screenshot-1.jpg
>
>
> In RecommenderJob
> {code:title=RecommenderJob.java|borderStyle=solid}
>     int numberOfUsers = TasteHadoopUtils.readIntFromFile(getConf(), countUsersPath);
>     if (shouldRunNextPhase(parsedArgs, currentPhase)) {
>       /* Once DistributedRowMatrix uses the hadoop 0.20 API, we should refactor this call to something like
>        * new DistributedRowMatrix(...).rowSimilarity(...) */
>       try {
>         RowSimilarityJob.main(new String[] { "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
>             "-Dmapred.output.dir=" + similarityMatrixPath.toString(), "--numberOfColumns",
>             String.valueOf(numberOfUsers), "--similarityClassname", similarityClassname, "--maxSimilaritiesPerRow",
>             String.valueOf(maxSimilaritiesPerItemConsidered + 1), "--tempDir", tempDirPath.toString() });
>       } catch (Exception e) {
>         throw new IllegalStateException("item-item-similarity computation failed", e);
>       }
>     }
> {code}
> We have not passed parameter -Dmapred.reduce.tasks when job RowSimilarityJob.
> It caused all three  RowSimilarityJob sub-jobs run using 1 map and 1 reduce, so the sub jobs can not be scalable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-473) add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob

Posted by "Sebastian Schelter (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898093#action_12898093 ] 

Sebastian Schelter commented on MAHOUT-473:
-------------------------------------------

Hui,

Can you follow these steps for this issue:

# Modify your local copy of Mahout to include the changes you propose
# Build Mahout and run your job again with your modified version
# Upload a patch containing your changes, and give us some details on the performance increase you observed

Have a look at this page: https://cwiki.apache.org/confluence/display/MAHOUT/How+To+Contribute

> add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob
> ------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-473
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-473
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Han Hui Wen 
>             Fix For: 0.4
>
>         Attachments: screenshot-1.jpg
>
>
> In RecommenderJob
> {code:title=RecommenderJob.java|borderStyle=solid}
>     int numberOfUsers = TasteHadoopUtils.readIntFromFile(getConf(), countUsersPath);
>     if (shouldRunNextPhase(parsedArgs, currentPhase)) {
>       /* Once DistributedRowMatrix uses the hadoop 0.20 API, we should refactor this call to something like
>        * new DistributedRowMatrix(...).rowSimilarity(...) */
>       try {
>         RowSimilarityJob.main(new String[] { "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
>             "-Dmapred.output.dir=" + similarityMatrixPath.toString(), "--numberOfColumns",
>             String.valueOf(numberOfUsers), "--similarityClassname", similarityClassname, "--maxSimilaritiesPerRow",
>             String.valueOf(maxSimilaritiesPerItemConsidered + 1), "--tempDir", tempDirPath.toString() });
>       } catch (Exception e) {
>         throw new IllegalStateException("item-item-similarity computation failed", e);
>       }
>     }
> {code}
> We have not passed parameter -Dmapred.reduce.tasks when job RowSimilarityJob.
> It caused all three  RowSimilarityJob sub-jobs run using 1 map and 1 reduce, so the sub jobs can not be scalable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAHOUT-473) add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob

Posted by "Han Hui Wen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Han Hui Wen  updated MAHOUT-473:
--------------------------------

    Attachment: screenshot-1.jpg

RowSimilarityJob cannot scale.

> add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob
> ------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-473
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-473
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Han Hui Wen 
>             Fix For: 0.4
>
>         Attachments: screenshot-1.jpg
>
>
> In RecommenderJob
> {code:title=RecommenderJob.java|borderStyle=solid}
>     int numberOfUsers = TasteHadoopUtils.readIntFromFile(getConf(), countUsersPath);
>     if (shouldRunNextPhase(parsedArgs, currentPhase)) {
>       /* Once DistributedRowMatrix uses the hadoop 0.20 API, we should refactor this call to something like
>        * new DistributedRowMatrix(...).rowSimilarity(...) */
>       try {
>         RowSimilarityJob.main(new String[] { "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
>             "-Dmapred.output.dir=" + similarityMatrixPath.toString(), "--numberOfColumns",
>             String.valueOf(numberOfUsers), "--similarityClassname", similarityClassname, "--maxSimilaritiesPerRow",
>             String.valueOf(maxSimilaritiesPerItemConsidered + 1), "--tempDir", tempDirPath.toString() });
>       } catch (Exception e) {
>         throw new IllegalStateException("item-item-similarity computation failed", e);
>       }
>     }
> {code}
> We have not passed parameter -Dmapred.reduce.tasks when job RowSimilarityJob.
> It caused all three  RowSimilarityJob sub-jobs run using 1 map and 1 reduce, so the sub jobs can not be scalable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-473) add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob

Posted by "Han Hui Wen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898939#action_12898939 ] 

Han Hui Wen  commented on MAHOUT-473:
-------------------------------------

1) First, mapred.tasktracker.map.tasks.maximum in mapred-site.xml is different from -Dmapred.reduce.tasks on the command line (or mapred.reduce.tasks in mapred-site.xml).

The maximum number of map tasks that will be run on a tasktracker at one time is
controlled by the mapred.tasktracker.map.tasks.maximum property, which defaults to
two tasks. There is a corresponding property for reduce tasks,
mapred.tasktracker.reduce.tasks.maximum, which also defaults to two tasks.
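
To make the distinction concrete, here is a minimal sketch (not from the Mahout source; the property names are standard Hadoop keys, and the defaults are the ones quoted above) that reads both kinds of properties back from a Configuration:

{code}
import org.apache.hadoop.conf.Configuration;

public class TaskSettings {
  public static void main(String[] args) {
    Configuration conf = new Configuration(); // loads *-site.xml from the classpath
    // Per-node cap: how many reduce tasks one tasktracker may run concurrently.
    int slotsPerTracker = conf.getInt("mapred.tasktracker.reduce.tasks.maximum", 2);
    // Per-job count: how many reduce tasks the job is split into overall.
    int reducersPerJob = conf.getInt("mapred.reduce.tasks", 1);
    System.out.println(slotsPerTracker + " slots/tracker, " + reducersPerJob + " reducers/job");
  }
}
{code}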

mapred.reduce.tasks is configured in mapred-site.xml; the default value is 1.
Options specified with -D take priority over properties from the configuration
files. This is very useful: you can put defaults into configuration files and then
override them with the -D option as needed. A common example of this is setting the
number of reducers for a MapReduce job via -D mapred.reduce.tasks=n. This overrides
the number of reducers set on the cluster, or in any client-side configuration files.
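
As a sketch of that override path (GenericOptionsParser is the standard Hadoop class that ToolRunner uses to process -D options; the --someJobArg flag is made up for illustration):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.GenericOptionsParser;

public class DOptionDemo {
  public static void main(String[] rawArgs) throws Exception {
    Configuration conf = new Configuration(); // starts from the mapred-site.xml defaults
    String[] args = { "-Dmapred.reduce.tasks=6", "--someJobArg", "x" };
    String[] rest = new GenericOptionsParser(conf, args).getRemainingArgs();
    // conf.get("mapred.reduce.tasks") is now "6", regardless of the file-based value;
    // rest holds only { "--someJobArg", "x" } for the job's own option parser.
    System.out.println(conf.get("mapred.reduce.tasks") + " / " + rest.length);
  }
}
{code}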


Job Initialization
When the JobTracker receives a call to its submitJob() method, it puts the job into
an internal queue from which the job scheduler will pick it up and initialize it.
Initialization involves creating an object to represent the job being run, which
encapsulates its tasks, plus bookkeeping information to keep track of the tasks'
status and progress.
To create the list of tasks to run, the job scheduler first retrieves the input
splits computed by the JobClient from the shared filesystem. It then creates one map
task for each split. The number of reduce tasks to create is determined by the
mapred.reduce.tasks property in the JobConf, which is set by the setNumReduceTasks()
method, and the scheduler simply creates this number of reduce tasks to be run.
Tasks are given IDs at this point.
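
The client-side equivalent of -Dmapred.reduce.tasks is that setNumReduceTasks() call; a minimal old-API (JobConf) sketch, with caller-supplied input/output paths and the default identity mapper and reducer:

{code}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class ReducerCountDemo {
  public static void main(String[] args) throws Exception {
    JobConf jobConf = new JobConf(ReducerCountDemo.class);
    FileInputFormat.setInputPaths(jobConf, new Path(args[0]));  // input dir, caller-supplied
    FileOutputFormat.setOutputPath(jobConf, new Path(args[1])); // output dir, caller-supplied
    jobConf.setNumReduceTasks(6); // writes mapred.reduce.tasks=6 into the job's conf
    JobClient.runJob(jobConf);    // the scheduler will create exactly 6 reduce tasks
  }
}
{code}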

mapred.reduce.tasks can be configured in mapred-site.xml, but a value set there applies to the reducers of every job.
We run all kinds of jobs, with different data sizes and different priorities, and we cannot keep changing mapred-site.xml, so we need the -Dmapred.reduce.tasks command-line parameter to override the configuration file.

2) A parameter like "-Dmapred.reduce.tasks" passed to RecommenderJob cannot be passed on to RowSimilarityJob,
because RowSimilarityJob is started as another new AbstractJob. RowSimilarityJob runs in a new context, so parameters passed to RecommenderJob are not propagated to it.

If we pass -Dmapred.reduce.tasks=6 to RecommenderJob, all of its own sub-jobs (itemIDIndex, toUserVector, countUsers, itemUserMatrix, maybePruneItemUserMatrix, prePartialMultiply1, prePartialMultiply2, partialMultiply, aggregateAndRecommend) work correctly, but RowSimilarityJob runs in a new context, and the -Dmapred.reduce.tasks=6 parameter is lost when RowSimilarityJob is called.
So all three sub-jobs of RowSimilarityJob (weights, pairwiseSimilarity, asMatrix) can only run with one reducer.
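
Sketched, the mechanics of the loss look like this (this mirrors the usual Mahout AbstractJob entry-point pattern; it is an illustration, not the verbatim RowSimilarityJob source):

{code}
// Illustrative sketch of a typical AbstractJob entry point -- not the
// actual RowSimilarityJob source:
public static void main(String[] args) throws Exception {
  // ToolRunner builds the child job on a brand-new Configuration; nothing that
  // RecommenderJob set on its own Configuration carries over. Only the -D
  // options literally present in 'args' get parsed into the fresh Configuration.
  ToolRunner.run(new Configuration(), new RowSimilarityJob(), args);
}
{code}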




> add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob
> ------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-473
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-473
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Han Hui Wen 
>            Assignee: Sean Owen
>         Attachments: screenshot-1.jpg
>
>
> In RecommenderJob
> {code:title=RecommenderJob.java|borderStyle=solid}
>     int numberOfUsers = TasteHadoopUtils.readIntFromFile(getConf(), countUsersPath);
>     if (shouldRunNextPhase(parsedArgs, currentPhase)) {
>       /* Once DistributedRowMatrix uses the hadoop 0.20 API, we should refactor this call to something like
>        * new DistributedRowMatrix(...).rowSimilarity(...) */
>       try {
>         RowSimilarityJob.main(new String[] { "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
>             "-Dmapred.output.dir=" + similarityMatrixPath.toString(), "--numberOfColumns",
>             String.valueOf(numberOfUsers), "--similarityClassname", similarityClassname, "--maxSimilaritiesPerRow",
>             String.valueOf(maxSimilaritiesPerItemConsidered + 1), "--tempDir", tempDirPath.toString() });
>       } catch (Exception e) {
>         throw new IllegalStateException("item-item-similarity computation failed", e);
>       }
>     }
> {code}
> We have not passed parameter -Dmapred.reduce.tasks when job RowSimilarityJob.
> It caused all three  RowSimilarityJob sub-jobs run using 1 map and 1 reduce, so the sub jobs can not be scalable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (MAHOUT-473) add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob

Posted by "Han Hui Wen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898950#action_12898950 ] 

Han Hui Wen  edited comment on MAHOUT-473 at 8/16/10 10:40 AM:
---------------------------------------------------------------

The patch is as follows.

In RecommenderJob:

{code}
+    Job job = new Job(new Configuration(getConf()));
+    int numReduceTasks = job.getNumReduceTasks();

       try {
         RowSimilarityJob.main(new String[] { "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
             "-Dmapred.output.dir=" + similarityMatrixPath.toString(),
+            "-Dmapred.reduce.tasks=" + numReduceTasks,
             "--numberOfColumns", String.valueOf(numberOfUsers),
             "--similarityClassname", similarityClassname,
             "--maxSimilaritiesPerRow", String.valueOf(maxSimilaritiesPerItemConsidered + 1),
             "--tempDir", tempDirPath.toString() });
       } catch (Exception e) {
         throw new IllegalStateException("item-item-similarity computation failed", e);
       }
{code}


      was (Author: huiwenhan):
    The Patch as folowing

In RecommenderJob

+    Job job = new Job(new Configuration(getConf()));
+    int numReduceTasks= job.getNumReduceTasks();


       try {
         RowSimilarityJob.main(new String[] { "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
            "-Dmapred.output.dir=" + similarityMatrixPath.toString(), "--numberOfColumns",
            String.valueOf(numberOfUsers), "--similarityClassname", similarityClassname, "--maxSimilaritiesPerRow",
            String.valueOf(maxSimilaritiesPerItemConsidered + 1), "--tempDir", tempDirPath.toString() });
            "-Dmapred.output.dir=" + similarityMatrixPath.toString(), 
+            "-Dmapred.reduce.tasks=" + numReduceTasks,
           "--numberOfColumns",String.valueOf(numberOfUsers), 
            "--similarityClassname", similarityClassname, 
            "--maxSimilaritiesPerRow",String.valueOf(maxSimilaritiesPerItemConsidered + 1), 
            "--tempDir", tempDirPath.toString() });
       } catch (Exception e) {
         throw new IllegalStateException("item-item-similarity computation failed", e);
       }

  
> add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob
> ------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-473
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-473
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Han Hui Wen 
>            Assignee: Sean Owen
>         Attachments: screenshot-1.jpg
>
>
> In RecommenderJob
> {code:title=RecommenderJob.java|borderStyle=solid}
>     int numberOfUsers = TasteHadoopUtils.readIntFromFile(getConf(), countUsersPath);
>     if (shouldRunNextPhase(parsedArgs, currentPhase)) {
>       /* Once DistributedRowMatrix uses the hadoop 0.20 API, we should refactor this call to something like
>        * new DistributedRowMatrix(...).rowSimilarity(...) */
>       try {
>         RowSimilarityJob.main(new String[] { "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
>             "-Dmapred.output.dir=" + similarityMatrixPath.toString(), "--numberOfColumns",
>             String.valueOf(numberOfUsers), "--similarityClassname", similarityClassname, "--maxSimilaritiesPerRow",
>             String.valueOf(maxSimilaritiesPerItemConsidered + 1), "--tempDir", tempDirPath.toString() });
>       } catch (Exception e) {
>         throw new IllegalStateException("item-item-similarity computation failed", e);
>       }
>     }
> {code}
> We have not passed parameter -Dmapred.reduce.tasks when job RowSimilarityJob.
> It caused all three  RowSimilarityJob sub-jobs run using 1 map and 1 reduce, so the sub jobs can not be scalable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (MAHOUT-473) add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved MAHOUT-473.
------------------------------

         Assignee: Sean Owen
    Fix Version/s:     (was: 0.4)
       Resolution: Not A Problem

It is up to the Hadoop cluster, and the caller, to set values like this.

> add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob
> ------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-473
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-473
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Han Hui Wen 
>            Assignee: Sean Owen
>         Attachments: screenshot-1.jpg
>
>
> In RecommenderJob
> {code:title=RecommenderJob.java|borderStyle=solid}
>     int numberOfUsers = TasteHadoopUtils.readIntFromFile(getConf(), countUsersPath);
>     if (shouldRunNextPhase(parsedArgs, currentPhase)) {
>       /* Once DistributedRowMatrix uses the hadoop 0.20 API, we should refactor this call to something like
>        * new DistributedRowMatrix(...).rowSimilarity(...) */
>       try {
>         RowSimilarityJob.main(new String[] { "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
>             "-Dmapred.output.dir=" + similarityMatrixPath.toString(), "--numberOfColumns",
>             String.valueOf(numberOfUsers), "--similarityClassname", similarityClassname, "--maxSimilaritiesPerRow",
>             String.valueOf(maxSimilaritiesPerItemConsidered + 1), "--tempDir", tempDirPath.toString() });
>       } catch (Exception e) {
>         throw new IllegalStateException("item-item-similarity computation failed", e);
>       }
>     }
> {code}
> We have not passed parameter -Dmapred.reduce.tasks when job RowSimilarityJob.
> It caused all three  RowSimilarityJob sub-jobs run using 1 map and 1 reduce, so the sub jobs can not be scalable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-473) add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob

Posted by "Han Hui Wen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898193#action_12898193 ] 

Han Hui Wen  commented on MAHOUT-473:
-------------------------------------

Because RowSimilarityJob runs as a separate process, the properties from the main job are lost.

> add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob
> ------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-473
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-473
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Han Hui Wen 
>            Assignee: Sean Owen
>         Attachments: screenshot-1.jpg
>
>
> In RecommenderJob
> {code:title=RecommenderJob.java|borderStyle=solid}
>     int numberOfUsers = TasteHadoopUtils.readIntFromFile(getConf(), countUsersPath);
>     if (shouldRunNextPhase(parsedArgs, currentPhase)) {
>       /* Once DistributedRowMatrix uses the hadoop 0.20 API, we should refactor this call to something like
>        * new DistributedRowMatrix(...).rowSimilarity(...) */
>       try {
>         RowSimilarityJob.main(new String[] { "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
>             "-Dmapred.output.dir=" + similarityMatrixPath.toString(), "--numberOfColumns",
>             String.valueOf(numberOfUsers), "--similarityClassname", similarityClassname, "--maxSimilaritiesPerRow",
>             String.valueOf(maxSimilaritiesPerItemConsidered + 1), "--tempDir", tempDirPath.toString() });
>       } catch (Exception e) {
>         throw new IllegalStateException("item-item-similarity computation failed", e);
>       }
>     }
> {code}
> We have not passed parameter -Dmapred.reduce.tasks when job RowSimilarityJob.
> It caused all three  RowSimilarityJob sub-jobs run using 1 map and 1 reduce, so the sub jobs can not be scalable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Commented: (MAHOUT-473) add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob

Posted by Sebastian Schelter <ss...@apache.org>.
I'll give it a try this weekend :)

Am 16.08.2010 17:30, schrieb Sean Owen (JIRA):
>     [ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898964#action_12898964 ] 
>
> Sean Owen commented on MAHOUT-473:
> ----------------------------------
>
> I understand. The better change is to actually instantiate and run RowSimilarityJob within RecommenderJob, but before running, pass its Configuration "conf" property object to the child job. Then I think it all works. Sebastian do you mind trying this?
>
>   
>> add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob
>> ------------------------------------------------------------------------------------
>>
>>                 Key: MAHOUT-473
>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-473
>>             Project: Mahout
>>          Issue Type: Improvement
>>          Components: Collaborative Filtering
>>    Affects Versions: 0.4
>>            Reporter: Han Hui Wen 
>>            Assignee: Sean Owen
>>         Attachments: screenshot-1.jpg
>>
>>
>> In RecommenderJob
>> {code:title=RecommenderJob.java|borderStyle=solid}
>>     int numberOfUsers = TasteHadoopUtils.readIntFromFile(getConf(), countUsersPath);
>>     if (shouldRunNextPhase(parsedArgs, currentPhase)) {
>>       /* Once DistributedRowMatrix uses the hadoop 0.20 API, we should refactor this call to something like
>>        * new DistributedRowMatrix(...).rowSimilarity(...) */
>>       try {
>>         RowSimilarityJob.main(new String[] { "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
>>             "-Dmapred.output.dir=" + similarityMatrixPath.toString(), "--numberOfColumns",
>>             String.valueOf(numberOfUsers), "--similarityClassname", similarityClassname, "--maxSimilaritiesPerRow",
>>             String.valueOf(maxSimilaritiesPerItemConsidered + 1), "--tempDir", tempDirPath.toString() });
>>       } catch (Exception e) {
>>         throw new IllegalStateException("item-item-similarity computation failed", e);
>>       }
>>     }
>> {code}
>> We have not passed parameter -Dmapred.reduce.tasks when job RowSimilarityJob.
>> It caused all three  RowSimilarityJob sub-jobs run using 1 map and 1 reduce, so the sub jobs can not be scalable.
>>     
>   


[jira] Commented: (MAHOUT-473) add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898964#action_12898964 ] 

Sean Owen commented on MAHOUT-473:
----------------------------------

I understand. The better change is to actually instantiate and run RowSimilarityJob within RecommenderJob, but before running, pass RecommenderJob's Configuration ("conf") object to the child job. Then I think it all works. Sebastian, do you mind trying this?
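
A minimal sketch of that suggestion, assuming RowSimilarityJob implements Tool via AbstractJob and keeping the original argument list: the RowSimilarityJob.main(...) call is replaced by a ToolRunner.run(...) call seeded with RecommenderJob's own Configuration, so -D settings such as mapred.reduce.tasks are inherited by the child job.

{code}
try {
  // getConf() is RecommenderJob's Configuration; ToolRunner.run(Configuration,
  // Tool, String[]) hands it to the child job, so -Dmapred.reduce.tasks set on
  // RecommenderJob carries over instead of being lost in a fresh Configuration.
  ToolRunner.run(getConf(), new RowSimilarityJob(), new String[] {
      "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
      "-Dmapred.output.dir=" + similarityMatrixPath.toString(),
      "--numberOfColumns", String.valueOf(numberOfUsers),
      "--similarityClassname", similarityClassname,
      "--maxSimilaritiesPerRow", String.valueOf(maxSimilaritiesPerItemConsidered + 1),
      "--tempDir", tempDirPath.toString() });
} catch (Exception e) {
  throw new IllegalStateException("item-item-similarity computation failed", e);
}
{code}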

> add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob
> ------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-473
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-473
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Han Hui Wen 
>            Assignee: Sean Owen
>         Attachments: screenshot-1.jpg
>
>
> In RecommenderJob
> {code:title=RecommenderJob.java|borderStyle=solid}
>     int numberOfUsers = TasteHadoopUtils.readIntFromFile(getConf(), countUsersPath);
>     if (shouldRunNextPhase(parsedArgs, currentPhase)) {
>       /* Once DistributedRowMatrix uses the hadoop 0.20 API, we should refactor this call to something like
>        * new DistributedRowMatrix(...).rowSimilarity(...) */
>       try {
>         RowSimilarityJob.main(new String[] { "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
>             "-Dmapred.output.dir=" + similarityMatrixPath.toString(), "--numberOfColumns",
>             String.valueOf(numberOfUsers), "--similarityClassname", similarityClassname, "--maxSimilaritiesPerRow",
>             String.valueOf(maxSimilaritiesPerItemConsidered + 1), "--tempDir", tempDirPath.toString() });
>       } catch (Exception e) {
>         throw new IllegalStateException("item-item-similarity computation failed", e);
>       }
>     }
> {code}
> We have not passed parameter -Dmapred.reduce.tasks when job RowSimilarityJob.
> It caused all three  RowSimilarityJob sub-jobs run using 1 map and 1 reduce, so the sub jobs can not be scalable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAHOUT-473) add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob

Posted by "Sebastian Schelter (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Schelter updated MAHOUT-473:
--------------------------------------

    Attachment: MAHOUT-473.patch

> add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob
> ------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-473
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-473
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Han Hui Wen 
>            Assignee: Sean Owen
>         Attachments: MAHOUT-473.patch, screenshot-1.jpg
>
>
> In RecommenderJob
> {code:title=RecommenderJob.java|borderStyle=solid}
>     int numberOfUsers = TasteHadoopUtils.readIntFromFile(getConf(), countUsersPath);
>     if (shouldRunNextPhase(parsedArgs, currentPhase)) {
>       /* Once DistributedRowMatrix uses the hadoop 0.20 API, we should refactor this call to something like
>        * new DistributedRowMatrix(...).rowSimilarity(...) */
>       try {
>         RowSimilarityJob.main(new String[] { "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
>             "-Dmapred.output.dir=" + similarityMatrixPath.toString(), "--numberOfColumns",
>             String.valueOf(numberOfUsers), "--similarityClassname", similarityClassname, "--maxSimilaritiesPerRow",
>             String.valueOf(maxSimilaritiesPerItemConsidered + 1), "--tempDir", tempDirPath.toString() });
>       } catch (Exception e) {
>         throw new IllegalStateException("item-item-similarity computation failed", e);
>       }
>     }
> {code}
> We have not passed parameter -Dmapred.reduce.tasks when job RowSimilarityJob.
> It caused all three  RowSimilarityJob sub-jobs run using 1 map and 1 reduce, so the sub jobs can not be scalable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-473) add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob

Posted by "Sebastian Schelter (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899761#action_12899761 ] 

Sebastian Schelter commented on MAHOUT-473:
-------------------------------------------

Should be working now. Can you give it a try, Han Hui?

> add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob
> ------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-473
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-473
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Han Hui Wen 
>            Assignee: Sean Owen
>         Attachments: MAHOUT-473.patch, screenshot-1.jpg
>
>
> In RecommenderJob
> {code:title=RecommenderJob.java|borderStyle=solid}
>     int numberOfUsers = TasteHadoopUtils.readIntFromFile(getConf(), countUsersPath);
>     if (shouldRunNextPhase(parsedArgs, currentPhase)) {
>       /* Once DistributedRowMatrix uses the hadoop 0.20 API, we should refactor this call to something like
>        * new DistributedRowMatrix(...).rowSimilarity(...) */
>       try {
>         RowSimilarityJob.main(new String[] { "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
>             "-Dmapred.output.dir=" + similarityMatrixPath.toString(), "--numberOfColumns",
>             String.valueOf(numberOfUsers), "--similarityClassname", similarityClassname, "--maxSimilaritiesPerRow",
>             String.valueOf(maxSimilaritiesPerItemConsidered + 1), "--tempDir", tempDirPath.toString() });
>       } catch (Exception e) {
>         throw new IllegalStateException("item-item-similarity computation failed", e);
>       }
>     }
> {code}
> We have not passed parameter -Dmapred.reduce.tasks when job RowSimilarityJob.
> It caused all three  RowSimilarityJob sub-jobs run using 1 map and 1 reduce, so the sub jobs can not be scalable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (MAHOUT-473) add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob

Posted by "Han Hui Wen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898193#action_12898193 ] 

Han Hui Wen  edited comment on MAHOUT-473 at 8/13/10 8:34 AM:
--------------------------------------------------------------

Because RowSimilarityJob runs in a separate process, the properties from the main job are lost.

The properties must be set by RecommenderJob itself.

      was (Author: huiwenhan):
    Because RowSimilarityJob run a separated process ,the properties will lost from the main Job.
  
> add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob
> ------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-473
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-473
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Han Hui Wen 
>            Assignee: Sean Owen
>         Attachments: screenshot-1.jpg
>
>
> In RecommenderJob
> {code:title=RecommenderJob.java|borderStyle=solid}
>     int numberOfUsers = TasteHadoopUtils.readIntFromFile(getConf(), countUsersPath);
>     if (shouldRunNextPhase(parsedArgs, currentPhase)) {
>       /* Once DistributedRowMatrix uses the hadoop 0.20 API, we should refactor this call to something like
>        * new DistributedRowMatrix(...).rowSimilarity(...) */
>       try {
>         RowSimilarityJob.main(new String[] { "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
>             "-Dmapred.output.dir=" + similarityMatrixPath.toString(), "--numberOfColumns",
>             String.valueOf(numberOfUsers), "--similarityClassname", similarityClassname, "--maxSimilaritiesPerRow",
>             String.valueOf(maxSimilaritiesPerItemConsidered + 1), "--tempDir", tempDirPath.toString() });
>       } catch (Exception e) {
>         throw new IllegalStateException("item-item-similarity computation failed", e);
>       }
>     }
> {code}
> We have not passed parameter -Dmapred.reduce.tasks when job RowSimilarityJob.
> It caused all three  RowSimilarityJob sub-jobs run using 1 map and 1 reduce, so the sub jobs can not be scalable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-473) add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob

Posted by "Han Hui Wen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898970#action_12898970 ] 

Han Hui Wen  commented on MAHOUT-473:
-------------------------------------

Yep, running RowSimilarityJob within RecommenderJob is better; otherwise other -D parameters will also be lost.

> add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob
> ------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-473
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-473
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Han Hui Wen 
>            Assignee: Sean Owen
>         Attachments: screenshot-1.jpg
>
>
> In RecommenderJob
> {code:title=RecommenderJob.java|borderStyle=solid}
>     int numberOfUsers = TasteHadoopUtils.readIntFromFile(getConf(), countUsersPath);
>     if (shouldRunNextPhase(parsedArgs, currentPhase)) {
>       /* Once DistributedRowMatrix uses the hadoop 0.20 API, we should refactor this call to something like
>        * new DistributedRowMatrix(...).rowSimilarity(...) */
>       try {
>         RowSimilarityJob.main(new String[] { "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
>             "-Dmapred.output.dir=" + similarityMatrixPath.toString(), "--numberOfColumns",
>             String.valueOf(numberOfUsers), "--similarityClassname", similarityClassname, "--maxSimilaritiesPerRow",
>             String.valueOf(maxSimilaritiesPerItemConsidered + 1), "--tempDir", tempDirPath.toString() });
>       } catch (Exception e) {
>         throw new IllegalStateException("item-item-similarity computation failed", e);
>       }
>     }
> {code}
> We have not passed parameter -Dmapred.reduce.tasks when job RowSimilarityJob.
> It caused all three  RowSimilarityJob sub-jobs run using 1 map and 1 reduce, so the sub jobs can not be scalable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-473) add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob

Posted by "Han Hui Wen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899927#action_12899927 ] 

Han Hui Wen  commented on MAHOUT-473:
-------------------------------------

It works now; the -D parameters are passed through to RowSimilarityJob.
Thanks, Sebastian Schelter.

> add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob
> ------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-473
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-473
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Han Hui Wen 
>            Assignee: Sean Owen
>         Attachments: MAHOUT-473.patch, screenshot-1.jpg
>
>
> In RecommenderJob
> {code:title=RecommenderJob.java|borderStyle=solid}
>     int numberOfUsers = TasteHadoopUtils.readIntFromFile(getConf(), countUsersPath);
>     if (shouldRunNextPhase(parsedArgs, currentPhase)) {
>       /* Once DistributedRowMatrix uses the hadoop 0.20 API, we should refactor this call to something like
>        * new DistributedRowMatrix(...).rowSimilarity(...) */
>       try {
>         RowSimilarityJob.main(new String[] { "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
>             "-Dmapred.output.dir=" + similarityMatrixPath.toString(), "--numberOfColumns",
>             String.valueOf(numberOfUsers), "--similarityClassname", similarityClassname, "--maxSimilaritiesPerRow",
>             String.valueOf(maxSimilaritiesPerItemConsidered + 1), "--tempDir", tempDirPath.toString() });
>       } catch (Exception e) {
>         throw new IllegalStateException("item-item-similarity computation failed", e);
>       }
>     }
> {code}
> We have not passed parameter -Dmapred.reduce.tasks when job RowSimilarityJob.
> It caused all three  RowSimilarityJob sub-jobs run using 1 map and 1 reduce, so the sub jobs can not be scalable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAHOUT-473) add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob

Posted by "Han Hui Wen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Han Hui Wen  updated MAHOUT-473:
--------------------------------

    Description: 
In RecommenderJob

{code:title=RecommenderJob.java|borderStyle=solid}
    int numberOfUsers = TasteHadoopUtils.readIntFromFile(getConf(), countUsersPath);

    if (shouldRunNextPhase(parsedArgs, currentPhase)) {
      /* Once DistributedRowMatrix uses the hadoop 0.20 API, we should refactor this call to something like
       * new DistributedRowMatrix(...).rowSimilarity(...) */
      try {
        RowSimilarityJob.main(new String[] { "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
            "-Dmapred.output.dir=" + similarityMatrixPath.toString(), "--numberOfColumns",
            String.valueOf(numberOfUsers), "--similarityClassname", similarityClassname, "--maxSimilaritiesPerRow",
            String.valueOf(maxSimilaritiesPerItemConsidered + 1), "--tempDir", tempDirPath.toString() });
      } catch (Exception e) {
        throw new IllegalStateException("item-item-similarity computation failed", e);
      }
    }
{code}

We do not pass the parameter -Dmapred.reduce.tasks when calling RowSimilarityJob.
As a result, all three RowSimilarityJob sub-jobs run with 1 map and 1 reduce task, so the sub-jobs cannot scale.

  was:
In RecommenderJob

{code:title=Bar.java|borderStyle=solid}
    int numberOfUsers = TasteHadoopUtils.readIntFromFile(getConf(), countUsersPath);

    if (shouldRunNextPhase(parsedArgs, currentPhase)) {
      /* Once DistributedRowMatrix uses the hadoop 0.20 API, we should refactor this call to something like
       * new DistributedRowMatrix(...).rowSimilarity(...) */
      try {
        RowSimilarityJob.main(new String[] { "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
            "-Dmapred.output.dir=" + similarityMatrixPath.toString(), "--numberOfColumns",
            String.valueOf(numberOfUsers), "--similarityClassname", similarityClassname, "--maxSimilaritiesPerRow",
            String.valueOf(maxSimilaritiesPerItemConsidered + 1), "--tempDir", tempDirPath.toString() });
      } catch (Exception e) {
        throw new IllegalStateException("item-item-similarity computation failed", e);
      }
    }
{code}

We have not passed parameter -Dmapred.reduce.tasks when job RowSimilarityJob.
It caused all three  RowSimilarityJob sub-jobs run using 1 map and 1 reduce, so the sub jobs can not be scalable.


> add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob
> ------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-473
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-473
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Han Hui Wen 
>             Fix For: 0.4
>
>
> In RecommenderJob
> {code:title=RecommenderJob.java|borderStyle=solid}
>     int numberOfUsers = TasteHadoopUtils.readIntFromFile(getConf(), countUsersPath);
>     if (shouldRunNextPhase(parsedArgs, currentPhase)) {
>       /* Once DistributedRowMatrix uses the hadoop 0.20 API, we should refactor this call to something like
>        * new DistributedRowMatrix(...).rowSimilarity(...) */
>       try {
>         RowSimilarityJob.main(new String[] { "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
>             "-Dmapred.output.dir=" + similarityMatrixPath.toString(), "--numberOfColumns",
>             String.valueOf(numberOfUsers), "--similarityClassname", similarityClassname, "--maxSimilaritiesPerRow",
>             String.valueOf(maxSimilaritiesPerItemConsidered + 1), "--tempDir", tempDirPath.toString() });
>       } catch (Exception e) {
>         throw new IllegalStateException("item-item-similarity computation failed", e);
>       }
>     }
> {code}
> We have not passed parameter -Dmapred.reduce.tasks when job RowSimilarityJob.
> It caused all three  RowSimilarityJob sub-jobs run using 1 map and 1 reduce, so the sub jobs can not be scalable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-473) add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898343#action_12898343 ] 

Sean Owen commented on MAHOUT-473:
----------------------------------

I am not sure what you mean. Settings like "-Dmapred.reduce.tasks" are parameters for Hadoop, not the Mahout job. They are passed on the command line to Hadoop and processed. Hadoop passes them to Mahout, but Mahout doesn't usually care about this value. But no, they are available.

You'd have to provide a clear description and patch of what you think the problem is if you think there is still an issue here.

> add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob
> ------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-473
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-473
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Han Hui Wen 
>            Assignee: Sean Owen
>         Attachments: screenshot-1.jpg
>
>
> In RecommenderJob
> {code:title=RecommenderJob.java|borderStyle=solid}
>     int numberOfUsers = TasteHadoopUtils.readIntFromFile(getConf(), countUsersPath);
>     if (shouldRunNextPhase(parsedArgs, currentPhase)) {
>       /* Once DistributedRowMatrix uses the hadoop 0.20 API, we should refactor this call to something like
>        * new DistributedRowMatrix(...).rowSimilarity(...) */
>       try {
>         RowSimilarityJob.main(new String[] { "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
>             "-Dmapred.output.dir=" + similarityMatrixPath.toString(), "--numberOfColumns",
>             String.valueOf(numberOfUsers), "--similarityClassname", similarityClassname, "--maxSimilaritiesPerRow",
>             String.valueOf(maxSimilaritiesPerItemConsidered + 1), "--tempDir", tempDirPath.toString() });
>       } catch (Exception e) {
>         throw new IllegalStateException("item-item-similarity computation failed", e);
>       }
>     }
> {code}
> We have not passed parameter -Dmapred.reduce.tasks when job RowSimilarityJob.
> It caused all three  RowSimilarityJob sub-jobs run using 1 map and 1 reduce, so the sub jobs can not be scalable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-473) add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob

Posted by "Han Hui Wen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898176#action_12898176 ] 

Han Hui Wen  commented on MAHOUT-473:
-------------------------------------

added patch https://issues.apache.org/jira/secure/attachment/12452011/patch_985097.txt

> add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob
> ------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-473
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-473
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Han Hui Wen 
>             Fix For: 0.4
>
>         Attachments: screenshot-1.jpg
>
>
> In RecommenderJob
> {code:title=RecommenderJob.java|borderStyle=solid}
>     int numberOfUsers = TasteHadoopUtils.readIntFromFile(getConf(), countUsersPath);
>     if (shouldRunNextPhase(parsedArgs, currentPhase)) {
>       /* Once DistributedRowMatrix uses the hadoop 0.20 API, we should refactor this call to something like
>        * new DistributedRowMatrix(...).rowSimilarity(...) */
>       try {
>         RowSimilarityJob.main(new String[] { "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
>             "-Dmapred.output.dir=" + similarityMatrixPath.toString(), "--numberOfColumns",
>             String.valueOf(numberOfUsers), "--similarityClassname", similarityClassname, "--maxSimilaritiesPerRow",
>             String.valueOf(maxSimilaritiesPerItemConsidered + 1), "--tempDir", tempDirPath.toString() });
>       } catch (Exception e) {
>         throw new IllegalStateException("item-item-similarity computation failed", e);
>       }
>     }
> {code}
> We have not passed parameter -Dmapred.reduce.tasks when job RowSimilarityJob.
> It caused all three  RowSimilarityJob sub-jobs run using 1 map and 1 reduce, so the sub jobs can not be scalable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.