Posted to dev@mahout.apache.org by "Han Hui Wen (JIRA)" <ji...@apache.org> on 2010/08/13 07:09:18 UTC
[jira] Created: (MAHOUT-473) add parameter -Dmapred.reduce.tasks
when call job RowSimilarityJob in RecommenderJob
add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob
------------------------------------------------------------------------------------
Key: MAHOUT-473
URL: https://issues.apache.org/jira/browse/MAHOUT-473
Project: Mahout
Issue Type: Improvement
Components: Collaborative Filtering
Affects Versions: 0.4
Reporter: Han Hui Wen
Fix For: 0.4
In RecommenderJob
{code:title=RecommenderJob.java|borderStyle=solid}
int numberOfUsers = TasteHadoopUtils.readIntFromFile(getConf(), countUsersPath);
if (shouldRunNextPhase(parsedArgs, currentPhase)) {
  /* Once DistributedRowMatrix uses the hadoop 0.20 API, we should refactor this call to something like
   * new DistributedRowMatrix(...).rowSimilarity(...) */
  try {
    RowSimilarityJob.main(new String[] { "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
        "-Dmapred.output.dir=" + similarityMatrixPath.toString(), "--numberOfColumns",
        String.valueOf(numberOfUsers), "--similarityClassname", similarityClassname, "--maxSimilaritiesPerRow",
        String.valueOf(maxSimilaritiesPerItemConsidered + 1), "--tempDir", tempDirPath.toString() });
  } catch (Exception e) {
    throw new IllegalStateException("item-item-similarity computation failed", e);
  }
}
{code}
We do not pass the -Dmapred.reduce.tasks parameter when calling RowSimilarityJob.
As a result, all three RowSimilarityJob sub-jobs run with 1 map and 1 reduce task, so the sub-jobs cannot scale.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAHOUT-473) add parameter -Dmapred.reduce.tasks
when call job RowSimilarityJob in RecommenderJob
Posted by "Sebastian Schelter (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898084#action_12898084 ]
Sebastian Schelter commented on MAHOUT-473:
-------------------------------------------
Hui,
can you supply a .patch file?
This page explains in detail how to do this: https://cwiki.apache.org/confluence/display/MAHOUT/How+To+Contribute
[jira] Commented: (MAHOUT-473) add parameter -Dmapred.reduce.tasks
when call job RowSimilarityJob in RecommenderJob
Posted by "Han Hui Wen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898950#action_12898950 ]
Han Hui Wen commented on MAHOUT-473:
-------------------------------------
The patch is as follows.
In RecommenderJob:
{code}
+ Job job = new Job(new Configuration(getConf()));
+ int numReduceTasks = job.getNumReduceTasks();
  try {
    RowSimilarityJob.main(new String[] {
        "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
        "-Dmapred.output.dir=" + similarityMatrixPath.toString(),
+       "-Dmapred.reduce.tasks=" + numReduceTasks,
        "--numberOfColumns", String.valueOf(numberOfUsers),
        "--similarityClassname", similarityClassname,
        "--maxSimilaritiesPerRow", String.valueOf(maxSimilaritiesPerItemConsidered + 1),
        "--tempDir", tempDirPath.toString() });
  } catch (Exception e) {
    throw new IllegalStateException("item-item-similarity computation failed", e);
  }
{code}
[jira] Commented: (MAHOUT-473) add parameter -Dmapred.reduce.tasks
when call job RowSimilarityJob in RecommenderJob
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899800#action_12899800 ]
Hudson commented on MAHOUT-473:
-------------------------------
Integrated in Mahout-Quality #200 (See [https://hudson.apache.org/hudson/job/Mahout-Quality/200/])
MAHOUT-473 add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob
[jira] Commented: (MAHOUT-473) add parameter -Dmapred.reduce.tasks
when call job RowSimilarityJob in RecommenderJob
Posted by "Sebastian Schelter (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898093#action_12898093 ]
Sebastian Schelter commented on MAHOUT-473:
-------------------------------------------
Hui,
Can you follow these steps on this issue:
# Modify your local copy of mahout to include the changes you propose
# build Mahout and run your job again with your modified version
# upload a patch containing your changes and give us some details on the performance increase you encountered
Have a look at this website: https://cwiki.apache.org/confluence/display/MAHOUT/How+To+Contribute
[jira] Updated: (MAHOUT-473) add parameter -Dmapred.reduce.tasks
when call job RowSimilarityJob in RecommenderJob
Posted by "Han Hui Wen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Han Hui Wen updated MAHOUT-473:
--------------------------------
Attachment: screenshot-1.jpg
RowSimilarityJob cannot scale
[jira] Commented: (MAHOUT-473) add parameter -Dmapred.reduce.tasks
when call job RowSimilarityJob in RecommenderJob
Posted by "Han Hui Wen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898939#action_12898939 ]
Han Hui Wen commented on MAHOUT-473:
-------------------------------------
1) First, mapred.tasktracker.map.tasks.maximum in mapred-site.xml is different from -Dmapred.reduce.tasks on the command line (or mapred.reduce.tasks in mapred-site.xml).
The maximum number of map tasks that will be run on a tasktracker at one time is controlled by the mapred.tasktracker.map.tasks.maximum property, which defaults to two tasks. There is a corresponding property for reduce tasks, mapred.tasktracker.reduce.tasks.maximum, which also defaults to two tasks.
mapred.reduce.tasks is configured in mapred-site.xml; its default value is 1.
Options specified with -D take priority over properties from the configuration files. This is very useful: you can put defaults into configuration files and then override them with the -D option as needed. A common example is setting the number of reducers for a MapReduce job via -Dmapred.reduce.tasks=n, which overrides the number of reducers set on the cluster or in any client-side configuration files.
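The precedence described above can be illustrated with a minimal, self-contained sketch. This is not Hadoop's actual implementation (Hadoop uses GenericOptionsParser and Configuration internally); it only mimics the rule that -Dkey=value command-line arguments override file defaults:

```java
import java.util.HashMap;
import java.util.Map;

public class DOptionPrecedence {

  // Apply "-Dkey=value" arguments on top of defaults loaded from config
  // files; later -D options win over the file values.
  static Map<String, String> effectiveConf(Map<String, String> fileDefaults, String[] args) {
    Map<String, String> conf = new HashMap<>(fileDefaults);
    for (String arg : args) {
      int eq = arg.indexOf('=');
      if (arg.startsWith("-D") && eq > 2) {
        conf.put(arg.substring(2, eq), arg.substring(eq + 1));
      }
    }
    return conf;
  }

  public static void main(String[] args) {
    Map<String, String> siteXml = new HashMap<>();
    siteXml.put("mapred.reduce.tasks", "1"); // cluster-wide default
    Map<String, String> conf =
        effectiveConf(siteXml, new String[] {"-Dmapred.reduce.tasks=6"});
    System.out.println(conf.get("mapred.reduce.tasks")); // prints 6
  }
}
```

With no -D argument the file default (1) is used; with -Dmapred.reduce.tasks=6 the override (6) wins.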
Job Initialization
When the JobTracker receives a call to its submitJob() method, it puts it into an internal queue from where the job scheduler will pick it up and initialize it. Initialization involves creating an object to represent the job being run, which encapsulates its tasks, and bookkeeping information to keep track of the tasks' status and progress.
To create the list of tasks to run, the job scheduler first retrieves the input splits computed by the JobClient from the shared filesystem. It then creates one map task for each split. The number of reduce tasks to create is determined by the mapred.reduce.tasks property in the JobConf, which is set by the setNumReduceTasks() method, and the scheduler simply creates this number of reduce tasks to be run. Tasks are given IDs at this point.
mapred.reduce.tasks can be configured in mapred-site.xml, but a value set there applies to all reduce jobs. We run many kinds of jobs, with different data sizes and different priorities, and we cannot keep changing mapred-site.xml, so we need the -Dmapred.reduce.tasks command-line parameter to override the configuration file per job.
2) A parameter like -Dmapred.reduce.tasks passed to RecommenderJob cannot be propagated to RowSimilarityJob, because RowSimilarityJob is started as a new AbstractJob in a new context, so parameters passed to RecommenderJob are not forwarded to it.
If we pass -Dmapred.reduce.tasks=6 to RecommenderJob, all of its sub-jobs (itemIDIndex, toUserVector, countUsers, itemUserMatrix, maybePruneItemUserMatrix, prePartialMultiply1, prePartialMultiply2, partialMultiply, aggregateAndRecommend) work correctly, but the -Dmapred.reduce.tasks=6 parameter is lost when RowSimilarityJob is called.
So all 3 sub-jobs of RowSimilarityJob (weights, pairwiseSimilarity, asMatrix) run with only one reducer.
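The loss described above can be sketched with a self-contained toy model. The class and method names here are hypothetical (they only mirror the shape of the problem, and are not Mahout or Hadoop code): a child job built through its own entry point starts with a fresh, empty configuration, while a child handed the parent's configuration keeps the setting:

```java
import java.util.HashMap;
import java.util.Map;

public class ConfLossDemo {

  static class ToyJob {
    Map<String, String> conf = new HashMap<>(); // fresh, empty configuration

    int numReduceTasks() {
      return Integer.parseInt(conf.getOrDefault("mapred.reduce.tasks", "1"));
    }
  }

  // Calling the child through its own entry point: it builds a fresh
  // configuration, so the parent's setting is lost (reducers fall back to 1).
  static int runChildViaMain() {
    return new ToyJob().numReduceTasks();
  }

  // Handing the parent's configuration to the child preserves the setting.
  static int runChildWithParentConf(Map<String, String> parentConf) {
    ToyJob child = new ToyJob();
    child.conf.putAll(parentConf);
    return child.numReduceTasks();
  }

  public static void main(String[] args) {
    Map<String, String> parentConf = new HashMap<>();
    parentConf.put("mapred.reduce.tasks", "6");        // set on the parent job
    System.out.println(runChildViaMain());             // 1 -> setting lost
    System.out.println(runChildWithParentConf(parentConf)); // 6 -> preserved
  }
}
```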
[jira] Issue Comment Edited: (MAHOUT-473) add parameter
-Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob
Posted by "Han Hui Wen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898950#action_12898950 ]
Han Hui Wen edited comment on MAHOUT-473 at 8/16/10 10:40 AM:
---------------------------------------------------------------
The patch is as follows.
In RecommenderJob:
{code}
+ Job job = new Job(new Configuration(getConf()));
+ int numReduceTasks = job.getNumReduceTasks();
  try {
    RowSimilarityJob.main(new String[] {
        "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
        "-Dmapred.output.dir=" + similarityMatrixPath.toString(),
+       "-Dmapred.reduce.tasks=" + numReduceTasks,
        "--numberOfColumns", String.valueOf(numberOfUsers),
        "--similarityClassname", similarityClassname,
        "--maxSimilaritiesPerRow", String.valueOf(maxSimilaritiesPerItemConsidered + 1),
        "--tempDir", tempDirPath.toString() });
  } catch (Exception e) {
    throw new IllegalStateException("item-item-similarity computation failed", e);
  }
{code}
[jira] Resolved: (MAHOUT-473) add parameter -Dmapred.reduce.tasks
when call job RowSimilarityJob in RecommenderJob
Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved MAHOUT-473.
------------------------------
Assignee: Sean Owen
Fix Version/s: (was: 0.4)
Resolution: Not A Problem
It is up to the Hadoop cluster, and the caller, to set values like this.
[jira] Commented: (MAHOUT-473) add parameter -Dmapred.reduce.tasks
when call job RowSimilarityJob in RecommenderJob
Posted by "Han Hui Wen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898193#action_12898193 ]
Han Hui Wen commented on MAHOUT-473:
-------------------------------------
Because RowSimilarityJob runs as a separate process, the properties from the main job are lost.
Re: [jira] Commented: (MAHOUT-473) add parameter -Dmapred.reduce.tasks
when call job RowSimilarityJob in RecommenderJob
Posted by Sebastian Schelter <ss...@apache.org>.
I'll give it a try this weekend :)
On 16.08.2010 17:30, Sean Owen (JIRA) wrote:
> [ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898964#action_12898964 ]
>
> Sean Owen commented on MAHOUT-473:
> ----------------------------------
>
> I understand. The better change is to actually instantiate and run RowSimilarityJob within RecommenderJob, but before running, pass its Configuration "conf" property object to the child job. Then I think it all works. Sebastian do you mind trying this?
[jira] Commented: (MAHOUT-473) add parameter -Dmapred.reduce.tasks
when call job RowSimilarityJob in RecommenderJob
Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898964#action_12898964 ]
Sean Owen commented on MAHOUT-473:
----------------------------------
I understand. The better change is to actually instantiate and run RowSimilarityJob within RecommenderJob, but before running, pass its Configuration "conf" property object to the child job. Then I think it all works. Sebastian do you mind trying this?
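Sean's suggestion could look roughly like the following sketch. It is untested and not the committed patch; it assumes RowSimilarityJob can be run as a Hadoop Tool through the real ToolRunner.run(Configuration, Tool, String[]) overload, which hands the parent's Configuration to the child so -D settings survive:
{code}
// Hypothetical sketch: instead of RowSimilarityJob.main(...), run a job
// instance directly and pass the parent's Configuration via ToolRunner.
ToolRunner.run(getConf(), new RowSimilarityJob(), new String[] {
    "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
    "-Dmapred.output.dir=" + similarityMatrixPath.toString(),
    "--numberOfColumns", String.valueOf(numberOfUsers),
    "--similarityClassname", similarityClassname,
    "--maxSimilaritiesPerRow", String.valueOf(maxSimilaritiesPerItemConsidered + 1),
    "--tempDir", tempDirPath.toString() });
{code}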
[jira] Updated: (MAHOUT-473) add parameter -Dmapred.reduce.tasks
when call job RowSimilarityJob in RecommenderJob
Posted by "Sebastian Schelter (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Schelter updated MAHOUT-473:
--------------------------------------
Attachment: MAHOUT-473.patch
[jira] Commented: (MAHOUT-473) add parameter -Dmapred.reduce.tasks
when call job RowSimilarityJob in RecommenderJob
Posted by "Sebastian Schelter (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899761#action_12899761 ]
Sebastian Schelter commented on MAHOUT-473:
-------------------------------------------
Should be working now; can you give it a try, Han Hui?
[jira] Issue Comment Edited: (MAHOUT-473) add parameter
-Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob
Posted by "Han Hui Wen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898193#action_12898193 ]
Han Hui Wen edited comment on MAHOUT-473 at 8/13/10 8:34 AM:
--------------------------------------------------------------
Because RowSimilarityJob runs in a separate process, the properties from the main job are lost.
The properties must be set by RecommenderJob.
was (Author: huiwenhan):
Because RowSimilarityJob run a separated process ,the properties will lost from the main Job.
[jira] Commented: (MAHOUT-473) add parameter -Dmapred.reduce.tasks
when call job RowSimilarityJob in RecommenderJob
Posted by "Han Hui Wen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898970#action_12898970 ]
Han Hui Wen commented on MAHOUT-473:
-------------------------------------
yep, running RowSimilarityJob within RecommenderJob is better; otherwise other -D parameters will also be lost.
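The in-process alternative could look like the sketch below (a sketch only, not the committed patch; it assumes RowSimilarityJob implements Hadoop's Tool interface, which holds in Mahout 0.4). ToolRunner.run applies GenericOptionsParser to the argument array against the parent Configuration, so the parent's -D settings, including mapred.reduce.tasks, carry over:

```java
// Sketch: run RowSimilarityJob in-process with the parent job's
// Configuration instead of calling its main() like a separate program.
// The argument array is the same one RecommenderJob already builds.
import org.apache.hadoop.util.ToolRunner;
import org.apache.mahout.math.hadoop.similarity.RowSimilarityJob;

ToolRunner.run(getConf(), new RowSimilarityJob(), new String[] {
    "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
    "-Dmapred.output.dir=" + similarityMatrixPath.toString(),
    "--numberOfColumns", String.valueOf(numberOfUsers),
    "--similarityClassname", similarityClassname,
    "--maxSimilaritiesPerRow", String.valueOf(maxSimilaritiesPerItemConsidered + 1),
    "--tempDir", tempDirPath.toString() });
```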
[jira] Commented: (MAHOUT-473) add parameter -Dmapred.reduce.tasks
when call job RowSimilarityJob in RecommenderJob
Posted by "Han Hui Wen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899927#action_12899927 ]
Han Hui Wen commented on MAHOUT-473:
-------------------------------------
It works now; the -D parameters are passed through to RowSimilarityJob.
Thanks, Sebastian Schelter.
[jira] Updated: (MAHOUT-473) add parameter -Dmapred.reduce.tasks
when call job RowSimilarityJob in RecommenderJob
Posted by "Han Hui Wen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Han Hui Wen updated MAHOUT-473:
--------------------------------
Description:
[jira] Commented: (MAHOUT-473) add parameter -Dmapred.reduce.tasks
when call job RowSimilarityJob in RecommenderJob
Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898343#action_12898343 ]
Sean Owen commented on MAHOUT-473:
----------------------------------
I am not sure what you mean. Settings like -Dmapred.reduce.tasks are parameters for Hadoop, not for the Mahout job. They are passed on the command line, processed by Hadoop, and handed through to Mahout, although Mahout usually doesn't care about this value. In any case, they are available.
You'd have to provide a clear description and a patch of what you think the problem is if you believe there is still an issue here.
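For reference, a typical invocation looks like the fragment below (the jar name and input/output paths are placeholders, not from this thread); the -D options are consumed by Hadoop's GenericOptionsParser before the job sees its own flags, which is why they must precede them:

```shell
# -D generic options come first; job-specific flags follow.
hadoop jar mahout-core-0.4-job.jar \
    org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
    -Dmapred.reduce.tasks=10 \
    --input input/ratings.csv \
    --output output
```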
[jira] Commented: (MAHOUT-473) add parameter -Dmapred.reduce.tasks
when call job RowSimilarityJob in RecommenderJob
Posted by "Han Hui Wen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898176#action_12898176 ]
Han Hui Wen commented on MAHOUT-473:
-------------------------------------
added patch https://issues.apache.org/jira/secure/attachment/12452011/patch_985097.txt