Posted to dev@mahout.apache.org by "Han Hui Wen (JIRA)" <ji...@apache.org> on 2010/08/13 07:09:18 UTC
[jira] Updated: (MAHOUT-473) add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob
[ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Han Hui Wen updated MAHOUT-473:
--------------------------------
Description:
In RecommenderJob
{code:title=RecommenderJob.java|borderStyle=solid}
int numberOfUsers = TasteHadoopUtils.readIntFromFile(getConf(), countUsersPath);
if (shouldRunNextPhase(parsedArgs, currentPhase)) {
  /* Once DistributedRowMatrix uses the hadoop 0.20 API, we should refactor this call to something like
   * new DistributedRowMatrix(...).rowSimilarity(...) */
  try {
    RowSimilarityJob.main(new String[] { "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
      "-Dmapred.output.dir=" + similarityMatrixPath.toString(), "--numberOfColumns",
      String.valueOf(numberOfUsers), "--similarityClassname", similarityClassname, "--maxSimilaritiesPerRow",
      String.valueOf(maxSimilaritiesPerItemConsidered + 1), "--tempDir", tempDirPath.toString() });
  } catch (Exception e) {
    throw new IllegalStateException("item-item-similarity computation failed", e);
  }
}
{code}
RecommenderJob does not pass the -Dmapred.reduce.tasks parameter when it invokes RowSimilarityJob.
As a result, all three RowSimilarityJob sub-jobs run with 1 map task and 1 reduce task, so they do not scale.
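One possible shape for the change, sketched as a standalone helper so it compiles without Hadoop or Mahout on the classpath. The class name RowSimilarityArgs, the method build, and the numReduceTasks parameter are hypothetical and do not come from the Mahout code base; where the reducer count would come from (for example, the parent job's configuration) is also an assumption. The sketch only shows the extra -Dmapred.reduce.tasks entry being placed alongside the other -D options, ahead of the job-specific ones.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Mahout): builds the String[] handed to RowSimilarityJob.main(...). */
final class RowSimilarityArgs {

  private RowSimilarityArgs() { }

  static String[] build(String inputDir, String outputDir, int numberOfColumns,
                        String similarityClassname, int maxSimilaritiesPerRow,
                        String tempDir, int numReduceTasks) {
    List<String> args = new ArrayList<String>();
    // Generic -D options are kept first, ahead of the job-specific options,
    // matching the order used in the original call.
    args.add("-Dmapred.input.dir=" + inputDir);
    args.add("-Dmapred.output.dir=" + outputDir);
    // The addition this issue asks for: forward the reducer count to the sub-jobs.
    args.add("-Dmapred.reduce.tasks=" + numReduceTasks);
    args.add("--numberOfColumns");
    args.add(String.valueOf(numberOfColumns));
    args.add("--similarityClassname");
    args.add(similarityClassname);
    args.add("--maxSimilaritiesPerRow");
    args.add(String.valueOf(maxSimilaritiesPerRow));
    args.add("--tempDir");
    args.add(tempDir);
    return args.toArray(new String[0]);
  }

  public static void main(String[] unused) {
    // Placeholder values for illustration only.
    String[] args = build("/tmp/in", "/tmp/out", 1000, "SIMILARITY_CLASS", 101, "/tmp/work", 8);
    for (String arg : args) {
      System.out.println(arg);
    }
  }
}
```

In RecommenderJob the resulting array would be passed to RowSimilarityJob.main(...) in place of the inline array literal, with the last argument of the original call (maxSimilaritiesPerItemConsidered + 1) supplied as maxSimilaritiesPerRow.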
was:
In RecommenderJob
{code:title=Bar.java|borderStyle=solid}
int numberOfUsers = TasteHadoopUtils.readIntFromFile(getConf(), countUsersPath);
if (shouldRunNextPhase(parsedArgs, currentPhase)) {
  /* Once DistributedRowMatrix uses the hadoop 0.20 API, we should refactor this call to something like
   * new DistributedRowMatrix(...).rowSimilarity(...) */
  try {
    RowSimilarityJob.main(new String[] { "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
      "-Dmapred.output.dir=" + similarityMatrixPath.toString(), "--numberOfColumns",
      String.valueOf(numberOfUsers), "--similarityClassname", similarityClassname, "--maxSimilaritiesPerRow",
      String.valueOf(maxSimilaritiesPerItemConsidered + 1), "--tempDir", tempDirPath.toString() });
  } catch (Exception e) {
    throw new IllegalStateException("item-item-similarity computation failed", e);
  }
}
{code}
RecommenderJob does not pass the -Dmapred.reduce.tasks parameter when it invokes RowSimilarityJob.
As a result, all three RowSimilarityJob sub-jobs run with 1 map task and 1 reduce task, so they do not scale.
> add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob
> ------------------------------------------------------------------------------------
>
> Key: MAHOUT-473
> URL: https://issues.apache.org/jira/browse/MAHOUT-473
> Project: Mahout
> Issue Type: Improvement
> Components: Collaborative Filtering
> Affects Versions: 0.4
> Reporter: Han Hui Wen
> Fix For: 0.4
>
>
--
This message is automatically generated by JIRA.