Posted to dev@mahout.apache.org by "Han Hui Wen (JIRA)" <ji...@apache.org> on 2010/08/13 08:43:17 UTC
[jira] Updated: (MAHOUT-475) Replace Mapper with
MultithreadedMapper to implement
org.apache.mahout.math.hadoop.similarity.RowSimilarityJob.CooccurrencesMapper
[ https://issues.apache.org/jira/browse/MAHOUT-475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Han Hui Wen updated MAHOUT-475:
--------------------------------
Description:
Because CooccurrencesMapper is computationally heavy,
maybe we can replace Mapper with MultithreadedMapper.
Original:
{code}
public static class CooccurrencesMapper
    extends Mapper<VarIntWritable,WeightedOccurrenceArray,WeightedRowPair,Cooccurrence>
{code}
New:
{code}
public static class CooccurrencesMapper
    extends MultithreadedMapper<VarIntWritable,WeightedOccurrenceArray,WeightedRowPair,Cooccurrence>
{code}
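To make the idea concrete, here is a minimal sketch of the pattern a multithreaded mapper relies on. It uses only the Java standard library and is not Hadoop or Mahout code: several worker threads take records from one shared input, run the expensive map step without holding a lock, and append their results to one shared output. The class name MultithreadedMapSketch, the method mapInParallel, and the squaring map function are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

final class MultithreadedMapSketch {

  /** Applies mapFn to every record using numThreads worker threads. */
  public static <I, O> List<O> mapInParallel(List<I> records,
                                             Function<I, List<O>> mapFn,
                                             int numThreads) throws InterruptedException {
    final Iterator<I> input = records.iterator();  // shared "record reader"
    final List<O> output = new ArrayList<>();      // shared "collector"
    List<Thread> workers = new ArrayList<>();

    for (int t = 0; t < numThreads; t++) {
      Thread worker = new Thread(() -> {
        while (true) {
          I record;
          synchronized (input) {                   // only one thread reads at a time
            if (!input.hasNext()) {
              return;                              // no records left, worker is done
            }
            record = input.next();
          }
          List<O> emitted = mapFn.apply(record);   // the heavy step runs outside any lock
          synchronized (output) {                  // only one thread writes at a time
            output.addAll(emitted);
          }
        }
      });
      workers.add(worker);
      worker.start();
    }
    for (Thread worker : workers) {
      worker.join();                               // wait until every record is mapped
    }
    return output;
  }

  public static void main(String[] args) throws InterruptedException {
    // Hypothetical map function: emit each value and its square.
    Function<Integer, List<Integer>> mapFn = r -> Arrays.asList(r, r * r);
    List<Integer> out = mapInParallel(Arrays.asList(1, 2, 3, 4), mapFn, 2);
    System.out.println(out.size());
  }
}
```

The order of the output depends on thread scheduling, so nothing downstream should rely on it. The map function is called from several threads at once, so it must be thread-safe.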
And the call that sets up the job using the mapper:
Original:
{code}
if (shouldRunNextPhase(parsedArgs, currentPhase)) {
  Job pairwiseSimilarity = prepareJob(weightsPath,
                                      pairwiseSimilarityPath,
                                      SequenceFileInputFormat.class,
                                      CooccurrencesMapper.class,
                                      WeightedRowPair.class,
                                      Cooccurrence.class,
                                      SimilarityReducer.class,
                                      SimilarityMatrixEntryKey.class,
                                      MatrixEntryWritable.class,
                                      SequenceFileOutputFormat.class);
  Configuration pairwiseConf = pairwiseSimilarity.getConfiguration();
  pairwiseConf.set(DISTRIBUTED_SIMILARITY_CLASSNAME, distributedSimilarityClassname);
  pairwiseConf.setInt(NUMBER_OF_COLUMNS, numberOfColumns);
  pairwiseSimilarity.waitForCompletion(true);
}
{code}
New:
{code}
if (shouldRunNextPhase(parsedArgs, currentPhase)) {
  Job pairwiseSimilarity = prepareJob(weightsPath,
                                      pairwiseSimilarityPath,
                                      SequenceFileInputFormat.class,
                                      CooccurrencesMapper.class,
                                      WeightedRowPair.class,
                                      Cooccurrence.class,
                                      SimilarityReducer.class,
                                      SimilarityMatrixEntryKey.class,
                                      MatrixEntryWritable.class,
                                      SequenceFileOutputFormat.class);
  Configuration pairwiseConf = pairwiseSimilarity.getConfiguration();
  pairwiseConf.set(DISTRIBUTED_SIMILARITY_CLASSNAME, distributedSimilarityClassname);
  pairwiseConf.setInt(NUMBER_OF_COLUMNS, numberOfColumns);
  // setNumberOfThreads is the static method inherited from MultithreadedMapper; it takes the Job as well.
  // n should be roughly the machine's core count or less.
  CooccurrencesMapper.setNumberOfThreads(pairwiseSimilarity, n);
  pairwiseSimilarity.waitForCompletion(true);
}
{code}
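One possible way to pick n, offered as a rule of thumb rather than a tested recommendation: divide the cores of a task node by the number of map tasks that node runs at once, so that the threads of all concurrent tasks together stay near the core count. The helper threadsPerTask and both of its parameters are hypothetical names, and the figures in main are made-up examples.

```java
final class MapperThreadCount {

  // Hypothetical heuristic: share a node's cores evenly among the map tasks
  // running on it at the same time, and always allow at least one thread.
  public static int threadsPerTask(int coresPerNode, int concurrentMapTasksPerNode) {
    int tasks = Math.max(1, concurrentMapTasksPerNode);  // guard against zero or negative input
    return Math.max(1, coresPerNode / tasks);
  }

  public static void main(String[] args) {
    // Example figures only: an 8-core node running 2 map tasks at once.
    System.out.println(threadsPerTask(8, 2));
  }
}
```

The cores that matter are those of the node running the map task, not those of the machine that submits the job, so a value read on the client may not match the cluster.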
> Replace Mapper with MultithreadedMapper to implement org.apache.mahout.math.hadoop.similarity.RowSimilarityJob.CooccurrencesMapper
> ------------------------------------------------------------------------------------------------------------------------------------
>
> Key: MAHOUT-475
> URL: https://issues.apache.org/jira/browse/MAHOUT-475
> Project: Mahout
> Issue Type: Improvement
> Components: Collaborative Filtering
> Affects Versions: 0.4
> Reporter: Han Hui Wen
> Fix For: 0.4
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.