Posted to dev@mahout.apache.org by Sebastian Schelter <ss...@apache.org> on 2011/09/08 14:00:31 UTC

Re: how to understand the parameter "maxSimilaritiesPerItem"

+dev
-user

I don't understand your question. Can you give some more details?

--sebastian

On 08.09.2011 13:51, 张玉东 wrote:
> OK, I understand this point. But if the top similar items have already been chosen in this step, is it still necessary to select the top "maxSimilaritiesPerItem" items in the "mostSimilarItems" job?
> 
> -----Original Message-----
> From: Sebastian Schelter [mailto:ssc.open@googlemail.com]
> Sent: September 8, 2011 19:42
> To: user@mahout.apache.org
> Subject: Re: how to understand the parameter "maxSimilaritiesPerItem"
> 
> The code snippet is invoked in a job that uses "Secondary Sort", which
> means that the reducer sees the "entries" in descending order.
> That's why we only need to process the first ones.
> 
> --sebastian
> 
> On 08.09.2011 13:38, 张玉东 wrote:
>> Hello,
>> In the ItemSimilarityJob, the parameter "maxSimilaritiesPerItem" is first used in the 7th map/reduce job "asMatrix", as follows:
>>
>>     protected void reduce(SimilarityMatrixEntryKey key,
>>                           Iterable<DistributedRowMatrix.MatrixEntryWritable> entries,
>>                           Context ctx) throws IOException, InterruptedException {
>>       RandomAccessSparseVector temporaryVector = new RandomAccessSparseVector(Integer.MAX_VALUE, maxSimilaritiesPerRow);
>>       int similaritiesSet = 0;
>>       for (DistributedRowMatrix.MatrixEntryWritable entry : entries) {
>>         temporaryVector.setQuick(entry.getCol(), entry.getVal());
>>         if (++similaritiesSet == maxSimilaritiesPerRow) {
>>           break;
>>         }
>>       }
>>       SequentialAccessSparseVector vector = new SequentialAccessSparseVector(temporaryVector);
>>       ctx.write(new IntWritable(key.getRow()), new VectorWritable(vector));
>>     }
>>
>> I am not sure whether all the other items with a similarity value are written into the matrix for each item. If only some of them (no more than maxSimilaritiesPerItem) are written, how are they selected? Randomly?
>> Thanks.
>>
>> yudong
>>
>>
>
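
For illustration, the behaviour discussed in this thread can be reproduced without Hadoop. The sketch below is not Mahout code: the Entry class, the keepFirst method and the sample values are hypothetical, and an in-memory sort stands in for the secondary sort. It also assumes that "descending order" means descending by similarity value. It only shows why stopping after the first maxSimilaritiesPerRow entries keeps the most similar ones rather than a random subset.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TopSimilaritiesSketch {

  /** Hypothetical stand-in for a matrix entry: a column index and a similarity value. */
  static final class Entry {
    final int col;
    final double val;

    Entry(int col, double val) {
      this.col = col;
      this.val = val;
    }
  }

  /**
   * Mirrors the loop in the quoted reducer. It assumes the entries already
   * arrive in descending order of similarity, so the first k are the top k.
   */
  static Map<Integer, Double> keepFirst(Iterable<Entry> sortedEntries, int maxSimilaritiesPerRow) {
    Map<Integer, Double> row = new LinkedHashMap<>();
    int similaritiesSet = 0;
    for (Entry entry : sortedEntries) {
      row.put(entry.col, entry.val);
      if (++similaritiesSet == maxSimilaritiesPerRow) {
        break;
      }
    }
    return row;
  }

  public static void main(String[] args) {
    List<Entry> entries = new ArrayList<>();
    entries.add(new Entry(7, 0.2));
    entries.add(new Entry(3, 0.9));
    entries.add(new Entry(5, 0.6));

    // Assumption: this in-memory sort plays the role of the secondary sort
    // that the framework performs before the reducer sees the values.
    entries.sort(Comparator.comparingDouble((Entry e) -> e.val).reversed());

    // Only the two highest similarities survive.
    System.out.println(keepFirst(entries, 2)); // prints {3=0.9, 5=0.6}
  }
}
```

Without the sort, the same loop would keep whichever entries happened to arrive first, which is the concern raised in the original question.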