Posted to user@mahout.apache.org by Mohit Singh <mo...@gmail.com> on 2014/05/23 20:59:55 UTC

Setting mahout heapsize for rowsimilarity job

Hi,
   I have a 1M x 6 matrix stored as a sequence file, and I am
trying to use RowSimilarityJob on it.
When I run the job, I get a Java heap space error in the second
step (RowSimilarityJob-CooccurrencesMapper-Reducer).
My raw sequence file is around 700 MB, and I have already set
MAHOUT_OPTS to (say) 7 GB, but I am still seeing the error.
My command-line arguments are:

hadoop jar /usr/lib/mahout/mahout-examples-0.8-cdh5.0.0-job.jar \
  org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob \
  -i $INPUT -o $OUTPUT -r 6 -s SIMILARITY_COSINE -m 15 --tempDir $TEMP -ess

Also, is the "-r" flag a typo? The help text describes it as the number of
columns. Does it refer to the column dimension or the row dimension?
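To make the row/column question concrete, here is a small in-memory sketch (plain Python, not Mahout code) of what the command above computes: each row is a vector whose length is the number of columns, which is what the help text for "-r" describes; rows are compared pairwise with cosine similarity; at most m neighbours are kept per row (as with "-m 15"); and a row is never paired with itself (as with "-ess"). The helper name row_similarities and the toy matrix are hypothetical.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def row_similarities(matrix, num_columns, max_per_row):
    """Hypothetical helper: the max_per_row most similar *other* rows for each row.

    num_columns plays the role of -r (the vector dimension), max_per_row the
    role of -m; skipping j == i mirrors -ess (exclude self-similarity).
    """
    assert all(len(row) == num_columns for row in matrix)
    result = {}
    for i, row in enumerate(matrix):
        scored = ((cosine(row, other), j) for j, other in enumerate(matrix) if j != i)
        # Keep only the best max_per_row (score, row_index) pairs.
        result[i] = [(j, s) for s, j in heapq.nlargest(max_per_row, scored)]
    return result

# Toy 4 x 3 matrix: 4 rows (the things being compared), 3 columns (-r 3).
toy = [
    [1.0, 0.0, 1.0],
    [1.0, 0.0, 0.9],
    [0.0, 1.0, 0.0],
    [0.5, 0.5, 0.5],
]
print(row_similarities(toy, num_columns=3, max_per_row=2))
```

In this reading, a 1M x 6 input means 1M rows being compared and "-r 6" for the column count.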

Thanks

-- 
Mohit

"When you want success as badly as you want the air, then you will get it.
There is no other secret of success."
-Socrates

Re: Setting mahout heapsize for rowsimilarity job

Posted by Mohit Singh <mo...@gmail.com>.

Basically, finding the N most similar vectors. Adding columns isn't a
problem; this is just to get a "feel" for Mahout in general.
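For low-dimensional data like this, one alternative (not something proposed in the thread) is a direct scan: compare a query vector against every row and keep the N best with a bounded heap, which costs O(rows x columns) per query instead of materialising pairs of rows. A minimal sketch, assuming cosine similarity; the helper names and the random 6-column stand-in data are hypothetical.

```python
import heapq
import math
import random

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_n_similar(query, rows, n):
    """Return the n (row_index, similarity) pairs with the highest cosine similarity to query."""
    best = heapq.nlargest(n, ((cosine(query, r), i) for i, r in enumerate(rows)))
    return [(i, s) for s, i in best]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
rows = [[rng.random() for _ in range(6)] for _ in range(10_000)]  # stand-in for the 1M x 6 data
print(top_n_similar(rows[0], rows, n=5))
```

Because the query here is itself a row of the matrix, it comes back as its own best match; request n + 1 and drop it if that is not wanted.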



On Fri, May 23, 2014 at 12:07 PM, Sebastian Schelter <ss...@apache.org> wrote:

> I don't think you should use RowSimilarityJob in that case if you only
> have 6 columns.
>
> Can you tell us a little bit about the data and what problem you are
> trying to solve?

Re: Setting mahout heapsize for rowsimilarity job

Posted by Sebastian Schelter <ss...@apache.org>.

I don't think you should use RowSimilarityJob in that case if you
only have 6 columns.
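A rough back-of-the-envelope sketch of why 6 columns is a poor fit, under a simplifying assumption about the co-occurrence step: for every column, each pair of rows that both have a nonzero entry in it is emitted once. A column with k nonzeros then yields k(k-1)/2 pairs, so a dense 1M x 6 matrix implies roughly 3 x 10^12 pairs, which would plausibly strain that step, while the same number of nonzeros spread over many sparse columns implies far fewer. Apart from the 1M x 6 case from the thread, the shapes below are hypothetical.

```python
def cooccurrence_pairs(nonzeros_per_column):
    """Row pairs a co-occurrence pass would emit: k*(k-1)/2 per column with k nonzeros.

    Simplified model (an assumption): every pair of rows sharing a nonzero
    column is counted once per shared column.
    """
    return sum(k * (k - 1) // 2 for k in nonzeros_per_column)

rows = 1_000_000

# Dense 1M x 6 matrix from the thread: every column has 1M nonzeros.
dense_narrow = cooccurrence_pairs([rows] * 6)

# Hypothetical sparse matrix with the same 6M nonzeros spread over
# 100,000 columns, i.e. 60 nonzeros per column.
sparse_wide = cooccurrence_pairs([60] * 100_000)

print(f"dense 1M x 6:         {dense_narrow:,} pairs")
print(f"sparse, 100k columns: {sparse_wide:,} pairs")
```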

Can you tell us a little bit about the data and what problem you are
trying to solve?

--sebastian


On 05/23/2014 09:03 PM, Suneel Marthi wrote:
> I have seen this issue with RSJ in versions up to 0.8 as well. Switch to
> Mahout 0.9: it introduced downsampling in RSJ, which should avoid this error.

Re: Setting mahout heapsize for rowsimilarity job

Posted by Suneel Marthi <sm...@apache.org>.

I have seen this issue with RSJ in versions up to 0.8 as well. Switch to
Mahout 0.9: it introduced downsampling in RSJ, which should avoid this error.
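A minimal sketch of the idea behind downsampling, assuming a simple scheme in which each column keeps a uniform random sample of at most a fixed number of its nonzero rows before row pairs are generated. It illustrates the principle only and is not Mahout 0.9's implementation; the cap name max_per_column and the value 500 are hypothetical, so check the 0.9 RowSimilarityJob help output for the actual options.

```python
import random

def downsample_column(row_ids, max_per_column, rng):
    """Keep at most max_per_column row ids from one column, chosen uniformly at random."""
    if len(row_ids) <= max_per_column:
        return list(row_ids)
    return rng.sample(row_ids, max_per_column)

def pairs(k):
    """Row pairs generated for a column with k nonzeros: k*(k-1)/2."""
    return k * (k - 1) // 2

rng = random.Random(42)          # fixed seed so the sketch is reproducible
column = list(range(1_000_000))  # one dense column of a 1M-row matrix
kept = downsample_column(column, max_per_column=500, rng=rng)

print(f"pairs before: {pairs(len(column)):,}")
print(f"pairs after:  {pairs(len(kept)):,}")
```

The trade-off is that similarities for heavily populated columns are then computed from a sample, so they become approximate.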


On Fri, May 23, 2014 at 2:59 PM, Mohit Singh <mo...@gmail.com> wrote:

> Hi,
>    I have a 1M x 6 matrix stored as a sequence file, and I am
> trying to use RowSimilarityJob on it.
> When I run the job, I get a Java heap space error in the second
> step (RowSimilarityJob-CooccurrencesMapper-Reducer).