Posted to common-user@hadoop.apache.org by Pallavi Palleti <pa...@gmail.com> on 2008/09/17 14:36:07 UTC

OutOfMemory Error

Hi all,

   I am getting an OutOfMemory error, as shown below, when I run a map-reduce job on a huge amount of data:
java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:52)
        at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:90)
        at org.apache.hadoop.io.SequenceFile$Reader.nextRawKey(SequenceFile.java:1974)
        at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(SequenceFile.java:3002)
        at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2802)
        at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2511)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1040)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:220)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
The above error occurs almost at the end of the map phase. I have set the heap size
to 1 GB, but the problem persists. Can someone please help me avoid this error?
-- 
View this message in context: http://www.nabble.com/OutOfMemory-Error-tp19531174p19531174.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
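
For a merge-phase OOM like this one, the usual knobs are the per-task heap and the map-side sort/merge buffers. Below is a minimal sketch of setting them through the old JobConf API; the 1 GB heap mirrors the poster's setting, and the other values are purely illustrative, not a recommendation:

    import org.apache.hadoop.mapred.JobConf;

    public class MergeTuning {
        // Sketch only: raise the per-task heap and keep the sort/merge
        // buffers modest so the final merge has room for very large keys.
        public static void tune(JobConf conf) {
            conf.set("mapred.child.java.opts", "-Xmx1024m"); // per-task JVM heap (1 GB)
            conf.setInt("io.sort.mb", 100);                  // in-memory buffer used while sorting map output
            conf.setInt("io.sort.factor", 10);               // number of spill segments merged at once
        }
    }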


RE: OutOfMemory Error

Posted by Leon Mergen <l....@solatis.com>.
Hello,

What version of Hadoop are you using?

Regards,

Leon Mergen

> -----Original Message-----
> From: Pallavi Palleti [mailto:pallavip.05@gmail.com]
> Sent: Wednesday, September 17, 2008 2:36 PM
> To: core-user@hadoop.apache.org
> Subject: OutOfMemory Error
>
>
> Hi all,
>
>    I am getting outofmemory error as shown below when I ran map-red on
> huge
> amount of data.:
> java.lang.OutOfMemoryError: Java heap space
>         at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:52)
>         at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:90)
>         at org.apache.hadoop.io.SequenceFile$Reader.nextRawKey(SequenceFile.java:1974)
>         at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(SequenceFile.java:3002)
>         at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2802)
>         at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2511)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1040)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:220)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
> The above error comes almost at the end of map job. I have set the heap
> size
> to 1GB. Still the problem is persisting.  Can someone please help me
> how to
> avoid this error?
> --
> View this message in context: http://www.nabble.com/OutOfMemory-Error-
> tp19531174p19531174.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.


Re: OutOfMemory Error

Posted by "Edward J. Yoon" <ed...@apache.org>.
If the vector size is too large, the current Hama will also run out of
memory. So, I would like to add a 2D layout version to the 0.1 release
plan for parallel matrix multiplication.

Therefore, I'll be renaming some classes:

MultiplicationMap.java -> Mult1DLayoutMap.java
MultiplicationReduce.java -> Mult1DLayoutReduce.java

/Edward

On Fri, Sep 19, 2008 at 5:41 PM, Edward J. Yoon <ed...@apache.org> wrote:
> Great experience!
>
> /Edward
>
> On Fri, Sep 19, 2008 at 2:50 PM, Palleti, Pallavi
> <pa...@corp.aol.com> wrote:
>> Yeah. That was the problem. And Hama can be surely useful for large scale matrix operations.
>>
>> But for this problem, I have modified the code to just pass the ID information and read the vector information only when it is needed. In this case, it was needed only in the reducer phase. This way, it avoided this problem of out of memory error and also faster now.
>>
>> Thanks
>> Pallavi
>> -----Original Message-----
>> From: edward@udanax.org [mailto:edward@udanax.org] On Behalf Of Edward J. Yoon
>> Sent: Friday, September 19, 2008 10:35 AM
>> To: core-user@hadoop.apache.org; mahout-dev@lucene.apache.org; hama-dev@incubator.apache.org
>> Subject: Re: OutOfMemory Error
>>
>>> The key is of the form "ID :DenseVector Representation in mahout with
>>
>> I guess vector size seems too large so it'll need a distributed vector
>> architecture (or 2d partitioning strategies) for large scale matrix
>> operations. The hama team investigate these problem areas. So, it will
>> be improved If hama can be used for mahout in the future.
>>
>> /Edward
>>
>> On Thu, Sep 18, 2008 at 12:28 PM, Pallavi Palleti <pa...@gmail.com> wrote:
>>>
>>> Hadoop Version - 17.1
>>> io.sort.factor =10
>>> The key is of the form "ID :DenseVector Representation in mahout with
>>> dimensionality size = 160k"
>>> For example: C1:[,0.00111111, 3.002, ...... 1.001,....]
>>> So, typical size of the key  of the mapper output can be 160K*6 (assuming
>>> double in string is represented in 5 bytes)+ 5 (bytes for C1:[])  + size
>>> required to store that the object is of type Text
>>>
>>> Thanks
>>> Pallavi
>>>
>>>
>>>
>>> Devaraj Das wrote:
>>>>
>>>>
>>>>
>>>>
>>>> On 9/17/08 6:06 PM, "Pallavi Palleti" <pa...@gmail.com> wrote:
>>>>
>>>>>
>>>>> Hi all,
>>>>>
>>>>>    I am getting outofmemory error as shown below when I ran map-red on
>>>>> huge
>>>>> amount of data.:
>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>> at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:52)
>>>>> at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:90)
>>>>> at org.apache.hadoop.io.SequenceFile$Reader.nextRawKey(SequenceFile.java:1974)
>>>>> at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(SequenceFile.java:3002)
>>>>> at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2802)
>>>>> at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2511)
>>>>> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1040)
>>>>> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
>>>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:220)
>>>>> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
>>>>> The above error comes almost at the end of map job. I have set the heap
>>>>> size
>>>>> to 1GB. Still the problem is persisting.  Can someone please help me how
>>>>> to
>>>>> avoid this error?
>>>> What is the typical size of your key? What is the value of io.sort.factor?
>>>> Hadoop version?
>>>>
>>>>
>>>>
>>>>
>>>
>>> --
>>> View this message in context: http://www.nabble.com/OutOfMemory-Error-tp19531174p19545298.html
>>> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>>
>> --
>> Best regards, Edward J. Yoon
>> edwardyoon@apache.org
>> http://blog.udanax.org
>>
>
>
>
> --
> Best regards, Edward J. Yoon
> edwardyoon@apache.org
> http://blog.udanax.org
>



-- 
Best regards, Edward J. Yoon
edwardyoon@apache.org
http://blog.udanax.org

Re: OutOfMemory Error

Posted by "Edward J. Yoon" <ed...@apache.org>.
Great experience!

/Edward

On Fri, Sep 19, 2008 at 2:50 PM, Palleti, Pallavi
<pa...@corp.aol.com> wrote:
> Yeah. That was the problem. And Hama can be surely useful for large scale matrix operations.
>
> But for this problem, I have modified the code to just pass the ID information and read the vector information only when it is needed. In this case, it was needed only in the reducer phase. This way, it avoided this problem of out of memory error and also faster now.
>
> Thanks
> Pallavi
> -----Original Message-----
> From: edward@udanax.org [mailto:edward@udanax.org] On Behalf Of Edward J. Yoon
> Sent: Friday, September 19, 2008 10:35 AM
> To: core-user@hadoop.apache.org; mahout-dev@lucene.apache.org; hama-dev@incubator.apache.org
> Subject: Re: OutOfMemory Error
>
>> The key is of the form "ID :DenseVector Representation in mahout with
>
> I guess vector size seems too large so it'll need a distributed vector
> architecture (or 2d partitioning strategies) for large scale matrix
> operations. The hama team investigate these problem areas. So, it will
> be improved If hama can be used for mahout in the future.
>
> /Edward
>
> On Thu, Sep 18, 2008 at 12:28 PM, Pallavi Palleti <pa...@gmail.com> wrote:
>>
>> Hadoop Version - 17.1
>> io.sort.factor =10
>> The key is of the form "ID :DenseVector Representation in mahout with
>> dimensionality size = 160k"
>> For example: C1:[,0.00111111, 3.002, ...... 1.001,....]
>> So, typical size of the key  of the mapper output can be 160K*6 (assuming
>> double in string is represented in 5 bytes)+ 5 (bytes for C1:[])  + size
>> required to store that the object is of type Text
>>
>> Thanks
>> Pallavi
>>
>>
>>
>> Devaraj Das wrote:
>>>
>>>
>>>
>>>
>>> On 9/17/08 6:06 PM, "Pallavi Palleti" <pa...@gmail.com> wrote:
>>>
>>>>
>>>> Hi all,
>>>>
>>>>    I am getting outofmemory error as shown below when I ran map-red on
>>>> huge
>>>> amount of data.:
>>>> java.lang.OutOfMemoryError: Java heap space
>>>> at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:52)
>>>> at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:90)
>>>> at org.apache.hadoop.io.SequenceFile$Reader.nextRawKey(SequenceFile.java:1974)
>>>> at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(SequenceFile.java:3002)
>>>> at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2802)
>>>> at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2511)
>>>> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1040)
>>>> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
>>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:220)
>>>> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
>>>> The above error comes almost at the end of map job. I have set the heap
>>>> size
>>>> to 1GB. Still the problem is persisting.  Can someone please help me how
>>>> to
>>>> avoid this error?
>>> What is the typical size of your key? What is the value of io.sort.factor?
>>> Hadoop version?
>>>
>>>
>>>
>>>
>>
>> --
>> View this message in context: http://www.nabble.com/OutOfMemory-Error-tp19531174p19545298.html
>> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>>
>>
>
>
>
> --
> Best regards, Edward J. Yoon
> edwardyoon@apache.org
> http://blog.udanax.org
>



-- 
Best regards, Edward J. Yoon
edwardyoon@apache.org
http://blog.udanax.org

RE: OutOfMemory Error

Posted by "Palleti, Pallavi" <pa...@corp.aol.com>.
Yes, that was the problem. And Hama can surely be useful for large-scale matrix operations.

For this problem, though, I modified the code to pass only the ID information and to read the vector information only when it is needed; in this case, that is only in the reducer phase. This avoids the out-of-memory error and is also faster now.

Thanks
Pallavi
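
A minimal sketch of the approach described above, written against the old mapred API: the map side emits only the ID, and the reducer looks the vector up on demand from a side MapFile keyed by ID. The MapFile, its path, and the class name are assumptions for illustration, not the poster's actual code:

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.io.MapFile;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class IdOnlyReducer extends MapReduceBase
            implements Reducer<Text, Text, Text, Text> {

        private MapFile.Reader vectors;  // ID -> serialized vector, built beforehand

        public void configure(JobConf job) {
            try {
                FileSystem fs = FileSystem.get(job);
                vectors = new MapFile.Reader(fs, "/data/vectors.map", job); // hypothetical path
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }

        public void reduce(Text id, Iterator<Text> values,
                           OutputCollector<Text, Text> out, Reporter reporter)
                throws IOException {
            Text vector = new Text();
            if (vectors.get(id, vector) != null) {  // fetch the vector only now, in the reduce phase
                out.collect(id, vector);            // real logic would combine it with the grouped values
            }
        }

        public void close() throws IOException {
            if (vectors != null) {
                vectors.close();
            }
        }
    }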
-----Original Message-----
From: edward@udanax.org [mailto:edward@udanax.org] On Behalf Of Edward J. Yoon
Sent: Friday, September 19, 2008 10:35 AM
To: core-user@hadoop.apache.org; mahout-dev@lucene.apache.org; hama-dev@incubator.apache.org
Subject: Re: OutOfMemory Error

> The key is of the form "ID :DenseVector Representation in mahout with

I guess vector size seems too large so it'll need a distributed vector
architecture (or 2d partitioning strategies) for large scale matrix
operations. The hama team investigate these problem areas. So, it will
be improved If hama can be used for mahout in the future.

/Edward

On Thu, Sep 18, 2008 at 12:28 PM, Pallavi Palleti <pa...@gmail.com> wrote:
>
> Hadoop Version - 17.1
> io.sort.factor =10
> The key is of the form "ID :DenseVector Representation in mahout with
> dimensionality size = 160k"
> For example: C1:[,0.00111111, 3.002, ...... 1.001,....]
> So, typical size of the key  of the mapper output can be 160K*6 (assuming
> double in string is represented in 5 bytes)+ 5 (bytes for C1:[])  + size
> required to store that the object is of type Text
>
> Thanks
> Pallavi
>
>
>
> Devaraj Das wrote:
>>
>>
>>
>>
>> On 9/17/08 6:06 PM, "Pallavi Palleti" <pa...@gmail.com> wrote:
>>
>>>
>>> Hi all,
>>>
>>>    I am getting outofmemory error as shown below when I ran map-red on
>>> huge
>>> amount of data.:
>>> java.lang.OutOfMemoryError: Java heap space
>>> at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:52)
>>> at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:90)
>>> at org.apache.hadoop.io.SequenceFile$Reader.nextRawKey(SequenceFile.java:1974)
>>> at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(SequenceFile.java:3002)
>>> at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2802)
>>> at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2511)
>>> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1040)
>>> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:220)
>>> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
>>> The above error comes almost at the end of map job. I have set the heap
>>> size
>>> to 1GB. Still the problem is persisting.  Can someone please help me how
>>> to
>>> avoid this error?
>> What is the typical size of your key? What is the value of io.sort.factor?
>> Hadoop version?
>>
>>
>>
>>
>
> --
> View this message in context: http://www.nabble.com/OutOfMemory-Error-tp19531174p19545298.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>



-- 
Best regards, Edward J. Yoon
edwardyoon@apache.org
http://blog.udanax.org

Re: OutOfMemory Error

Posted by "Edward J. Yoon" <ed...@apache.org>.
> The key is of the form "ID :DenseVector Representation in mahout with

I guess the vector size is too large, so it will need a distributed vector
architecture (or 2D partitioning strategies) for large-scale matrix
operations. The Hama team is investigating these problem areas, so this
should improve if Hama can be used for Mahout in the future.

/Edward
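
For context, a 2D (block) partitioning tiles the matrix into fixed-size square blocks so that no single task has to hold an entire 160k-wide row in memory. A toy sketch of the index arithmetic only; the block size and naming are illustrative and not Hama's actual layout:

    public class BlockLayout {
        // Sketch: map a matrix entry (row, col) to the 2D block that owns it.
        private final int blockSize;

        public BlockLayout(int blockSize) {
            this.blockSize = blockSize;
        }

        /** Key of the block holding entry (row, col), e.g. "160_0". */
        public String blockKey(long row, long col) {
            return (row / blockSize) + "_" + (col / blockSize);
        }

        public static void main(String[] args) {
            BlockLayout layout = new BlockLayout(1000);
            // Entry (160000, 42) falls in block "160_0"; each task then handles
            // at most blockSize x blockSize values instead of a full row or column.
            System.out.println(layout.blockKey(160000, 42));
        }
    }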

On Thu, Sep 18, 2008 at 12:28 PM, Pallavi Palleti <pa...@gmail.com> wrote:
>
> Hadoop Version - 17.1
> io.sort.factor =10
> The key is of the form "ID :DenseVector Representation in mahout with
> dimensionality size = 160k"
> For example: C1:[,0.00111111, 3.002, ...... 1.001,....]
> So, typical size of the key  of the mapper output can be 160K*6 (assuming
> double in string is represented in 5 bytes)+ 5 (bytes for C1:[])  + size
> required to store that the object is of type Text
>
> Thanks
> Pallavi
>
>
>
> Devaraj Das wrote:
>>
>>
>>
>>
>> On 9/17/08 6:06 PM, "Pallavi Palleti" <pa...@gmail.com> wrote:
>>
>>>
>>> Hi all,
>>>
>>>    I am getting outofmemory error as shown below when I ran map-red on
>>> huge
>>> amount of data.:
>>> java.lang.OutOfMemoryError: Java heap space
>>> at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:52)
>>> at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:90)
>>> at org.apache.hadoop.io.SequenceFile$Reader.nextRawKey(SequenceFile.java:1974)
>>> at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(SequenceFile.java:3002)
>>> at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2802)
>>> at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2511)
>>> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1040)
>>> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:220)
>>> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
>>> The above error comes almost at the end of map job. I have set the heap
>>> size
>>> to 1GB. Still the problem is persisting.  Can someone please help me how
>>> to
>>> avoid this error?
>> What is the typical size of your key? What is the value of io.sort.factor?
>> Hadoop version?
>>
>>
>>
>>
>
> --
> View this message in context: http://www.nabble.com/OutOfMemory-Error-tp19531174p19545298.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>



-- 
Best regards, Edward J. Yoon
edwardyoon@apache.org
http://blog.udanax.org

Re: OutOfMemory Error

Posted by Pallavi Palleti <pa...@gmail.com>.
Hadoop version: 0.17.1
io.sort.factor = 10
The key is of the form "ID:DenseVector representation in Mahout with dimensionality 160k".
For example: C1:[0.00111111, 3.002, ...... 1.001,....]
So the typical size of a mapper output key is about 160K * 6 bytes (assuming each
double in string form takes 5 characters plus a separator), plus 5 bytes for the
"C1:[]" framing, plus the overhead of the Text object itself.

Thanks
Pallavi
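
A quick back-of-the-envelope check of that estimate, using only the numbers given above (the per-value size is the poster's own assumption):

    public class KeySizeEstimate {
        public static void main(String[] args) {
            long dimensions = 160000;   // dimensionality of the DenseVector
            long bytesPerValue = 6;     // ~5 characters per double plus a separator
            long prefixBytes = 5;       // the "C1:[" style framing
            long bytesPerKey = dimensions * bytesPerValue + prefixBytes;
            // Prints 960005, i.e. close to 1 MB for a single map output key,
            // before counting the Text object's own overhead.
            System.out.println(bytesPerKey + " bytes per key");
        }
    }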



Devaraj Das wrote:
> 
> 
> 
> 
> On 9/17/08 6:06 PM, "Pallavi Palleti" <pa...@gmail.com> wrote:
> 
>> 
>> Hi all,
>> 
>>    I am getting outofmemory error as shown below when I ran map-red on
>> huge
>> amount of data.: 
>> java.lang.OutOfMemoryError: Java heap space
>> at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:52)
>> at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:90)
>> at org.apache.hadoop.io.SequenceFile$Reader.nextRawKey(SequenceFile.java:1974)
>> at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(SequenceFile.java:3002)
>> at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2802)
>> at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2511)
>> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1040)
>> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:220)
>> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
>> The above error comes almost at the end of map job. I have set the heap
>> size
>> to 1GB. Still the problem is persisting.  Can someone please help me how
>> to
>> avoid this error?
> What is the typical size of your key? What is the value of io.sort.factor?
> Hadoop version?
> 
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/OutOfMemory-Error-tp19531174p19545298.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.


Re: OutOfMemory Error

Posted by Devaraj Das <dd...@yahoo-inc.com>.


On 9/17/08 6:06 PM, "Pallavi Palleti" <pa...@gmail.com> wrote:

> 
> Hi all,
> 
>    I am getting outofmemory error as shown below when I ran map-red on huge
> amount of data.: 
> java.lang.OutOfMemoryError: Java heap space
> at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:52)
> at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:90)
> at org.apache.hadoop.io.SequenceFile$Reader.nextRawKey(SequenceFile.java:1974)
> at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(SequenceFile.java:3002)
> at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2802)
> at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2511)
> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1040)
> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:220)
> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
> The above error comes almost at the end of map job. I have set the heap size
> to 1GB. Still the problem is persisting.  Can someone please help me how to
> avoid this error?
What is the typical size of your key? What is the value of io.sort.factor?
Hadoop version?