Posted to mapreduce-user@hadoop.apache.org by Yuriy <yu...@gmail.com> on 2014/08/22 22:41:24 UTC

How to serialize very large object in Hadoop Writable?

The Hadoop Writable interface relies on the "public void write(DataOutput out)" method.
It looks like, behind the DataOutput interface, Hadoop uses a DataOutputStream
wrapped around a ByteArrayOutputStream, i.e. a single byte array under the covers.
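
For illustration only (the class name LargeBlobWritable and the length-prefixed layout are
hypothetical, not the actual code from this job), a Writable of the shape being described
would look roughly like this; its write() pushes one large byte array through the DataOutput:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Writable;

    public class LargeBlobWritable implements Writable {
        private byte[] blob = new byte[0];   // the combined object data

        public void set(byte[] data) { this.blob = data; }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(blob.length);   // length prefix
            out.write(blob);             // the whole blob ends up buffered in one byte[] downstream
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            int len = in.readInt();
            blob = new byte[len];        // again a single contiguous allocation
            in.readFully(blob);
        }
    }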

When I try to write a lot of data to the DataOutput in my reducer, I get:

Caused by: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    at java.util.Arrays.copyOf(Arrays.java:3230)
    at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
    at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
    at java.io.DataOutputStream.write(DataOutputStream.java:107)
    at java.io.FilterOutputStream.write(FilterOutputStream.java:97)

It looks like the system is unable to allocate a contiguous array of the
requested size. Apparently, increasing the heap size available to the
reducer does not help; it is already at 84 GB (-Xmx84G).

If I cannot reduce the size of the object that I need to serialize (as the
reducer constructs this object by combining the object data), what should I
try to work around this problem?

Thanks,

Yuriy

Re: How to serialize very large object in Hadoop Writable?

Posted by Alexander Pivovarov <ap...@gmail.com>.
Hadoop MapReduce usually deals with row-based data; see
ReduceContext<KEYIN,VALUEIN,KEYOUT,VALUEOUT>.

If you need to write a lot of data to an HDFS file, you can get an OutputStream to the
HDFS file and write the bytes to it directly.
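
A rough sketch of that approach, assuming the reducer can stream the object out in chunks
(the output path and the chunk source are made-up placeholders; FileSystem.create() returns
an FSDataOutputStream you can write to directly):

    import java.io.OutputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsStreamWriter {
        // Write the pieces of a large object straight to an HDFS file,
        // so no single byte[] ever has to hold the whole thing.
        public static void writeLargeObject(Configuration conf, Iterable<byte[]> chunks)
                throws Exception {
            FileSystem fs = FileSystem.get(conf);
            Path out = new Path("/user/yuriy/large-object.bin");   // hypothetical output path
            try (OutputStream os = fs.create(out, true)) {         // true = overwrite
                for (byte[] chunk : chunks) {
                    os.write(chunk);
                }
            }
        }
    }

In a reducer you would typically call something like this from reduce() or cleanup(),
passing context.getConfiguration().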


On Fri, Aug 22, 2014 at 3:30 PM, Yuriy <yu...@gmail.com> wrote:

> Thank you, Alexander. That, at least, explains the problem. And what
> should be the workaround if the combined set of data is larger than 2 GB?
>
>
> On Fri, Aug 22, 2014 at 1:50 PM, Alexander Pivovarov <apivovarov@gmail.com> wrote:
>
>> The maximum array size in Java is Integer.MAX_VALUE, so a byte array cannot be bigger than 2 GB.
>> On Aug 22, 2014 1:41 PM, "Yuriy" <yu...@gmail.com> wrote:
>>
>>> The Hadoop Writable interface relies on the "public void write(DataOutput out)" method.
>>> It looks like, behind the DataOutput interface, Hadoop uses a DataOutputStream
>>> wrapped around a ByteArrayOutputStream, i.e. a single byte array under the covers.
>>>
>>> When I try to write a lot of data to the DataOutput in my reducer, I get:
>>>
>>> Caused by: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>>>     at java.util.Arrays.copyOf(Arrays.java:3230)
>>>     at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
>>>     at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
>>>     at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
>>>     at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>     at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
>>>
>>> It looks like the system is unable to allocate a contiguous array of the
>>> requested size. Apparently, increasing the heap size available to the
>>> reducer does not help; it is already at 84 GB (-Xmx84G).
>>>
>>> If I cannot reduce the size of the object that I need to serialize (as
>>> the reducer constructs this object by combining the object data), what
>>> should I try to work around this problem?
>>>
>>> Thanks,
>>>
>>> Yuriy
>>>
>>
>

Re: How to serialize very large object in Hadoop Writable?

Posted by Yuriy <yu...@gmail.com>.
Thank you, Alexander. That, at least, explains the problem. And what should
be the workaround if the combined set of data is larger than 2 GB?


On Fri, Aug 22, 2014 at 1:50 PM, Alexander Pivovarov <ap...@gmail.com>
wrote:

> The maximum array size in Java is Integer.MAX_VALUE, so a byte array cannot be bigger than 2 GB.
> On Aug 22, 2014 1:41 PM, "Yuriy" <yu...@gmail.com> wrote:
>
>> The Hadoop Writable interface relies on the "public void write(DataOutput out)" method.
>> It looks like, behind the DataOutput interface, Hadoop uses a DataOutputStream
>> wrapped around a ByteArrayOutputStream, i.e. a single byte array under the covers.
>>
>> When I try to write a lot of data to the DataOutput in my reducer, I get:
>>
>> Caused by: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>>     at java.util.Arrays.copyOf(Arrays.java:3230)
>>     at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
>>     at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
>>     at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
>>     at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>     at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
>>
>> It looks like the system is unable to allocate a contiguous array of the
>> requested size. Apparently, increasing the heap size available to the
>> reducer does not help; it is already at 84 GB (-Xmx84G).
>>
>> If I cannot reduce the size of the object that I need to serialize (as
>> the reducer constructs this object by combining the object data), what
>> should I try to work around this problem?
>>
>> Thanks,
>>
>> Yuriy
>>
>

Re: How to serialize very large object in Hadoop Writable?

Posted by Alexander Pivovarov <ap...@gmail.com>.
The maximum array size in Java is Integer.MAX_VALUE, so a byte array cannot be bigger than 2 GB.
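
To illustrate the limit being referred to (sketched for typical HotSpot-style JVMs; the exact
maximum varies slightly by VM): a Java array is indexed by an int, so a byte[] tops out just
below Integer.MAX_VALUE bytes, about 2 GiB, no matter how large the heap is.

    public class ArrayLimitDemo {
        public static void main(String[] args) {
            // 2_147_483_647 bytes, i.e. roughly 2 GiB, is the theoretical ceiling.
            System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE + " bytes");
            // On typical JVMs the next line throws
            //   java.lang.OutOfMemoryError: Requested array size exceeds VM limit
            // even with a huge -Xmx, because the limit is the array length, not the heap.
            byte[] huge = new byte[Integer.MAX_VALUE];
            System.out.println("Allocated " + huge.length + " bytes");
        }
    }
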
On Aug 22, 2014 1:41 PM, "Yuriy" <yu...@gmail.com> wrote:

> The Hadoop Writable interface relies on the "public void write(DataOutput out)" method.
> It looks like, behind the DataOutput interface, Hadoop uses a DataOutputStream
> wrapped around a ByteArrayOutputStream, i.e. a single byte array under the covers.
>
> When I try to write a lot of data to the DataOutput in my reducer, I get:
>
> Caused by: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>     at java.util.Arrays.copyOf(Arrays.java:3230)
>     at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
>     at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
>     at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
>     at java.io.DataOutputStream.write(DataOutputStream.java:107)
>     at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
>
> It looks like the system is unable to allocate a contiguous array of the
> requested size. Apparently, increasing the heap size available to the
> reducer does not help; it is already at 84 GB (-Xmx84G).
>
> If I cannot reduce the size of the object that I need to serialize (as the
> reducer constructs this object by combining the object data), what should I
> try to work around this problem?
>
> Thanks,
>
> Yuriy
>
