Posted to user@hadoop.apache.org by Arni Sumarlidason <su...@gmail.com> on 2015/09/02 23:08:45 UTC

processing data evenly

I'm having problems getting my data reduced evenly across nodes.

-> map a single 200,000-line text file and output <0L, line>
-> custom partitioner returning a static member i++ % numPartitions in an
attempt to distribute the lines across as many reducers as possible
-> reduce; I end up with only 13 or 18 of my 100 nodes busy.

My hope is to have 300 containers on 100 nodes, each with ~666 lines.
How can I achieve this?
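
For concreteness, a reconstruction of the partitioner described above
might look like the following (a sketch; the class and field names are
guesses, not the poster's actual code):

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Round-robin via a static counter, incremented per record and
    // taken modulo the number of reduce tasks.
    public class RoundRobinPartitioner extends Partitioner<LongWritable, Text> {

        // "static" is only shared within one JVM; each map task runs
        // in its own JVM, so every task keeps an independent counter.
        private static int i = 0;

        @Override
        public int getPartition(LongWritable key, Text value, int numPartitions) {
            return (i++ & Integer.MAX_VALUE) % numPartitions;
        }
    }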

Re: processing data evenly

Posted by Chris Mawata <ch...@gmail.com>.
A static member only makes sense within the same JVM and classloader. In
a distributed setting it is not useful.
On Sep 2, 2015 5:08 PM, "Arni Sumarlidason" <su...@gmail.com> wrote:

> I'm having problems getting my data reduced evenly across nodes.
>
> -> map a single 200,000-line text file and output <0L, line>
> -> custom partitioner returning a static member i++ % numPartitions in an
> attempt to distribute the lines across as many reducers as possible
> -> reduce; I end up with only 13 or 18 of my 100 nodes busy.
>
> My hope is to have 300 containers on 100 nodes, each with ~666 lines.
> How can I achieve this?
>
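
A stateless way to get the even spread the original post is after,
following Chris's point, is to derive the partition from the record
itself rather than from JVM-local state. TextInputFormat already hands
each mapper the byte offset of the line as the input key, so emitting
that offset instead of a constant 0L gives every record a distinct key
that any task can partition on deterministically. A minimal, untested
sketch:

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Partition purely from the key (the line's byte offset), so all
    // map tasks compute the same spread with no shared mutable state.
    public class OffsetPartitioner extends Partitioner<LongWritable, Text> {
        @Override
        public int getPartition(LongWritable key, Text value, int numPartitions) {
            return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

Wired into the job, together with the 300 reducers the post asks for:

    job.setNumReduceTasks(300);
    job.setPartitionerClass(OffsetPartitioner.class);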

Re: processing data evenly

Posted by Arni Sumarlidason <su...@gmail.com>.
Evening all,

I ended up keying the map output on the hashCode of the host's IP
address, giving me reductions by machine. However, I am now experiencing
memory problems processing sequence files of large TwoDArrayWritables;
specifically, a task seems to process normally until it is about to
write the result, and then it crashes. Is there anything I can do when
processing large sequence files other than increasing the available
heap?

Arni

On Wed, Sep 2, 2015 at 5:08 PM, Arni Sumarlidason <su...@gmail.com>
wrote:

> I'm having problems getting my data reduced evenly across nodes.
>
> -> map a single 200,000-line text file and output <0L, line>
> -> custom partitioner returning a static member i++ % numPartitions in an
> attempt to distribute the lines across as many reducers as possible
> -> reduce; I end up with only 13 or 18 of my 100 nodes busy.
>
> My hope is to have 300 containers on 100 nodes, each with ~666 lines.
> How can I achieve this?
>
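
No answer to this follow-up appears in the thread. The standard memory
knobs are mapreduce.reduce.memory.mb (the YARN container size) and
mapreduce.reduce.java.opts (the JVM heap inside it), but since the
question asks for alternatives: if the job can be restructured so that
matrices move through the pipeline one row at a time, nothing ever has
to hold or serialize a whole TwoDArrayWritable at once. A sketch under
that assumption (the key scheme and element type are illustrative, and
the row index here is an enumeration, not the original row position):

    import java.io.IOException;
    import org.apache.hadoop.io.ArrayWritable;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // ArrayWritable needs a concrete subclass so the framework knows
    // the element type when deserializing values.
    class DoubleArrayWritable extends ArrayWritable {
        public DoubleArrayWritable() {
            super(DoubleWritable.class);
        }
    }

    // Write one matrix row per record instead of buffering the full
    // 2-D array and serializing it in a single call.
    public class RowWiseReducer
            extends Reducer<Text, DoubleArrayWritable, Text, DoubleArrayWritable> {

        @Override
        protected void reduce(Text matrixId, Iterable<DoubleArrayWritable> rows,
                              Context context)
                throws IOException, InterruptedException {
            long rowIndex = 0;
            for (DoubleArrayWritable row : rows) {
                // The key carries matrix id plus row index so the
                // matrix can be reassembled downstream; only one row
                // is on the write path at any time.
                context.write(new Text(matrixId + ":" + rowIndex++), row);
            }
        }
    }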
