Posted to user@spark.apache.org by Abi <an...@gmail.com> on 2016/05/10 00:24:06 UTC

Accumulator question

I am splitting an integer array into 2 partitions and using an accumulator to sum the array. The problem is:

1. I am not seeing the execution time drop to half that of a linear (sequential) sum.

2. The second node (judging by the timestamps) takes 3 times as long as the first node. This gives the impression that it is "waiting" for the first node to finish.

Hence, I get the impression that using accumulator.sum() in the kernel and rdd.foreach(kernel) is making things sequential.

Any API or setting suggestions for making things parallel?
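
A minimal sketch of the setup described above, for readers following the thread. It is not the poster's actual code: the object name AccumulatorSum, the local[2] master, the array size, and the use of the Spark 2.x longAccumulator API are all assumptions made for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical reconstruction of the pattern in the question:
// an integer array split into 2 partitions, summed via an accumulator.
object AccumulatorSum {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode with 2 worker threads stands in for 2 nodes.
    val conf = new SparkConf().setAppName("AccumulatorSum").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Assumption: the array size is arbitrary, chosen only for illustration.
      val data = (1 to 1000000).toArray
      val rdd = sc.parallelize(data, 2)

      // Each task adds into its own local copy of the accumulator;
      // the driver merges the per-task values as tasks complete.
      val acc = sc.longAccumulator("sum")
      rdd.foreach(x => acc.add(x))

      // The accumulated value is only meaningful on the driver.
      println(s"sum = ${acc.value}")
    } finally {
      sc.stop()
    }
  }
}
```

In this sketch the foreach still runs one task per partition, so timing differences between the two tasks may come from scheduling or startup overhead rather than from the accumulator itself.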


 

Re: Accumulator question

Posted by Rishi Mishra <rm...@snappydata.io>.
Your mail does not give much detail, but won't a simple reduce function help
you?
Something like the following:

val data = Seq(1,2,3,4,5,6,7)
val rdd = sc.parallelize(data, 2)
val sum = rdd.reduce((a,b) => a+b)
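
The snippet above assumes a live SparkContext named sc, as in spark-shell. A self-contained sketch of the same idea follows, assuming local mode with two threads; the object name ReduceSum and the local[2] master are illustrative choices, not something stated in the thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative standalone version of the reduce-based sum.
object ReduceSum {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode with 2 threads, one per partition.
    val conf = new SparkConf().setAppName("ReduceSum").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val data = Seq(1, 2, 3, 4, 5, 6, 7)
      val rdd = sc.parallelize(data, 2)

      // Confirm how many partitions the data was split into.
      println(s"partitions = ${rdd.partitions.length}")

      // reduce combines elements within each partition first,
      // then merges the per-partition results on the driver.
      val sum = rdd.reduce((a, b) => a + b) // 1 + 2 + ... + 7 = 28
      println(s"sum = $sum")
    } finally {
      sc.stop()
    }
  }
}
```

With an input this small, job scheduling overhead will likely dominate the run time, so a halving of execution time should not be expected from two partitions alone.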



Regards,
Rishitesh Mishra,
SnappyData . (http://www.snappydata.io/)

https://in.linkedin.com/in/rishiteshmishra


Re: Accumulator question

Posted by Abi <an...@gmail.com>.
I am splitting an integer array into 2 partitions and using an accumulator to
sum the array. The problem is:

1. I am not seeing the execution time drop to half that of a linear
(sequential) sum.

2. The second node (judging by the timestamps) takes 3 times as long as
the first node. This gives the impression that it is "waiting" for the first
node to finish.

Hence, I get the impression that using accumulator.sum() in the kernel and
rdd.foreach(kernel) is making things sequential.

Any API or setting suggestions for making things parallel?


Re: Accumulator question

Posted by Abi <an...@gmail.com>.
