You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Pulasthi Supun Wickramasinghe <pu...@gmail.com> on 2016/05/19 03:22:09 UTC

How to perform reduce operation in the same order as partition indexes

Hi Devs/All,

I am pretty new to Spark. I have a program which does some map reduce
operations with matrices. Here *shortrddFinal* is a of type "
*RDD[Array[Short]]"* and consists of several partitions

*var BC =
shortrddFinal.mapPartitionsWithIndex(calculateBCInternal).reduce(mergeBC)*

The map function produces a "Array[Array[Double]]" and at the reduce step i
need to merge all the 2 dimensional double arrays produced for each
partition into one big matrix. But i also need to keep the same order as
the partitions. that is the 2D double array produced for partition 0 should
be the first set of rows in the matrix and then the 2d double array
produced for partition 1 and so on. Is there a way to enforce the order in
the reduce step.

Thanks in advance

Best Regards,
Pulasthi
-- 
Pulasthi S. Wickramasinghe
Graduate Student  | Research Assistant
School of Informatics and Computing | Digital Science Center
Indiana University, Bloomington
cell: 224-386-9035

Re: How to perform reduce operation in the same order as partition indexes

Posted by Pulasthi Supun Wickramasinghe <pu...@gmail.com>.
Hi Ayan,

Thanks for the reply. Yes that is what i am currently doing. I thought
there may be a more efficient way provided by spark that i could use
directly.

Best Regards,
Pulasthi

On Thu, May 19, 2016 at 6:42 PM, ayan guha <gu...@gmail.com> wrote:

> You can add the index from mappartitionwithindex in the output and order
> based on that in merge step
> On 19 May 2016 13:22, "Pulasthi Supun Wickramasinghe" <
> pulasthi911@gmail.com> wrote:
>
>> Hi Devs/All,
>>
>> I am pretty new to Spark. I have a program which does some map reduce
>> operations with matrices. Here *shortrddFinal* is a of type "
>> *RDD[Array[Short]]"* and consists of several partitions
>>
>> *var BC =
>> shortrddFinal.mapPartitionsWithIndex(calculateBCInternal).reduce(mergeBC)*
>>
>> The map function produces a "Array[Array[Double]]" and at the reduce step
>> i need to merge all the 2 dimensional double arrays produced for each
>> partition into one big matrix. But i also need to keep the same order as
>> the partitions. that is the 2D double array produced for partition 0 should
>> be the first set of rows in the matrix and then the 2d double array
>> produced for partition 1 and so on. Is there a way to enforce the order in
>> the reduce step.
>>
>> Thanks in advance
>>
>> Best Regards,
>> Pulasthi
>> --
>> Pulasthi S. Wickramasinghe
>> Graduate Student  | Research Assistant
>> School of Informatics and Computing | Digital Science Center
>> Indiana University, Bloomington
>> cell: 224-386-9035
>>
>


-- 
Pulasthi S. Wickramasinghe
Graduate Student  | Research Assistant
School of Informatics and Computing | Digital Science Center
Indiana University, Bloomington
cell: 224-386-9035

Re: How to perform reduce operation in the same order as partition indexes

Posted by ayan guha <gu...@gmail.com>.
You can add the index from mappartitionwithindex in the output and order
based on that in merge step
On 19 May 2016 13:22, "Pulasthi Supun Wickramasinghe" <pu...@gmail.com>
wrote:

> Hi Devs/All,
>
> I am pretty new to Spark. I have a program which does some map reduce
> operations with matrices. Here *shortrddFinal* is a of type "
> *RDD[Array[Short]]"* and consists of several partitions
>
> *var BC =
> shortrddFinal.mapPartitionsWithIndex(calculateBCInternal).reduce(mergeBC)*
>
> The map function produces a "Array[Array[Double]]" and at the reduce step
> i need to merge all the 2 dimensional double arrays produced for each
> partition into one big matrix. But i also need to keep the same order as
> the partitions. that is the 2D double array produced for partition 0 should
> be the first set of rows in the matrix and then the 2d double array
> produced for partition 1 and so on. Is there a way to enforce the order in
> the reduce step.
>
> Thanks in advance
>
> Best Regards,
> Pulasthi
> --
> Pulasthi S. Wickramasinghe
> Graduate Student  | Research Assistant
> School of Informatics and Computing | Digital Science Center
> Indiana University, Bloomington
> cell: 224-386-9035
>