Posted to user@spark.apache.org by Aureliano Buendia <bu...@gmail.com> on 2014/01/03 00:23:57 UTC

How to deal with multidimensional keys?

Hi,

How is it possible to reduce by multidimensional keys?

For example, if every line is a tuple like:

(i, j, k, value)

or, alternatively:

((i, j, k), value)

how can Spark handle reducing over j, or k?

Re: How to deal with multidimensional keys?

Posted by "K. Shankari" <sh...@eecs.berkeley.edu>.
I have had to use this as well.

Sometimes I create a POJO to hold the multi-dimensional key to make things
easier, i.e.

case class MultiKey(i: Int, j: Int, k: Int)  // field types are assumed here

Then I can define a reduce function over the multikey, e.g. one that keeps
the entry with the largest i:

def reduceByI(mkv1: (MultiKey, Value), mkv2: (MultiKey, Value)) =
  if (mkv1._1.i > mkv2._1.i) mkv1 else mkv2

and then I can do

rdd.reduce(reduceByI)
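
For anyone who wants to try the idea without a cluster, here is a minimal sketch of it. The Int fields, the Double value type, and the names MultiKeyExample, Entry, maxByI and sample are all assumptions made for illustration, and a local Seq stands in for the RDD; Seq.reduce and RDD.reduce both take a function of this (T, T) => T shape.

```scala
// Hypothetical key class; the Int field types are an assumption.
final case class MultiKey(i: Int, j: Int, k: Int)

object MultiKeyExample {
  // The value type is assumed to be Double for this sketch.
  type Entry = (MultiKey, Double)

  // Keeps whichever entry has the larger i; on a tie the second one wins.
  def maxByI(a: Entry, b: Entry): Entry =
    if (a._1.i > b._1.i) a else b

  // Made-up sample data standing in for the contents of an RDD.
  val sample: Seq[Entry] = Seq(
    MultiKey(1, 0, 0) -> 2.0,
    MultiKey(3, 0, 1) -> 5.0,
    MultiKey(2, 1, 0) -> 7.0
  )

  // Local stand-in for rdd.reduce(maxByI).
  val largestI: Entry = sample.reduce(maxByI)
}
```

Note that a reduce like this collapses everything to a single entry. To get one result per value of j, the key would need to be narrowed to j first and the reduce done per key.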

Thanks,
Shankari


On Thu, Jan 2, 2014 at 3:28 PM, Andrew Ash <an...@andrewash.com> wrote:

> If you had an RDD[((i, j, k), value)], you could reduce by j by essentially
> mapping j into the key slot, doing the reduce, and then mapping it back:
>
> rdd.map { case ((i, j, k), v) => (j, (i, k, v)) }
>    .reduceByKey( ... )
>    .map { case (j, (i, k, v)) => ((i, j, k), v) }
>
> It's not pretty, but I've had to use this pattern before too.

Re: How to deal with multidimensional keys?

Posted by Andrew Ash <an...@andrewash.com>.
If you had an RDD[((i, j, k), value)], you could reduce by j by essentially
mapping j into the key slot, doing the reduce, and then mapping it back:

rdd.map { case ((i, j, k), v) => (j, (i, k, v)) }
   .reduceByKey( ... )
   .map { case (j, (i, k, v)) => ((i, j, k), v) }

It's not pretty, but I've had to use this pattern before too.
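
The three steps can be sketched as plain functions, shown here on a local Seq so the logic can be checked without Spark. Everything named below (ReduceOverJ, toJKey, keepLarger, fromJKey) is hypothetical, the Int/Double types are assumed, and "keep the record with the larger value" is just one possible combine rule picked for illustration. On a pair RDD the same functions should slot into rdd.map(toJKey).reduceByKey(keepLarger).map(fromJKey).

```scala
object ReduceOverJ {
  // Assumed shapes: an (i, j, k) key of Ints and a Double value.
  type Key = (Int, Int, Int)
  type Rest = (Int, Int, Double)

  // Step 1: move j into the key slot, carrying i, k and the value along.
  def toJKey(e: (Key, Double)): (Int, Rest) = e match {
    case ((i, j, k), v) => (j, (i, k, v))
  }

  // Step 2: combine two records that share the same j.
  // Illustrative rule only: keep the record with the larger value.
  def keepLarger(a: Rest, b: Rest): Rest =
    if (a._3 >= b._3) a else b

  // Step 3: map the result back to the original ((i, j, k), v) shape.
  def fromJKey(e: (Int, Rest)): (Key, Double) = e match {
    case (j, (i, k, v)) => ((i, j, k), v)
  }

  // Local stand-in for map + reduceByKey + map: group by j, reduce each group.
  def reduceOverJ(data: Seq[(Key, Double)]): Map[Key, Double] =
    data.map(toJKey)
      .groupBy(_._1)
      .map { case (j, group) => fromJKey((j, group.map(_._2).reduce(keepLarger))) }
}
```

Reducing over k instead would only need k in the key slot in steps 1 and 3. If only a per-j aggregate of the values is wanted (a sum, say), the i and k fields can be dropped in step 1 and the mapping back is unnecessary.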

