Posted to user@spark.apache.org by janardhan shetty <ja...@gmail.com> on 2016/07/24 05:22:55 UTC

Maintaining order of pair rdd

I have a key-value pair RDD where each value is an array of Ints. I need to
maintain the order of the values in order to execute downstream
modifications. How do we maintain the order of the values?
Ex:
rdd = ((id1,[5,2,3,15]),
(id2,[9,4,2,5]), ....)

Follow-up question: how do we compare one element in the RDD with all the
other elements?

Re: Maintaining order of pair rdd

Posted by Kuchekar <ku...@gmail.com>.
Hi Janardhan,

You could do something like this:

To maintain the insertion order per key, first partition by key (so
that each key is located in the same partition), and after that you can do
something like this:

RDD.mapValues(x => ArrayBuffer(x)).reduceByKey((x, y) => x ++ y)

The idea is to create an ArrayBuffer (this maintains insertion order).


A more elegant solution would be to use zipWithIndex on the pair RDD and then
sort by the index within each group after groupByKey.

RDD.zipWithIndex ==> This will give you something like this value:
((x,y),index)

Now map it like this: (x,(y,index))

Now reduce by key, and then sort the inner values by the index.
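
A minimal sketch of the index-then-sort idea described above, assuming an RDD[(String, Int)] built with sc.parallelize on a local master. The RDD method used is `zipWithIndex`; the object name `OrderedGroupSketch`, the helper `groupPreservingOrder` and the sample data are hypothetical and chosen only for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object OrderedGroupSketch {
  // Groups values by key while keeping each key's values in the order
  // in which they appeared in the input RDD (hypothetical helper).
  def groupPreservingOrder(pairs: RDD[(String, Int)]): RDD[(String, Array[Int])] =
    pairs
      .zipWithIndex()                                  // ((key, value), index)
      .map { case ((key, value), idx) => (key, (idx, value)) }
      .groupByKey()                                    // values may arrive in any order
      .mapValues(_.toArray.sortBy(_._1).map(_._2))     // restore order by index, drop index

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, just for trying the sketch out.
    val conf = new SparkConf().setAppName("ordered-group-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val input = sc.parallelize(Seq(
        ("ID2", 18159), ("ID1", 18159), ("ID3", 18159),
        ("ID2", 36318), ("ID1", 36318), ("ID3", 36318),
        ("ID2", 54477), ("ID1", 54477), ("ID3", 54477)), numSlices = 3)
      groupPreservingOrder(input).collect().foreach {
        case (key, values) => println(s"$key -> ${values.mkString(", ")}")
      }
    } finally sc.stop()
  }
}
```

The index is attached before the shuffle, so the order inside each group no longer depends on how groupByKey delivers the values. The order of the keys themselves in the collected result is still not controlled by this sketch.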


Thanks.
Kuchekar, Nilesh

On Tue, Jul 26, 2016 at 7:35 PM, janardhan shetty <ja...@gmail.com>
wrote:

> Let me provide step wise details:
>
> 1.
> I have an RDD  = {
> (ID2,18159) - *element 1  *
> (ID1,18159) - *element 2*
> (ID3,18159) - *element 3*
> (ID2,36318) - *element 4 *
> (ID1,36318) - *element 5*
> (ID3,36318)
> (ID2,54477)
> (ID1,54477)
> (ID3,54477)
> }
>
> 2. RDD.groupByKey().mapValues(v => v.toArray)
>
> Array(
> (ID1,Array(*18159*, 308703, 72636, 64544, 39244, 107937, *54477*, 145272,
> 100079, *36318*, 160992, 817, 89366, 150022, 19622, 44683, 58866, 162076,
> 45431, 100136)),
> (ID3,Array(100079, 19622, *18159*, 212064, 107937, 44683, 150022, 39244,
> 100136, 58866, 72636, 145272, 817, 89366, * 54477*, *36318*, 308703,
> 160992, 45431, 162076)),
> (ID2,Array(308703, * 54477*, 89366, 39244, 150022, 72636, 817, 58866,
> 44683, 19622, 160992, 107937, 100079, 100136, 145272, 64544, *18159*,
> 45431, *36318*, 162076))
> )
>
>
> whereas in Step 2 I need as below:
>
> Array(
> (ID1,Array(*18159*,*36318*, *54477,...*)),
> (ID3,Array(*18159*,*36318*, *54477, ...*)),
> (ID2,Array(*18159*,*36318*, *54477, ...*))
> )
>
> Does this help ?

Re: Maintaining order of pair rdd

Posted by janardhan shetty <ja...@gmail.com>.
Let me provide stepwise details:

1.
I have an RDD  = {
(ID2,18159) - *element 1  *
(ID1,18159) - *element 2*
(ID3,18159) - *element 3*
(ID2,36318) - *element 4 *
(ID1,36318) - *element 5*
(ID3,36318)
(ID2,54477)
(ID1,54477)
(ID3,54477)
}

2. RDD.groupByKey().mapValues(v => v.toArray)

Array(
(ID1,Array(*18159*, 308703, 72636, 64544, 39244, 107937, *54477*, 145272,
100079, *36318*, 160992, 817, 89366, 150022, 19622, 44683, 58866, 162076,
45431, 100136)),
(ID3,Array(100079, 19622, *18159*, 212064, 107937, 44683, 150022, 39244,
100136, 58866, 72636, 145272, 817, 89366, * 54477*, *36318*, 308703,
160992, 45431, 162076)),
(ID2,Array(308703, * 54477*, 89366, 39244, 150022, 72636, 817, 58866,
44683, 19622, 160992, 107937, 100079, 100136, 145272, 64544, *18159*,
45431, *36318*, 162076))
)


whereas in step 2 I need the output below:

Array(
(ID1,Array(*18159*,*36318*, *54477,...*)),
(ID3,Array(*18159*,*36318*, *54477, ...*)),
(ID2,Array(*18159*,*36318*, *54477, ...*))
)

Does this help?

On Tue, Jul 26, 2016 at 2:25 AM, Marco Mistroni <mm...@gmail.com> wrote:

> Apologies Janardhan, I always get confused on this.
> OK, so you have a (key, val) RDD (val is irrelevant here).
>
> then you can do this
> val reduced = myRDD.reduceByKey((first, second) => first  ++ second)
>
> val sorted = reduced.sortBy(tpl => tpl._1)
>
> hth

Re: Maintaining order of pair rdd

Posted by Marco Mistroni <mm...@gmail.com>.
Apologies Janardhan, I always get confused on this.
OK, so you have a (key, val) RDD (val is irrelevant here).

Then you can do this:
val reduced = myRDD.reduceByKey((first, second) => first  ++ second)

val sorted = reduced.sortBy(tpl => tpl._1)

hth
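
A sketch of the two steps above, assuming the values are plain Ints, in which case each value has to be wrapped in a collection first so that `++` is defined. The names `ReduceThenSortSketch` and `reduceThenSort` are hypothetical. Note that `sortBy` here orders the pairs by key; as far as I know, the order in which reduceByKey concatenates values coming from different partitions is not guaranteed, so this alone may not restore the original order of the values.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object ReduceThenSortSketch {
  // Wraps each value in a Vector so that ++ is defined, concatenates the
  // Vectors per key, then sorts the (key, values) pairs by key.
  def reduceThenSort(pairs: RDD[(String, Int)]): RDD[(String, Vector[Int])] =
    pairs
      .mapValues(v => Vector(v))
      .reduceByKey((first, second) => first ++ second)
      .sortBy(tpl => tpl._1)

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, just for trying the sketch out.
    val conf = new SparkConf().setAppName("reduce-then-sort-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val myRDD = sc.parallelize(Seq(
        ("ID2", 18159), ("ID1", 18159), ("ID3", 18159),
        ("ID2", 36318), ("ID1", 36318), ("ID3", 36318)))
      reduceThenSort(myRDD).collect().foreach(println)
    } finally sc.stop()
  }
}
```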



On Tue, Jul 26, 2016 at 3:31 AM, janardhan shetty <ja...@gmail.com>
wrote:

> groupBy is a shuffle operation, and if I am not wrong the index is already
> lost in this process; I also don't see a *sortWith* operation on RDD.
>
> Any suggestions or help?

Re: Maintaining order of pair rdd

Posted by janardhan shetty <ja...@gmail.com>.
groupBy is a shuffle operation, and if I am not wrong the index is already
lost in this process; I also don't see a *sortWith* operation on RDD.

Any suggestions or help?

On Mon, Jul 25, 2016 at 12:58 AM, Marco Mistroni <mm...@gmail.com>
wrote:

> Hi,
> after you do a groupBy you should use a sortWith.
> Basically, a groupBy reduces your structure to (anyone correct me if I'm
> wrong) an RDD[(key, val)], which you can see as a tuple... so you could use
> sortWith (or sortBy, I cannot remember which one) with (tpl => tpl._1).
> hth

Re: Maintaining order of pair rdd

Posted by Marco Mistroni <mm...@gmail.com>.
Hi,
after you do a groupBy you should use a sortWith.
Basically, a groupBy reduces your structure to (anyone correct me if I'm
wrong) an RDD[(key, val)], which you can see as a tuple... so you could use
sortWith (or sortBy, I cannot remember which one) with (tpl => tpl._1).
hth
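
For reference, `sortBy` is the method defined on RDDs, while `sortWith` is a Scala collections method that can be applied to each group's values inside `mapValues`. A small sketch under those assumptions follows; the names `SortAfterGroupSketch` and `sortKeysAndValues` are hypothetical. It sorts in ascending order, which may not be the "indexed order" asked about in this thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SortAfterGroupSketch {
  // Groups by key, sorts each group's values ascending, then sorts by key.
  def sortKeysAndValues(pairs: RDD[(String, Int)]): RDD[(String, List[Int])] =
    pairs
      .groupByKey()                           // RDD[(String, Iterable[Int])]
      .mapValues(_.toList.sortWith(_ < _))    // sortWith: Scala collections method
      .sortBy(tpl => tpl._1)                  // sortBy: RDD method, orders by key

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, just for trying the sketch out.
    val conf = new SparkConf().setAppName("sort-after-group-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(Seq(
        ("ID2", 54477), ("ID1", 36318), ("ID2", 18159), ("ID1", 18159)))
      sortKeysAndValues(rdd).collect().foreach(println)
    } finally sc.stop()
  }
}
```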

On Mon, Jul 25, 2016 at 1:21 AM, janardhan shetty <ja...@gmail.com>
wrote:

> Thanks Marco. This solved the order problem. I had another question which
> comes before this one.
>
> As you can see below, ID2, ID1 and ID3 are in order and I need to maintain
> this index order as well. But when we do the groupByKey operation
> (*rdd.distinct.groupByKey().mapValues(v => v.toArray)*)
> everything is *jumbled*.
> Is there any way we can maintain this order as well?
>
> scala> RDD.foreach(println)
> (ID2,18159)
> (ID1,18159)
> (ID3,18159)
>
> (ID2,18159)
> (ID1,18159)
> (ID3,18159)
>
> (ID2,36318)
> (ID1,36318)
> (ID3,36318)
>
> (ID2,54477)
> (ID1,54477)
> (ID3,54477)
>
> *Jumbled version : *
> Array(
> (ID1,Array(*18159*, 308703, 72636, 64544, 39244, 107937, *54477*, 145272,
> 100079, *36318*, 160992, 817, 89366, 150022, 19622, 44683, 58866, 162076,
> 45431, 100136)),
> (ID3,Array(100079, 19622, *18159*, 212064, 107937, 44683, 150022, 39244,
> 100136, 58866, 72636, 145272, 817, 89366, * 54477*, *36318*, 308703,
> 160992, 45431, 162076)),
> (ID2,Array(308703, * 54477*, 89366, 39244, 150022, 72636, 817, 58866,
> 44683, 19622, 160992, 107937, 100079, 100136, 145272, 64544, *18159*,
> 45431, *36318*, 162076))
> )
>
> *Expected output:*
> Array(
> (ID1,Array(*18159*,*36318*, *54477,...*)),
> (ID3,Array(*18159*,*36318*, *54477, ...*)),
> (ID2,Array(*18159*,*36318*, *54477, ...*))
> )
>
> As you can see, after the *groupByKey* operation is complete, item 18159 is
> at index 0 for ID1, index 2 for ID3 and index 16 for ID2, whereas the
> expected position is index 0.

Re: Maintaining order of pair rdd

Posted by janardhan shetty <ja...@gmail.com>.
Thanks Marco. This solved the order problem. I had another question which
comes before this one.

As you can see below, ID2, ID1 and ID3 are in order and I need to maintain
this index order as well. But when we do the groupByKey operation
(*rdd.distinct.groupByKey().mapValues(v => v.toArray)*)
everything is *jumbled*.
Is there any way we can maintain this order as well?

scala> RDD.foreach(println)
(ID2,18159)
(ID1,18159)
(ID3,18159)

(ID2,18159)
(ID1,18159)
(ID3,18159)

(ID2,36318)
(ID1,36318)
(ID3,36318)

(ID2,54477)
(ID1,54477)
(ID3,54477)

*Jumbled version : *
Array(
(ID1,Array(*18159*, 308703, 72636, 64544, 39244, 107937, *54477*, 145272,
100079, *36318*, 160992, 817, 89366, 150022, 19622, 44683, 58866, 162076,
45431, 100136)),
(ID3,Array(100079, 19622, *18159*, 212064, 107937, 44683, 150022, 39244,
100136, 58866, 72636, 145272, 817, 89366, * 54477*, *36318*, 308703,
160992, 45431, 162076)),
(ID2,Array(308703, * 54477*, 89366, 39244, 150022, 72636, 817, 58866,
44683, 19622, 160992, 107937, 100079, 100136, 145272, 64544, *18159*,
45431, *36318*, 162076))
)

*Expected output:*
Array(
(ID1,Array(*18159*,*36318*, *54477,...*)),
(ID3,Array(*18159*,*36318*, *54477, ...*)),
(ID2,Array(*18159*,*36318*, *54477, ...*))
)

As you can see, after the *groupByKey* operation is complete, item 18159 is at
index 0 for ID1, index 2 for ID3 and index 16 for ID2, whereas the expected
position is index 0 for all three.
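
One way to think about the requirement above is to carry an explicit position index with every record and sort by it after grouping. The sketch below uses plain Scala collections only, so it runs without a cluster; the `indexed` input stands in for what `RDD.zipWithIndex` would give in Spark, and the names `OrderedGroups` and `groupInOrder` are hypothetical. It is an illustration of the idea, not a tested Spark job.

```scala
object OrderedGroups {
  // Group ((key, value), index) records by key. Inside each group the values
  // are sorted by their original index, and the groups themselves are ordered
  // by the first index at which each key appeared.
  def groupInOrder(indexed: Seq[((String, Int), Long)]): Seq[(String, Vector[Int])] =
    indexed
      .groupBy { case ((key, _), _) => key }
      .toSeq
      .map { case (key, rows) =>
        val sorted = rows.sortBy { case (_, idx) => idx }
        (key, sorted.head._2, sorted.map { case ((_, value), _) => value }.toVector)
      }
      .sortBy { case (_, firstIdx, _) => firstIdx }
      .map { case (key, _, values) => (key, values) }

  def main(args: Array[String]): Unit = {
    val records = Seq(
      ("ID2", 18159), ("ID1", 18159), ("ID3", 18159),
      ("ID2", 36318), ("ID1", 36318), ("ID3", 36318),
      ("ID2", 54477), ("ID1", 54477), ("ID3", 54477)
    )
    // Attach the position index, then reverse the records to mimic a shuffle
    // that loses the arrival order.
    val shuffled = records.zipWithIndex.map { case (kv, i) => (kv, i.toLong) }.reverse
    groupInOrder(shuffled).foreach { case (k, vs) => println(s"$k -> ${vs.mkString(", ")}") }
  }
}
```

In Spark the same shape would be: zipWithIndex before any shuffle, group by key, then sort each group's values by the index. If distinct is needed, it would have to run before the index is attached, since identical values with different indices are no longer equal.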


On Sun, Jul 24, 2016 at 12:43 PM, Marco Mistroni <mm...@gmail.com>
wrote:

> Hello
>  Uhm, you have an array containing 3 tuples?
> If all the arrays have the same length, you can just zip all of them,
> creating a list of tuples,
> then you can scan the list 5 by 5...?
>
> so something like
>
> (arr(0)._2, arr(1)._2, arr(2)._2).zipped.toList   // arr = the collected array
>
> this will give you a list of 3-element tuples holding the items at each
> position from ID1, ID3 and ID2 (the order of the array) ... sample below
> res: List((18159,100079,308703), (308703, 19622, 54477), (72636,18159,
> 89366)..........)
>
> then you can use a recursive function to compare each element such as
>
> def iterate(lst: List[(Int, Int, Int)]): Unit = {
>     if (lst.isEmpty) () // nothing left: return your comparison result here
>     else {
>          val (block, rest) = lst.splitAt(5)
>          // do something with block (the next 5 tuples)
>          iterate(rest)
>     }
> }
>
> will this help? or am i still missing something?
>
> kr
>
> On 24 Jul 2016 5:52 pm, "janardhan shetty" <ja...@gmail.com> wrote:
>
>> Array(
>> (ID1,Array(18159, 308703, 72636, 64544, 39244, 107937, 54477, 145272,
>> 100079, 36318, 160992, 817, 89366, 150022, 19622, 44683, 58866, 162076,
>> 45431, 100136)),
>> (ID3,Array(100079, 19622, 18159, 212064, 107937, 44683, 150022, 39244,
>> 100136, 58866, 72636, 145272, 817, 89366, 54477, 36318, 308703, 160992,
>> 45431, 162076)),
>> (ID2,Array(308703, 54477, 89366, 39244, 150022, 72636, 817, 58866, 44683,
>> 19622, 160992, 107937, 100079, 100136, 145272, 64544, 18159, 45431, 36318,
>> 162076))
>> )
>>
>> I need to compare the first 5 elements of ID1 with the first 5 elements of
>> ID3, then the first 5 elements of ID1 with those of ID2. Then the same for
>> the next 5 elements, in that order, until the end of the arrays.
>> Let me know if this helps
>>
>>
>> On Sun, Jul 24, 2016 at 7:45 AM, Marco Mistroni <mm...@gmail.com>
>> wrote:
>>
>>> Apologies I misinterpreted.... could you post two use cases?
>>> Kr
>>>
>>> On 24 Jul 2016 3:41 pm, "janardhan shetty" <ja...@gmail.com>
>>> wrote:
>>>
>>>> Marco,
>>>>
>>>> Thanks for the response. It is indexed order and not ascending or
>>>> descending order.
>>>> On Jul 24, 2016 7:37 AM, "Marco Mistroni" <mm...@gmail.com> wrote:
>>>>
>>>>> Use map values to transform to an rdd where values are sorted?
>>>>> Hth
>>>>>
>>>>> On 24 Jul 2016 6:23 am, "janardhan shetty" <ja...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I have a key,value pair rdd where value is an array of Ints. I need
>>>>>> to maintain the order of the value in order to execute downstream
>>>>>> modifications. How do we maintain the order of values?
>>>>>> Ex:
>>>>>> rdd = (id1,[5,2,3,15],
>>>>>> Id2,[9,4,2,5]....)
>>>>>>
>>>>>> Followup question how do we compare between one element in rdd with
>>>>>> all other elements ?
>>>>>>
>>>>>
>>

Re: Maintaining order of pair rdd

Posted by Marco Mistroni <mm...@gmail.com>.
Hello
 Uhm, you have an array containing 3 tuples?
If all the arrays have the same length, you can just zip all of them, creating
a list of tuples,
then you can scan the list 5 by 5...?

so something like

(arr(0)._2, arr(1)._2, arr(2)._2).zipped.toList   // arr = the collected array

this will give you a list of 3-element tuples holding the items at each
position from ID1, ID3 and ID2 (the order of the array) ... sample below
res: List((18159,100079,308703), (308703, 19622, 54477), (72636,18159,
89366)..........)

then you can use a recursive function to compare each element such as

def iterate(lst: List[(Int, Int, Int)]): Unit = {
    if (lst.isEmpty) () // nothing left: return your comparison result here
    else {
         val (block, rest) = lst.splitAt(5)
         // do something with block (the next 5 tuples)
         iterate(rest)
    }
}
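
The zip-and-scan idea above can also be sketched without recursion. This version assumes three equal-length arrays already collected on the driver; `ChunkCompare`, `scanInBlocks` and `compare` are hypothetical names, and `compare` is only a placeholder that counts positions where all three values agree, since the thread does not say what the real comparison is.

```scala
object ChunkCompare {
  // Placeholder comparison: how many rows in this block have three equal values.
  // Swap in whatever comparison the job actually needs.
  def compare(block: List[(Int, Int, Int)]): Int =
    block.count { case (a, b, c) => a == b && b == c }

  // Zip the three arrays position by position, then walk the rows in blocks
  // of `size`, producing one comparison result per block.
  def scanInBlocks(a: Array[Int], b: Array[Int], c: Array[Int], size: Int = 5): List[Int] = {
    require(a.length == b.length && b.length == c.length, "arrays must have the same length")
    val rows = a.indices.map(i => (a(i), b(i), c(i))).toList
    rows.grouped(size).map(compare).toList
  }
}
```

Indexing instead of `.zipped` keeps the sketch independent of the Scala version, and `grouped(size)` handles a final block shorter than 5 without extra code.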

will this help? or am i still missing something?

kr

On 24 Jul 2016 5:52 pm, "janardhan shetty" <ja...@gmail.com> wrote:

> Array(
> (ID1,Array(18159, 308703, 72636, 64544, 39244, 107937, 54477, 145272,
> 100079, 36318, 160992, 817, 89366, 150022, 19622, 44683, 58866, 162076,
> 45431, 100136)),
> (ID3,Array(100079, 19622, 18159, 212064, 107937, 44683, 150022, 39244,
> 100136, 58866, 72636, 145272, 817, 89366, 54477, 36318, 308703, 160992,
> 45431, 162076)),
> (ID2,Array(308703, 54477, 89366, 39244, 150022, 72636, 817, 58866, 44683,
> 19622, 160992, 107937, 100079, 100136, 145272, 64544, 18159, 45431, 36318,
> 162076))
> )
>
> I need to compare the first 5 elements of ID1 with the first 5 elements of
> ID3, then the first 5 elements of ID1 with those of ID2. Then the same for
> the next 5 elements, in that order, until the end of the arrays.
> Let me know if this helps
>
>
> On Sun, Jul 24, 2016 at 7:45 AM, Marco Mistroni <mm...@gmail.com>
> wrote:
>
>> Apologies I misinterpreted.... could you post two use cases?
>> Kr
>>
>> On 24 Jul 2016 3:41 pm, "janardhan shetty" <ja...@gmail.com>
>> wrote:
>>
>>> Marco,
>>>
>>> Thanks for the response. It is indexed order and not ascending or
>>> descending order.
>>> On Jul 24, 2016 7:37 AM, "Marco Mistroni" <mm...@gmail.com> wrote:
>>>
>>>> Use map values to transform to an rdd where values are sorted?
>>>> Hth
>>>>
>>>> On 24 Jul 2016 6:23 am, "janardhan shetty" <ja...@gmail.com>
>>>> wrote:
>>>>
>>>>> I have a key,value pair rdd where value is an array of Ints. I need to
>>>>> maintain the order of the value in order to execute downstream
>>>>> modifications. How do we maintain the order of values?
>>>>> Ex:
>>>>> rdd = (id1,[5,2,3,15],
>>>>> Id2,[9,4,2,5]....)
>>>>>
>>>>> Followup question how do we compare between one element in rdd with
>>>>> all other elements ?
>>>>>
>>>>
>

Re: Maintaining order of pair rdd

Posted by janardhan shetty <ja...@gmail.com>.
Array(
(ID1,Array(18159, 308703, 72636, 64544, 39244, 107937, 54477, 145272,
100079, 36318, 160992, 817, 89366, 150022, 19622, 44683, 58866, 162076,
45431, 100136)),
(ID3,Array(100079, 19622, 18159, 212064, 107937, 44683, 150022, 39244,
100136, 58866, 72636, 145272, 817, 89366, 54477, 36318, 308703, 160992,
45431, 162076)),
(ID2,Array(308703, 54477, 89366, 39244, 150022, 72636, 817, 58866, 44683,
19622, 160992, 107937, 100079, 100136, 145272, 64544, 18159, 45431, 36318,
162076))
)

I need to compare the first 5 elements of ID1 with the first 5 elements of
ID3, then the first 5 elements of ID1 with those of ID2. Then the same for
the next 5 elements, in that order, until the end of the arrays.
Let me know if this helps


On Sun, Jul 24, 2016 at 7:45 AM, Marco Mistroni <mm...@gmail.com> wrote:

> Apologies I misinterpreted.... could you post two use cases?
> Kr
>
> On 24 Jul 2016 3:41 pm, "janardhan shetty" <ja...@gmail.com> wrote:
>
>> Marco,
>>
>> Thanks for the response. It is indexed order and not ascending or
>> descending order.
>> On Jul 24, 2016 7:37 AM, "Marco Mistroni" <mm...@gmail.com> wrote:
>>
>>> Use map values to transform to an rdd where values are sorted?
>>> Hth
>>>
>>> On 24 Jul 2016 6:23 am, "janardhan shetty" <ja...@gmail.com>
>>> wrote:
>>>
>>>> I have a key,value pair rdd where value is an array of Ints. I need to
>>>> maintain the order of the value in order to execute downstream
>>>> modifications. How do we maintain the order of values?
>>>> Ex:
>>>> rdd = (id1,[5,2,3,15],
>>>> Id2,[9,4,2,5]....)
>>>>
>>>> Followup question how do we compare between one element in rdd with all
>>>> other elements ?
>>>>
>>>

Re: Maintaining order of pair rdd

Posted by Marco Mistroni <mm...@gmail.com>.
Apologies I misinterpreted.... could you post two use cases?
Kr

On 24 Jul 2016 3:41 pm, "janardhan shetty" <ja...@gmail.com> wrote:

> Marco,
>
> Thanks for the response. It is indexed order and not ascending or
> descending order.
> On Jul 24, 2016 7:37 AM, "Marco Mistroni" <mm...@gmail.com> wrote:
>
>> Use map values to transform to an rdd where values are sorted?
>> Hth
>>
>> On 24 Jul 2016 6:23 am, "janardhan shetty" <ja...@gmail.com>
>> wrote:
>>
>>> I have a key,value pair rdd where value is an array of Ints. I need to
>>> maintain the order of the value in order to execute downstream
>>> modifications. How do we maintain the order of values?
>>> Ex:
>>> rdd = (id1,[5,2,3,15],
>>> Id2,[9,4,2,5]....)
>>>
>>> Followup question how do we compare between one element in rdd with all
>>> other elements ?
>>>
>>

Re: Maintaining order of pair rdd

Posted by janardhan shetty <ja...@gmail.com>.
Marco,

Thanks for the response. It is indexed order and not ascending or
descending order.
On Jul 24, 2016 7:37 AM, "Marco Mistroni" <mm...@gmail.com> wrote:

> Use map values to transform to an rdd where values are sorted?
> Hth
>
> On 24 Jul 2016 6:23 am, "janardhan shetty" <ja...@gmail.com> wrote:
>
>> I have a key,value pair rdd where value is an array of Ints. I need to
>> maintain the order of the value in order to execute downstream
>> modifications. How do we maintain the order of values?
>> Ex:
>> rdd = (id1,[5,2,3,15],
>> Id2,[9,4,2,5]....)
>>
>> Followup question how do we compare between one element in rdd with all
>> other elements ?
>>
>