Posted to user@spark.apache.org by jluan <ja...@gmail.com> on 2016/02/23 02:15:53 UTC

Force Partitioner to use entire entry of PairRDD as key

I was wondering, is there a way to force something like the HashPartitioner
to hash the entire entry of a PairRDD rather than just the key?

For example, if we have a PairRDD with entries [(1, 4), (1, 3), (2, 3),
(2, 5), (2, 10)], rather than hashing only the keys 1 and 2, can we force
the partitioner to hash each whole tuple, such as (1, 4)?



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Force-Partitioner-to-use-entire-entry-of-PairRDD-as-key-tp26299.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Force Partitioner to use entire entry of PairRDD as key

Posted by Jay Luan <ja...@gmail.com>.
Thank you, that helps a lot.


Re: Force Partitioner to use entire entry of PairRDD as key

Posted by Takeshi Yamamuro <li...@gmail.com>.
You're correct, reduceByKey is just an example.



-- 
---
Takeshi Yamamuro

Re: Force Partitioner to use entire entry of PairRDD as key

Posted by Jay Luan <ja...@gmail.com>.
Could you elaborate on how this would work?

From what I can tell, this maps each entry to a new pair whose key is the
original tuple and whose value is always 0. The hash then varies much more,
because we now hash something like ((1,4), 0) and ((1,3), 0), so this
mapping would create more even partitions. But why reduce by key afterward?
Is that just an example of an operation that can be done, or does it add
some real value to the operation?
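
The hashing intuition above can be checked without Spark. Spark's HashPartitioner assigns a key to partition `nonNegativeMod(key.hashCode, numPartitions)`; here is a minimal plain-Scala sketch of that behavior (the helper name `partitionFor` is mine, not Spark's):

```scala
// Mimics Spark's HashPartitioner: a non-negative modulo of hashCode.
def partitionFor(key: Any, numPartitions: Int): Int = {
  val raw = key.hashCode % numPartitions
  if (raw < 0) raw + numPartitions else raw
}

val entries = Seq((1, 4), (1, 3), (2, 3), (2, 5), (2, 10))

// Keyed by the original key: at most as many target partitions
// as there are distinct keys (here, just 1 and 2).
val byKey = entries.map { case (k, _) => partitionFor(k, 4) }

// Keyed by the whole tuple plus a dummy 0: each entry hashes
// differently, so entries sharing an original key can land on
// different partitions.
val byTuple = entries.map(d => partitionFor((d, 0), 4))

println(byKey.distinct.size)   // at most 2, since there are only two keys
println(byTuple.distinct.size) // generally more spread out
```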




Re: Force Partitioner to use entire entry of PairRDD as key

Posted by Takeshi Yamamuro <li...@gmail.com>.
Hi,

How about adding dummy values?
values.map(d => (d, 0)).reduceByKey(_ + _)
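
To make the trick concrete, here is a hedged sketch of the full flow (it assumes a SparkContext named `sc`, e.g. from spark-shell; the RDD name `values` and the partition count 4 are illustrative). It swaps `partitionBy` in for the `reduceByKey` above, since `reduceByKey` was only one example of a shuffle that uses the new key:

```scala
import org.apache.spark.HashPartitioner

// Assumes an existing SparkContext `sc`.
val values = sc.parallelize(Seq((1, 4), (1, 3), (2, 3), (2, 5), (2, 10)))

// Make each whole (key, value) pair the new key, with a dummy value 0.
// HashPartitioner now hashes the full tuple, not just 1 or 2.
val byWholeEntry = values.map(d => (d, 0)).partitionBy(new HashPartitioner(4))

// Drop the dummy value to recover the original pairs, now distributed
// across partitions by the hash of the full tuple.
val repartitioned = byWholeEntry.map(_._1)
```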



-- 
---
Takeshi Yamamuro