Posted to dev@spark.apache.org by Reynold Xin <rx...@databricks.com> on 2015/06/01 06:57:01 UTC

Re: Why is RDD to PairRDDFunctions only via implicits?

Dropping user list, adding dev.


Thanks, Justin, for the POC. This is a good idea to explore, especially for
Spark 2.0.


On Fri, May 22, 2015 at 12:08 PM, Justin Pihony <ju...@gmail.com>
wrote:

> The (crude) proof of concept seems to work:
>
> class RDD[V](value: List[V]) {
>   def doStuff = println("I'm doing stuff")
> }
>
> object RDD {
>   implicit def toPair[V](x: RDD[V]) = new PairRDD(List((1, 2)))
> }
>
> class PairRDD[K, V](value: List[(K, V)]) extends RDD(value) {
>   def doPairs = println("I'm using pairs")
> }
>
> class Context {
>   def parallelize[K, V](x: List[(K, V)]) = new PairRDD(x)
>   def parallelize[V](x: List[V]) = new RDD(x)
> }
>
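A note on the POC above: the two `parallelize` overloads erase to the same JVM signature (`List` in, `RDD` out), so they cannot coexist. A version that compiles keeps a single `parallelize` and scopes the implicit to pair element types, which also mirrors how Spark's `rddToPairRDDFunctions` conversion is restricted. All names here are illustrative stand-ins, not Spark classes:

```scala
import scala.language.implicitConversions

// Illustrative stand-ins for RDD/PairRDD; not Spark code.
class RDD[V](val value: List[V]) {
  def doStuff(): String = "I'm doing stuff"
}

object RDD {
  // Applies only when the element type is a pair, mirroring how Spark
  // scopes rddToPairRDDFunctions on the RDD companion object.
  implicit def toPair[K, W](x: RDD[(K, W)]): PairRDD[K, W] =
    new PairRDD(x.value)
}

class PairRDD[K, W](pairs: List[(K, W)]) extends RDD[(K, W)](pairs) {
  def doPairs(): String = "I'm using pairs"
}

class Context {
  // A single overload: List[(K, W)] and List[V] erase to the same
  // signature, so the POC's second overload would not compile.
  def parallelize[V](x: List[V]): RDD[V] = new RDD(x)
}
```

With this, `new Context().parallelize(List((1, "a"))).doPairs()` resolves through `RDD.toPair`, while calling `doPairs` on a non-pair RDD is a compile error.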
> On Fri, May 22, 2015 at 2:44 PM, Reynold Xin <rx...@databricks.com> wrote:
>
>> I'm not sure it is possible to overload the map function twice: once
>> for just KV pairs, and again for K and V separately.
>>
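On the overload question: the two `map` signatures do erase to the same JVM method, so a plain overload is rejected, but a `DummyImplicit` parameter can keep them distinct after erasure, and Scala's overload resolution then prefers the more specific pair-returning alternative. A sketch on toy classes (not Spark's `RDD`; whether the extra complexity is worthwhile is a separate question):

```scala
// Toy stand-ins for RDD/PairRDD to illustrate the overload question.
class MiniRDD[T](val data: List[T]) {
  // General map.
  def map[U](f: T => U): MiniRDD[U] = new MiniRDD(data.map(f))

  // A pair-returning overload erases to the same JVM signature as the
  // one above; the extra DummyImplicit parameter keeps the two methods
  // distinct after erasure, and overload resolution picks this one for
  // pair-returning functions because it is more specific.
  def map[K, W](f: T => (K, W))(implicit d: DummyImplicit): MiniPairRDD[K, W] =
    new MiniPairRDD(data.map(f))
}

class MiniPairRDD[K, W](pairs: List[(K, W)]) extends MiniRDD[(K, W)](pairs) {
  def keys: List[K] = pairs.map(_._1)
}
```

With this, `new MiniRDD(List(1, 2)).map(x => (x, x))` is statically a `MiniPairRDD`, so the pair methods show up in API docs and tab completion, while `map(_ + 1)` still returns a plain `MiniRDD`.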
>>
>> On Fri, May 22, 2015 at 10:26 AM, Justin Pihony <ju...@gmail.com>
>> wrote:
>>
>>> This ticket <https://issues.apache.org/jira/browse/SPARK-4397> improved
>>> the RDD API, but it could be even more discoverable if made available via
>>> the API directly. I assume this was originally an omission that now needs
>>> to be kept for backwards compatibility, but would any of the repo owners be
>>> open to making this more discoverable to the point of API docs and tab
>>> completion (while keeping both binary and source compatibility)?
>>>
>>>
>>>     class PairRDD[K, V] extends RDD[(K, V)] {
>>>       // ...pair methods
>>>     }
>>>
>>>     class RDD[T] {
>>>       def map[K: ClassTag, V: ClassTag](f: T => (K, V)): PairRDD[K, V]
>>>     }
>>>
>>> As long as the implicits remain, compatibility remains, but now the docs
>>> state explicitly how to get a PairRDD, and it shows up in tab completion.
>>>
>>> Thoughts?
>>>
>>> Justin Pihony
>>>
>>
>>
>