Posted to dev@spark.apache.org by Shixiong Zhu <zs...@gmail.com> on 2014/11/06 12:12:37 UTC

About implicit rddToPairRDDFunctions

I have seen many people ask how to convert an RDD to a PairRDDFunctions. I would
like to ask a question about it. Why not put the following implicit into
"package object rdd" or "object RDD"?

  implicit def rddToPairRDDFunctions[K, V](rdd: RDD[(K, V)])
      (implicit kt: ClassTag[K], vt: ClassTag[V], ord: Ordering[K] = null) = {
    new PairRDDFunctions(rdd)
  }

If so, the conversion will happen automatically and there will be no need to
import org.apache.spark.SparkContext._
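
For illustration, here is a minimal, self-contained sketch of the mechanism using
made-up stand-in types (MyRDD, MyPairFunctions, Demo), not Spark's real classes: a
conversion declared in the companion object of the source type is part of its
implicit scope, so the call site compiles without any import:

  import scala.language.implicitConversions
  import scala.reflect.ClassTag

  // Simplified stand-ins for RDD and PairRDDFunctions, only to show the lookup.
  class MyRDD[T](val data: Seq[T])

  object MyRDD {
    // Declared in MyRDD's companion object, so it is in the implicit scope of
    // MyRDD[(K, V)] and is found without any import at the call site.
    implicit def toPairFunctions[K, V](rdd: MyRDD[(K, V)])
        (implicit kt: ClassTag[K], vt: ClassTag[V]): MyPairFunctions[K, V] =
      new MyPairFunctions(rdd)
  }

  class MyPairFunctions[K, V](rdd: MyRDD[(K, V)]) {
    def groupByKey(): Map[K, Seq[V]] =
      rdd.data.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2)) }
  }

  object Demo extends App {
    val pairs = new MyRDD(Seq(("a", 1), ("a", 2), ("b", 3)))
    println(pairs.groupByKey())  // no import of the conversion is needed here
  }

Spark's real rddToPairRDDFunctions also takes an Ordering[K] = null parameter,
which the sketch omits for brevity.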

I searched for an earlier discussion of this but found nothing.

Best Regards,
Shixiong Zhu

Re: About implicit rddToPairRDDFunctions

Posted by Shixiong Zhu <zs...@gmail.com>.
OK. I'll take it.

Best Regards,
Shixiong Zhu

Re: About implicit rddToPairRDDFunctions

Posted by Reynold Xin <rx...@databricks.com>.
That seems like a great idea. Can you submit a pull request?


Re: About implicit rddToPairRDDFunctions

Posted by Shixiong Zhu <zs...@gmail.com>.
If we put the `implicit` into "package object rdd" or "object RDD", then when we
write `rdd.groupByKey()`, because `rdd` is an instance of `RDD`, the Scala compiler
will search `object RDD` (the companion object) and `package object rdd` (the
package object) by default, so we don't need to import the conversion explicitly.
Here is a post about the implicit search rules:
http://eed3si9n.com/revisiting-implicits-without-import-tax

To maintain compatibility, we can keep `rddToPairRDDFunctions` in SparkContext
but remove the `implicit` modifier. The disadvantage is that there would be two
copies of the same code.
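
To make the trade-off concrete, here is a rough, hypothetical sketch of that
approach (again with made-up stand-in types MyRDD, MyPairFunctions, MyContext,
not an actual patch), where the copy kept for compatibility is no longer implicit
and simply forwards to the new definition, so only a thin stub is duplicated
rather than the logic:

  import scala.language.implicitConversions
  import scala.reflect.ClassTag

  // Stand-in types again; not Spark's real classes.
  class MyRDD[T](val data: Seq[T])
  class MyPairFunctions[K, V](val rdd: MyRDD[(K, V)])

  object MyRDD {
    // The conversion's new home: found through the implicit scope, no import needed.
    implicit def toPairFunctions[K, V](rdd: MyRDD[(K, V)])
        (implicit kt: ClassTag[K], vt: ClassTag[V]): MyPairFunctions[K, V] =
      new MyPairFunctions(rdd)
  }

  object MyContext {
    // Kept so that code which called or imported the old conversion still
    // compiles, but it is no longer implicit and only forwards.
    def toPairFunctions[K, V](rdd: MyRDD[(K, V)])
        (implicit kt: ClassTag[K], vt: ClassTag[V]): MyPairFunctions[K, V] =
      MyRDD.toPairFunctions(rdd)
  }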




Best Regards,
Shixiong Zhu

Re: About implicit rddToPairRDDFunctions

Posted by Reynold Xin <rx...@databricks.com>.
Do people usually import o.a.spark.rdd._?

Also, in order to maintain source and binary compatibility, we would need to
keep both, right?

