Posted to user@spark.apache.org by Philip Ogren <ph...@oracle.com> on 2014/07/21 19:01:58 UTC

relationship of RDD[Array[String]] to Array[Array[String]]

It is really nice that Spark RDDs provide functions that are often 
equivalent to functions found in Scala collections.  For example, I can 
call:

myArray.map(myFx)

and equivalently

myRdd.map(myFx)

Awesome!

My question is this: is it possible to write code that works on either 
an RDD or a local collection without having to maintain parallel 
implementations?  I can't tell from the respective scaladocs that RDD 
and Array share any supertypes or traits.  Perhaps implicit conversions 
could be used here.  What I would like to do is have a single function 
whose body is like this:

myData.map(myFx)

where myData could be an RDD[Array[String]] (for example) or an 
Array[Array[String]].
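
Concretely, here is the kind of duplication I am hoping to avoid 
(countTokens is just a made-up example):

import org.apache.spark.rdd.RDD

// Today I would write two near-identical overloads:
def countTokens(data: Array[Array[String]]): Array[Int] =
  data.map(_.length)

def countTokens(data: RDD[Array[String]]): RDD[Int] =
  data.map(_.length)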

Has anyone had success doing this?

Thanks,
Philip



Re: relationship of RDD[Array[String]] to Array[Array[String]]

Posted by Philip Ogren <ph...@oracle.com>.
Thanks Michael,

That is one solution I had thought of.  It seems like a bit of 
overkill for the few methods I want to do this for, but I will think 
about it.  I guess I was hoping I was missing something more 
obvious/easier.

Philip

On 07/21/2014 11:20 AM, Michael Malak wrote:
> It's really more of a Scala question than a Spark question, but the 
> standard OO (not Scala-specific) way is to create your own custom 
> supertype (e.g. MyCollectionTrait), inherited/implemented by two 
> concrete classes (e.g. MyRDD and MyArray), each of which manually 
> forwards method calls to the corresponding pre-existing library 
> implementations.  Writing all those forwarding methods is tedious, 
> but Scala provides at least one bit of syntactic sugar that spares 
> you from typing each method's parameter list twice:
> http://stackoverflow.com/questions/8230831/is-method-parameter-forwarding-possible-in-scala
>
> I'm not seeing a way to use implicit conversions in this case.  Since 
> Scala is statically typed (albeit with inference), I don't see a way 
> around having a common supertype.


Re: relationship of RDD[Array[String]] to Array[Array[String]]

Posted by Michael Malak <mi...@yahoo.com>.
It's really more of a Scala question than a Spark question, but the 
standard OO (not Scala-specific) way is to create your own custom 
supertype (e.g. MyCollectionTrait), inherited/implemented by two 
concrete classes (e.g. MyRDD and MyArray), each of which manually 
forwards method calls to the corresponding pre-existing library 
implementations.  Writing all those forwarding methods is tedious, but 
Scala provides at least one bit of syntactic sugar that spares you from 
typing each method's parameter list twice:
http://stackoverflow.com/questions/8230831/is-method-parameter-forwarding-possible-in-scala

I'm not seeing a way to use implicit conversions in this case.  Since 
Scala is statically typed (albeit with inference), I don't see a way 
around having a common supertype.
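
A rough sketch of what I mean (MyCollectionTrait, MyRDD, and MyArray 
are names I just made up, not an existing API):

import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Common supertype exposing only the operations both sides share.
trait MyCollectionTrait[A] {
  def map[B: ClassTag](f: A => B): MyCollectionTrait[B]
}

// Each concrete class forwards to the pre-existing implementation.
class MyRDD[A](rdd: RDD[A]) extends MyCollectionTrait[A] {
  def map[B: ClassTag](f: A => B): MyCollectionTrait[B] =
    new MyRDD(rdd.map(f))
}

class MyArray[A](array: Array[A]) extends MyCollectionTrait[A] {
  def map[B: ClassTag](f: A => B): MyCollectionTrait[B] =
    new MyArray(array.map(f))
}

// A single function can then accept either backing collection:
def tokenCounts(data: MyCollectionTrait[Array[String]]): MyCollectionTrait[Int] =
  data.map(_.length)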




Re: relationship of RDD[Array[String]] to Array[Array[String]]

Posted by andy petrella <an...@gmail.com>.
heya,

Without a bit of gymnastics at the type level, nope.  Actually RDD 
doesn't share any functions with the Scala collections library (the 
simplest reason I can see is that Spark's implementations are lazy, 
while the default implementations in Scala aren't).

However, it'd be possible by implementing an implicit converter from a 
SeqLike (for instance) to an RDD.  Nonetheless it'd be cumbersome, 
because the overlap between the two worlds isn't complete (for 
instance, flatMap doesn't have the same semantics, drop is hard, etc.).
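
The easy half could look something like this (an untested sketch; it 
assumes a SparkContext in implicit scope and skips the hard part of 
matching the full collections API):

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

object SeqToRdd {
  // Implicitly lift a local Seq into an RDD.  Reconciling the
  // diverging semantics (flatMap, drop, ...) is not attempted here.
  implicit def seqToRdd[A: ClassTag](seq: Seq[A])
                                    (implicit sc: SparkContext): RDD[A] =
    sc.parallelize(seq)
}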

Also, it'd scare me a bit to have this kind of bazooka waiting for me 
around the next corner, letting me think that an iterative-style 
process can be run in a distributed world :-).

OTOH, the inverse is quite easy: an implicit conversion from RDD to an 
Array is simply a call to collect (take care that RDD is not covariant 
-- I think that's related to the fact that the ClassTag is needed!?).
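
For example (again just a sketch -- collect pulls the entire RDD back 
to the driver, so this is only sane for small data):

import org.apache.spark.rdd.RDD

object RddToLocal {
  // Implicitly materialize an RDD as a local Array via collect.
  // Careful: this forces full evaluation and transfer to the driver.
  implicit def rddToArray[A](rdd: RDD[A]): Array[A] =
    rdd.collect()
}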

just my 2 ¢




 aℕdy ℙetrella
about.me/noootsab

