Posted to user@spark.apache.org by Jared Rodriguez <jr...@kitedesk.com> on 2014/04/23 16:48:26 UTC
Comparing RDD Items
Hi there,
I am new to Spark and new to Scala, although I have lots of experience on the
Java side. I am experimenting with Spark for a new project where it seems
like it could be a good fit. As I go through the examples, there is one
scenario that I am trying to figure out: comparing the contents of an
RDD to itself to produce a new RDD.
In an overly simple example, I have:
JavaSparkContext sc = new JavaSparkContext ...
JavaRDD<String> data = sc.parallelize(buildData());
I then want to compare each entry in data to other entries and end up with:
JavaPairRDD<String, List<String>> mapped = data.???
Is this something easily handled by Spark? My apologies if this is a
stupid question; I have spent less than 10 hours tinkering with Spark and
am trying to come up to speed.
--
Jared Rodriguez
Re: Comparing RDD Items
Posted by Daniel Darabos <da...@lynxanalytics.com>.
Hi! There is RDD.cartesian(), which creates the Cartesian product of two
RDDs. You could do data.cartesian(data) to get an RDD of all pairs of
lines. It will be of length data.count * data.count, of course.
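To get from there to the JavaPairRDD<String, List<String>> shape in the question, one would typically filter out the self-pairs and then group by the first element. Here is a minimal plain-Java sketch of what that cartesian-then-group chain produces (no Spark required; the helper name pairWithOthers and the choice to drop self-pairs are my own assumptions for illustration, not part of the original question):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CartesianPairs {
    // Emulates, in plain Java, the effect of a chain like:
    //   data.cartesian(data)                 -> all ordered pairs
    //       .filter(p -> !first.equals(second)) -> drop self-pairs
    //       .groupByKey()                    -> key -> list of partners
    static Map<String, List<String>> pairWithOthers(List<String> data) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String left : data) {
            List<String> others = new ArrayList<>();
            for (String right : data) {
                if (!left.equals(right)) {
                    others.add(right); // keep every entry that is not the key itself
                }
            }
            grouped.put(left, others);
        }
        return grouped;
    }

    public static void main(String[] args) {
        Map<String, List<String>> grouped =
                pairWithOthers(Arrays.asList("a", "b", "c"));
        System.out.println(grouped); // {a=[b, c], b=[a, c], c=[a, b]}
    }
}
```

In Spark itself the equivalent would be roughly data.cartesian(data) followed by a filter and groupByKey, keeping in mind the data.count * data.count size of the intermediate RDD noted above.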
On Wed, Apr 23, 2014 at 4:48 PM, Jared Rodriguez <jr...@kitedesk.com> wrote:
> Hi there,
>
> I am new to Spark and new to Scala, although I have lots of experience on
> the Java side. I am experimenting with Spark for a new project where it
> seems like it could be a good fit. As I go through the examples, there is
> one scenario that I am trying to figure out: comparing the contents of
> an RDD to itself to produce a new RDD.
>
> In an overly simple example, I have:
>
> JavaSparkContext sc = new JavaSparkContext ...
> JavaRDD<String> data = sc.parallelize(buildData());
>
> I then want to compare each entry in data to other entries and end up with:
>
> JavaPairRDD<String, List<String>> mapped = data.???
>
> Is this something easily handled by Spark? My apologies if this is a
> stupid question; I have spent less than 10 hours tinkering with Spark and
> am trying to come up to speed.
>
>
> --
> Jared Rodriguez
>
>