Posted to user@spark.apache.org by Deep Pradhan <pr...@gmail.com> on 2014/12/03 07:01:20 UTC

Filter using the Vertex Ids

Hi,
I have a graph which returns the following on doing graph.vertices
(1, 1.0)
(2, 1.0)
(3, 2.0)
(4, 2.0)
(5, 0.0)

Here 5 is the root node of the graph, and this is a VertexRDD. I want to
group all the vertices with the same attribute together, for example into
one RDD per attribute value. How should I go about doing it?
I tried creating a separate RDD of vertices for each attribute value, but I
realized I would end up creating too many RDDs if the graph is huge and the
distances are varied.
Can anyone suggest how I should approach this?
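(For illustration, the per-attribute approach I tried looks roughly like
this sketch; the variable names are made up, one filtered RDD per distinct
attribute value:)

    // Roughly the approach described above: one hand-written filter per
    // distinct attribute value, which cannot scale to many values.
    val dist0 = graph.vertices.filter { case (_, attr) => attr == 0.0 }
    val dist1 = graph.vertices.filter { case (_, attr) => attr == 1.0 }
    val dist2 = graph.vertices.filter { case (_, attr) => attr == 2.0 }
    // ...and so on for every distinct distance in the graph.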

Thank You

Re: Filter using the Vertex Ids

Posted by Ankur Dave <an...@gmail.com>.
To get that function in scope you have to import
org.apache.spark.SparkContext._
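For example, in a standalone Spark 1.x program the compile error goes away
once the implicit conversion is in scope (a sketch; the method name here is
just illustrative):

    import org.apache.spark.SparkContext._ // implicit RDD -> PairRDDFunctions
    import org.apache.spark.graphx.VertexId
    import org.apache.spark.rdd.RDD

    // Without the SparkContext._ import this does not compile, because
    // groupByKey is defined on PairRDDFunctions, not on RDD itself.
    def groupByDistance(
        flipped: RDD[(Double, VertexId)]): RDD[(Double, Iterable[VertexId])] =
      flipped.groupByKey()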

Ankur

On Wednesday, December 3, 2014, Deep Pradhan <pr...@gmail.com>
wrote:

> But groupByKey() gives me an error saying that it is not a member of
> org.apache.spark.rdd.RDD[(Double, org.apache.spark.graphx.VertexId)]
>


-- 
Ankur <http://www.ankurdave.com/>

Re: Filter using the Vertex Ids

Posted by Deep Pradhan <pr...@gmail.com>.
But groupByKey() gives me an error saying that it is not a member of
org.apache.spark.rdd.RDD[(Double, org.apache.spark.graphx.VertexId)] when I
compile in the graphx directory of spark-1.0.0. The error does not occur
when I use the same code in the interactive shell.


On Wed, Dec 3, 2014 at 3:49 PM, Ankur Dave <an...@gmail.com> wrote:

> At 2014-12-03 02:13:49 -0800, Deep Pradhan <pr...@gmail.com>
> wrote:
> > We cannot do sc.parallelize(List(VertexRDD)), can we?
>
> There's no need to do this, because every VertexRDD is also a pair RDD:
>
>     class VertexRDD[VD] extends RDD[(VertexId, VD)]
>
> You can simply use graph.vertices in place of `a` in my example.
>
> Ankur
>

Re: Filter using the Vertex Ids

Posted by Ankur Dave <an...@gmail.com>.
At 2014-12-03 02:13:49 -0800, Deep Pradhan <pr...@gmail.com> wrote:
> We cannot do sc.parallelize(List(VertexRDD)), can we?

There's no need to do this, because every VertexRDD is also a pair RDD:

    class VertexRDD[VD] extends RDD[(VertexId, VD)]

You can simply use graph.vertices in place of `a` in my example.
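Concretely, the whole pipeline on graph.vertices would look something like
this sketch (assuming the example graph above, and the SparkContext._
import mentioned elsewhere in this thread):

    import org.apache.spark.SparkContext._
    import org.apache.spark.graphx.VertexId
    import org.apache.spark.rdd.RDD

    // graph.vertices is already an RDD[(VertexId, Double)], so flip it and
    // group by attribute without building any List by hand:
    val grouped: RDD[(Double, Iterable[VertexId])] =
      graph.vertices
        .map { case (id, attr) => (attr, id) } // attribute becomes the key
        .groupByKey()

    grouped.collect.foreach(println)
    // (0.0,CompactBuffer(5))
    // (1.0,CompactBuffer(1, 2))
    // (2.0,CompactBuffer(3, 4))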

Ankur



Re: Filter using the Vertex Ids

Posted by Deep Pradhan <pr...@gmail.com>.
And one more thing: the given tuples
(1, 1.0)
(2, 1.0)
(3, 2.0)
(4, 2.0)
(5, 0.0)

are part of an RDD; they are not just standalone tuples.
graph.vertices returns the above tuples as part of a VertexRDD.


On Wed, Dec 3, 2014 at 3:43 PM, Deep Pradhan <pr...@gmail.com>
wrote:

> This is just an example but if my graph is big, there will be so many
> tuples to handle. I cannot manually do
> val a: RDD[(Int, Double)] = sc.parallelize(List(
>       (1, 1.0),
>       (2, 1.0),
>       (3, 2.0),
>       (4, 2.0),
>       (5, 0.0)))
> for all the vertices in the graph.
> What should I do in that case?
> We cannot do sc.parallelize(List(VertexRDD)), can we?
>
> On Wed, Dec 3, 2014 at 3:32 PM, Ankur Dave <an...@gmail.com> wrote:
>
>> At 2014-12-02 22:01:20 -0800, Deep Pradhan <pr...@gmail.com>
>> wrote:
>> > I have a graph which returns the following on doing graph.vertices
>> > (1, 1.0)
>> > (2, 1.0)
>> > (3, 2.0)
>> > (4, 2.0)
>> > (5, 0.0)
>> >
>> > I want to group all the vertices with the same attribute together, for
>> > example into one RDD per attribute value.
>>
>> You can do this by flipping the tuples so the values become the keys,
>> then using one of the by-key functions in PairRDDFunctions:
>>
>>     val a: RDD[(Int, Double)] = sc.parallelize(List(
>>       (1, 1.0),
>>       (2, 1.0),
>>       (3, 2.0),
>>       (4, 2.0),
>>       (5, 0.0)))
>>
>>     val b: RDD[(Double, Int)] = a.map(kv => (kv._2, kv._1))
>>
>>     val c: RDD[(Double, Iterable[Int])] = b.groupByKey(numPartitions = 5)
>>
>>     c.collect.foreach(println)
>>     // (0.0,CompactBuffer(5))
>>     // (1.0,CompactBuffer(1, 2))
>>     // (2.0,CompactBuffer(3, 4))
>>
>> Ankur
>>
>
>

Re: Filter using the Vertex Ids

Posted by Deep Pradhan <pr...@gmail.com>.
This is just an example but if my graph is big, there will be so many
tuples to handle. I cannot manually do
val a: RDD[(Int, Double)] = sc.parallelize(List(
      (1, 1.0),
      (2, 1.0),
      (3, 2.0),
      (4, 2.0),
      (5, 0.0)))
for all the vertices in the graph.
What should I do in that case?
We cannot do sc.parallelize(List(VertexRDD)), can we?

On Wed, Dec 3, 2014 at 3:32 PM, Ankur Dave <an...@gmail.com> wrote:

> At 2014-12-02 22:01:20 -0800, Deep Pradhan <pr...@gmail.com>
> wrote:
> > I have a graph which returns the following on doing graph.vertices
> > (1, 1.0)
> > (2, 1.0)
> > (3, 2.0)
> > (4, 2.0)
> > (5, 0.0)
> >
> > I want to group all the vertices with the same attribute together, for
> > example into one RDD per attribute value.
>
> You can do this by flipping the tuples so the values become the keys, then
> using one of the by-key functions in PairRDDFunctions:
>
>     val a: RDD[(Int, Double)] = sc.parallelize(List(
>       (1, 1.0),
>       (2, 1.0),
>       (3, 2.0),
>       (4, 2.0),
>       (5, 0.0)))
>
>     val b: RDD[(Double, Int)] = a.map(kv => (kv._2, kv._1))
>
>     val c: RDD[(Double, Iterable[Int])] = b.groupByKey(numPartitions = 5)
>
>     c.collect.foreach(println)
>     // (0.0,CompactBuffer(5))
>     // (1.0,CompactBuffer(1, 2))
>     // (2.0,CompactBuffer(3, 4))
>
> Ankur
>

Re: Filter using the Vertex Ids

Posted by Ankur Dave <an...@gmail.com>.
At 2014-12-02 22:01:20 -0800, Deep Pradhan <pr...@gmail.com> wrote:
> I have a graph which returns the following on doing graph.vertices
> (1, 1.0)
> (2, 1.0)
> (3, 2.0)
> (4, 2.0)
> (5, 0.0)
>
> I want to group all the vertices with the same attribute together, for
> example into one RDD per attribute value.

You can do this by flipping the tuples so the values become the keys, then using one of the by-key functions in PairRDDFunctions:

    val a: RDD[(Int, Double)] = sc.parallelize(List(
      (1, 1.0),
      (2, 1.0),
      (3, 2.0),
      (4, 2.0),
      (5, 0.0)))
    
    val b: RDD[(Double, Int)] = a.map(kv => (kv._2, kv._1))
    
    val c: RDD[(Double, Iterable[Int])] = b.groupByKey(numPartitions = 5)
    
    c.collect.foreach(println)
    // (0.0,CompactBuffer(5))
    // (1.0,CompactBuffer(1, 2))
    // (2.0,CompactBuffer(3, 4))
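(Note: outside the interactive shell, the groupByKey call above needs
import org.apache.spark.SparkContext._ to bring PairRDDFunctions into
scope, as discussed elsewhere in this thread.)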

Ankur
