Posted to user@spark.apache.org by Bin <wu...@126.com> on 2014/07/29 11:40:09 UTC

[GraphX] How to access a vertex via vertexId?

Hi All,


How can I access a vertex via its vertexId? I need to get the vertex's attributes after running a graph algorithm.


Thanks very much!


Best,
Bin

Re: [GraphX] How to access a vertex via vertexId?

Posted by andy petrella <an...@gmail.com>.
Maybe 'lookup' on a pair RDD?
In reply to "Yifan LI" <ia...@gmail.com>, 29 Jul 2014 12:04.

Re: [GraphX] How to access a vertex via vertexId?

Posted by andy petrella <an...@gmail.com>.
👍 Thanks!
In reply to "Ankur Dave" <an...@gmail.com>, 29 Jul 2014 22:09.

Re: [GraphX] How to access a vertex via vertexId?

Posted by Ankur Dave <an...@gmail.com>.
andy petrella <an...@gmail.com> writes:
> Oh, I was almost sure that lookup was optimized using the partition info.

It does use the partitioner to run only one task, but within that task it has to scan the entire partition:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala#L710
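To make that cost concrete, here is a toy, single-machine Scala model (plain collections, no Spark) of a partitioner-aware lookup: a hash partitioner picks the one partition that can hold the key, and that partition is then scanned pair by pair. The object name, the data, and `partitionOf` are hypothetical; `partitionOf` only imitates a hash partitioner (non-negative `hashCode` mod the partition count), and treating the vertices as hash-partitioned is itself an assumption.

```scala
// Toy, single-machine model (plain Scala, not Spark) of a partitioner-aware
// lookup: one partition is selected, then scanned linearly.
object PartitionedLookupModel {
  type VertexId = Long
  val numPartitions = 4

  // Imitates a hash partitioner: non-negative (hashCode mod numPartitions).
  def partitionOf(id: VertexId): Int = {
    val raw = id.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  // Hypothetical data: ids 0..99 with attribute "attr<id>", placed by partitionOf.
  val partitions: Vector[Vector[(VertexId, String)]] = {
    val all = (0L until 100L).map(id => (id, s"attr$id")).toVector
    Vector.tabulate(numPartitions)(p => all.filter { case (id, _) => partitionOf(id) == p })
  }

  /** Returns the matching attributes and how many pairs were compared. */
  def lookup(key: VertexId): (Seq[String], Int) = {
    val part = partitions(partitionOf(key)) // only one partition is touched...
    // ...but every pair inside it is compared against the key.
    (part.collect { case (id, attr) if id == key => attr }, part.size)
  }

  def main(args: Array[String]): Unit = {
    val (attrs, examined) = lookup(80L)
    println(s"attrs=$attrs examined=$examined")
  }
}
```

Here `lookup(80L)` touches one of the four partitions but still compares all 25 pairs stored in it, which is the "one task, full partition scan" behaviour.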

Ankur

Re: [GraphX] How to access a vertex via vertexId?

Posted by andy petrella <an...@gmail.com>.
Oh, I was almost sure that lookup was optimized using the partition info.
In reply to "Ankur Dave" <an...@gmail.com>, 29 Jul 2014 21:25.

Re: [GraphX] How to access a vertex via vertexId?

Posted by Ankur Dave <an...@gmail.com>.
Yifan LI <ia...@gmail.com> writes:
> Maybe you could get the vertex whose id is, for instance, 80 by using:
>
>     graph.vertices.filter { case (id, _) => id == 80L }.collect()
>
> but I am not sure this is the most efficient way. (Will it scan the whole table if it cannot use the VertexRDD's index?)

Until IndexedRDD is merged, a scan and collect is the best officially supported way. PairRDDFunctions.lookup does this under the hood as well.

However, it's possible to use the VertexRDD's hash index to do a much more efficient lookup. Note that these APIs may change, since VertexPartitionBase and its subclasses are private[graphx].

You can access the partitions of a VertexRDD using VertexRDD#partitionsRDD, and each partition has VertexPartitionBase#isDefined and VertexPartitionBase#apply. Putting it all together:

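    val verts: VertexRDD[_] = ...
    val targetVid: VertexId = 80L
    // Ask every partition whether its hash index contains targetVid;
    // only the partition holding that vertex emits its attribute.
    val result = verts.partitionsRDD.flatMap { part =>
      if (part.isDefined(targetVid)) Some(part(targetVid))
      else None
    }.collect().head  // throws NoSuchElementException if targetVid is absent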
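The partitionsRDD approach can be mimicked without Spark by a toy model in which each partition carries its own hash index. `ToyVertexPartition` is a hypothetical stand-in exposing `isDefined` and `apply` in the same roles as the private GraphX methods named above; it is not GraphX code, and the data is made up. Every partition is still asked once (one task per partition in Spark), but each check is a hash probe rather than a scan.

```scala
// Toy, single-machine model (plain Scala, not GraphX) of probing every
// partition's hash index for one vertex id.
object IndexedPartitionsModel {
  type VertexId = Long
  val numPartitions = 4

  // Hypothetical stand-in for a vertex partition that owns a hash index.
  final class ToyVertexPartition(index: Map[VertexId, String]) {
    def isDefined(vid: VertexId): Boolean = index.contains(vid) // hash probe, no scan
    def apply(vid: VertexId): String = index(vid)
  }

  // Hypothetical data: ids 0..99 with attribute "attr<id>", grouped by id mod numPartitions.
  // With 25 entries per partition, toMap yields a hash-based immutable Map.
  val partitions: Vector[ToyVertexPartition] =
    Vector.tabulate(numPartitions) { p =>
      val entries = (0L until 100L)
        .filter(id => id % numPartitions == p)
        .map(id => id -> s"attr$id")
      new ToyVertexPartition(entries.toMap)
    }

  /** Mirrors the flatMap over partitionsRDD: each partition is asked once. */
  def find(target: VertexId): Option[String] =
    partitions.flatMap { part =>
      if (part.isDefined(target)) Some(part(target)) else None
    }.headOption // None instead of an exception when the id is absent

  def main(args: Array[String]): Unit = {
    println(find(80L))   // Some(attr80)
    println(find(1000L)) // None
  }
}
```

Using `headOption` instead of `head` makes a missing id come back as `None` rather than throwing.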

Once IndexedRDD [1] is merged, it will provide this functionality using verts.get(targetVid). Its implementation of get also uses the hash partitioner to run only one task [2].
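The combination described for IndexedRDD's get, one task plus an indexed lookup, can likewise be sketched as a toy model: a hash partitioner selects a single partition, and that partition's hash map answers directly. This is plain Scala, not the IndexedRDD implementation; `PartitionedIndexModel`, `partitionOf`, and the data are hypothetical, and `partitionOf` only imitates a hash partitioner.

```scala
// Toy, single-machine model (plain Scala, not IndexedRDD) of a
// partitioner-aware get backed by per-partition hash indexes.
object PartitionedIndexModel {
  type VertexId = Long
  val numPartitions = 4

  // Imitates a hash partitioner: non-negative (hashCode mod numPartitions).
  def partitionOf(id: VertexId): Int = {
    val raw = id.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  // Hypothetical data: ids 0..99 with attribute "attr<id>", each stored in the
  // partition chosen by partitionOf; each partition is an immutable hash map.
  val partitions: Vector[Map[VertexId, String]] =
    Vector.tabulate(numPartitions) { p =>
      (0L until 100L)
        .filter(id => partitionOf(id) == p)
        .map(id => id -> s"attr$id")
        .toMap
    }

  /** Touches exactly one partition and does one hash probe inside it. */
  def get(vid: VertexId): Option[String] =
    partitions(partitionOf(vid)).get(vid)

  def main(args: Array[String]): Unit = {
    println(get(80L)) // Some(attr80)
    println(get(-3L)) // None
  }
}
```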

Ankur

[1] https://issues.apache.org/jira/browse/SPARK-2365
[2] https://github.com/ankurdave/spark/blob/IndexedRDD/core/src/main/scala/org/apache/spark/rdd/IndexedRDDLike.scala#L89

Re: [GraphX] How to access a vertex via vertexId?

Posted by Yifan LI <ia...@gmail.com>.
Hi Bin,

Maybe you could get the vertex whose id is, for instance, 80 by using:

    graph.vertices.filter { case (id, _) => id == 80L }.collect()

but I am not sure this is the most efficient way. (Will it scan the whole table if it cannot use the VertexRDD's index?)

@Ankur, is there a better method?
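For intuition on that question, here is a toy, single-machine Scala model (plain collections, not Spark) of filter followed by collect over a partitioned vertex table. It counts how many (id, attribute) pairs the predicate is evaluated on: every partition and every pair is visited. The object name and data are hypothetical.

```scala
// Toy, single-machine model (plain Scala collections, not Spark) of
// filter + collect over a partitioned vertex table.
object FilterScanModel {
  type VertexId = Long
  val numPartitions = 4

  // Hypothetical data: ids 0..99 with attribute "attr<id>", spread over
  // the partitions by id mod numPartitions.
  val partitions: Vector[Vector[(VertexId, String)]] =
    Vector.tabulate(numPartitions) { p =>
      (0L until 100L)
        .filter(id => id % numPartitions == p)
        .map(id => (id, s"attr$id"))
        .toVector
    }

  /** Returns the matching pairs and how many pairs the predicate was tested on. */
  def filterCollect(target: VertexId): (Vector[(VertexId, String)], Int) = {
    var examined = 0
    val hits = partitions.flatMap { part => // every partition is visited...
      part.filter { case (id, _) =>         // ...and every pair is tested
        examined += 1
        id == target
      }
    }
    (hits, examined)
  }

  def main(args: Array[String]): Unit = {
    val (hits, examined) = filterCollect(80L)
    println(s"hits=$hits examined=$examined")
  }
}
```

With four partitions of 25 pairs each, `filterCollect(80L)` finds the single match but tests all 100 pairs.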



In reply to Bin <wu...@126.com>, Jul 29, 2014, 11:40 AM.