Posted to user@spark.apache.org by Yifan LI <ia...@gmail.com> on 2014/09/15 16:25:04 UTC

vertex active/inactive feature in Pregel API ?

Hi,

I am wondering whether the vertex active/inactive feature (based on whether a vertex's value changes between two supersteps) is available in the Pregel API of GraphX?

If it is not enabled by default, how can I use it in the function below?
  def sendMessage(edge: EdgeTriplet[(Int,HashMap[VertexId, Double]), Int]) =
    Iterator((edge.dstId, hmCal(edge.srcAttr)))

Or should I implement it myself with a custom function, e.g. by storing each vertex's change in its vertex attribute after every iteration?


I noticed that there is an optional parameter “skipStale” in the mrTriplets operator.


Best,
Yifan LI

Re: vertex active/inactive feature in Pregel API ?

Posted by Ankur Dave <an...@gmail.com>.

At 2014-09-16 12:23:10 +0200, Yifan LI <ia...@gmail.com> wrote:
> But I am wondering whether any message (perhaps an empty one?) is sent to the target vertex (when the rank change is less than the tolerance) in the dynamic PageRank implementation below:
>
>   def sendMessage(edge: EdgeTriplet[(Double, Double), Double]) = {
>     if (edge.srcAttr._2 > tol) {
>       Iterator((edge.dstId, edge.srcAttr._2 * edge.attr))
>     } else {
>       Iterator.empty
>     }
>   }
>
> So in this case, is a message still sent, even an empty one, or not?

No, in that case no message is sent, and if all in-edges of a particular vertex return Iterator.empty, then the vertex will become inactive in the next iteration.
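
A minimal plain-Scala sketch of this behaviour, with no Spark dependency. `Triplet` is a hypothetical stand-in for GraphX's EdgeTriplet, and the tolerance and the sample edges are assumptions chosen for illustration. Messages are merged by summation, as in PageRank.

```scala
object EmptyIteratorSketch {
  // Hypothetical stand-in for EdgeTriplet[(Double, Double), Double]:
  // srcAttr is (rank, last delta) and attr is the edge weight.
  final case class Triplet(srcId: Long, dstId: Long, srcAttr: (Double, Double), attr: Double)

  val tol = 0.01 // assumed tolerance

  // Same shape as the quoted sendMessage: emit only when the source's delta exceeds tol.
  def sendMessage(edge: Triplet): Iterator[(Long, Double)] =
    if (edge.srcAttr._2 > tol) Iterator((edge.dstId, edge.srcAttr._2 * edge.attr))
    else Iterator.empty

  val triplets: Seq[Triplet] = Seq(
    Triplet(1L, 3L, (1.0, 0.5), 0.5),   // delta above tol: sends a message to vertex 3
    Triplet(2L, 3L, (1.0, 0.001), 1.0), // delta below tol: sends nothing
    Triplet(2L, 4L, (1.0, 0.001), 1.0)  // the only in-edge of vertex 4: sends nothing
  )

  // Merge the messages per destination vertex (sum, the role mergeMsg plays in PageRank).
  val inbox: Map[Long, Double] =
    triplets.flatMap(sendMessage).groupBy(_._1).map { case (id, msgs) => id -> msgs.map(_._2).sum }

  // Only vertices that received at least one message are active in the next superstep.
  val activeNext: Set[Long] = inbox.keySet

  def main(args: Array[String]): Unit =
    println(s"inbox = $inbox, active in next superstep = $activeNext")
}
```

In this model vertex 4 has no entry in the inbox at all: an empty iterator produces no message, not an empty message.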

Ankur

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

Re: vertex active/inactive feature in Pregel API ?

Posted by Yifan LI <ia...@gmail.com>.

Thanks, :)

But I am wondering whether any message (perhaps an empty one?) is sent to the target vertex (when the rank change is less than the tolerance) in the dynamic PageRank implementation below:

  def sendMessage(edge: EdgeTriplet[(Double, Double), Double]) = {
    if (edge.srcAttr._2 > tol) {
      Iterator((edge.dstId, edge.srcAttr._2 * edge.attr))
    } else {
      Iterator.empty
    }
  }

So in this case, is a message still sent, even an empty one, or not?


Best,
Yifan


Re: vertex active/inactive feature in Pregel API ?

Posted by Ankur Dave <an...@gmail.com>.

At 2014-09-16 10:55:37 +0200, Yifan LI <ia...@gmail.com> wrote:
> - From [1] and my understanding, the existing inactive feature in the GraphX Pregel API is: “if a vertex has no in-edges from active vertices, it is considered inactive”, right?

Well, that's true when messages are only sent forward along edges (from the source to the destination) and the activeDirection is EdgeDirection.Out. If both of these conditions are true, then a vertex without in-edges cannot receive a message, and therefore its vertex program will never run and a message will never be sent along its out-edges. PageRank is an application that satisfies both the conditions.
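
A minimal plain-Scala model of this case, with no Spark dependency. The three-vertex graph is an assumption for illustration, and `step` is a hypothetical helper that models a program which always sends forward along out-edges of active vertices.

```scala
object NoInEdgeSketch {
  // Directed edges (src, dst). Vertex 1 has no in-edges.
  val edges: Seq[(Long, Long)] = Seq(1L -> 2L, 2L -> 3L, 3L -> 2L)
  val allVertices: Set[Long] = Set(1L, 2L, 3L)

  // One superstep: every active vertex sends along each of its out-edges,
  // and the receivers form the active set of the next superstep.
  def step(active: Set[Long]): Set[Long] =
    edges.collect { case (src, dst) if active(src) => dst }.toSet

  // The first superstep is triggered by the initial message, so every vertex starts active.
  val activeSets: List[Set[Long]] = Iterator.iterate(allVertices)(step).take(4).toList

  def main(args: Array[String]): Unit =
    activeSets.zipWithIndex.foreach { case (active, i) => println(s"superstep $i: $active") }
}
```

After the first superstep, vertex 1 never appears in the active set again, because no edge can deliver a message to it.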

> For instance, suppose every vertex in a graph has at least one in-edge, and we run static PageRank on it for 10 iterations. During this calculation, would any vertex be set inactive?

No: since every vertex always sends a message in static PageRank, if every vertex has an in-edge, it will always receive a message and will remain active.

In fact, this is why I recently rewrote static PageRank not to use Pregel [3]. Assuming that most vertices do have in-edges, it's unnecessary to track active vertices, and skipping that tracking can provide big savings.
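
To illustrate the idea only, here is a plain-Scala model of a static PageRank loop with no active-vertex tracking. It is not the code from [3] and uses no Spark. The sample graph, the reset probability of 0.15, and the unnormalized update `resetProb + (1 - resetProb) * sum` are assumptions.

```scala
object StaticPageRankSketch {
  // Directed edges (src, dst). Every vertex here has at least one in-edge and one out-edge.
  val edges: Seq[(Long, Long)] = Seq(1L -> 2L, 2L -> 3L, 3L -> 1L, 1L -> 3L)
  val resetProb = 0.15 // assumed reset probability

  def run(numIter: Int): Map[Long, Double] = {
    val vertices = edges.flatMap { case (s, d) => Seq(s, d) }.distinct
    val outDegree = edges.groupBy(_._1).map { case (s, es) => s -> es.size }
    var ranks: Map[Long, Double] = vertices.map(_ -> 1.0).toMap

    for (_ <- 1 to numIter) {
      // Every edge contributes in every iteration, so there is no active set to maintain.
      val contribs = edges
        .map { case (s, d) => d -> ranks(s) / outDegree(s) }
        .groupBy(_._1)
        .map { case (d, cs) => d -> cs.map(_._2).sum }
      ranks = vertices.map(v => v -> (resetProb + (1 - resetProb) * contribs.getOrElse(v, 0.0))).toMap
    }
    ranks
  }

  def main(args: Array[String]): Unit =
    run(10).toSeq.sortBy(_._1).foreach(println)
}
```

Because every vertex is updated in every iteration anyway, the loop needs no bookkeeping about which vertices received messages.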

> - For more “explicit active vertex tracking”, e.g. vote to halt, how can I achieve it in the existing API?
> (I am not sure I understood [2]: has that “vote” function already been introduced in the GraphX Pregel API?)

The current Pregel API effectively makes every vertex vote to halt in every superstep. Therefore only vertices that receive messages get awoken in the next superstep.

Instead, [2] proposes to make every vertex run by default in every superstep unless it votes to halt *and* receives no messages. This allows a vertex to have more control over whether or not it will run, rather than leaving that entirely up to its neighbors.
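
The two activation rules can be written as predicates. This is a plain-Scala sketch of the semantics described above, not code from GraphX or from [2], and the function names are hypothetical.

```scala
object HaltSemanticsSketch {
  // Current behaviour as described above: every vertex implicitly votes to halt,
  // so it runs in the next superstep only if it received a message.
  def runsNextCurrent(receivedMessage: Boolean): Boolean = receivedMessage

  // Behaviour described for [2]: a vertex runs by default and is skipped only
  // if it voted to halt AND received no messages.
  def runsNextProposed(votedToHalt: Boolean, receivedMessage: Boolean): Boolean =
    !(votedToHalt && !receivedMessage)

  def main(args: Array[String]): Unit =
    for (voted <- Seq(false, true); received <- Seq(false, true))
      println(s"votedToHalt=$voted received=$received -> " +
        s"current=${runsNextCurrent(received)} proposed=${runsNextProposed(voted, received)}")
}
```

The two rules differ only when a vertex has not voted to halt and receives no message: it is skipped under the current rule and still runs under the proposed one.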

Ankur

[1] http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.graphx.Pregel$
[2] https://github.com/apache/spark/pull/1217
[3] https://github.com/apache/spark/pull/2308


Re: vertex active/inactive feature in Pregel API ?

Posted by Yifan LI <ia...@gmail.com>.

Dear Ankur,

Thanks! :)

- From [1] and my understanding, the existing inactive feature in the GraphX Pregel API is: “if a vertex has no in-edges from active vertices, it is considered inactive”, right?

For instance, suppose every vertex in a graph has at least one in-edge, and we run static PageRank on it for 10 iterations. During this calculation, would any vertex be set inactive?


- For more “explicit active vertex tracking”, e.g. vote to halt, how can I achieve it in the existing API?
(I am not sure I understood [2]: has that “vote” function already been introduced in the GraphX Pregel API?)


Best,
Yifan LI


Re: vertex active/inactive feature in Pregel API ?

Posted by Ankur Dave <an...@gmail.com>.

At 2014-09-15 16:25:04 +0200, Yifan LI <ia...@gmail.com> wrote:
> I am wondering whether the vertex active/inactive feature (based on whether a vertex's value changes between two supersteps) is available in the Pregel API of GraphX?

Vertex activeness in Pregel is controlled by messages: if a vertex did not receive a message in the previous iteration, its vertex program will not run in the current iteration. Also, inactive vertices will not be able to send messages because by default the sendMsg function will only be run on edges where at least one of the adjacent vertices received a message. You can change this behavior -- see the documentation for the activeDirection parameter to Pregel.apply [1].
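
A hedged sketch of passing activeDirection explicitly to Pregel.apply. The single-source shortest-path program, the sample graph, the local master setting, and the helper name `shortestPaths` are assumptions for illustration. It needs Spark and GraphX on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

object ActiveDirectionSketch {
  def shortestPaths(sc: SparkContext): Map[VertexId, Double] = {
    // Sample graph (an assumption): 1 -> 2 -> 3 plus a longer direct edge 1 -> 3.
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 1.0), Edge(2L, 3L, 1.0), Edge(1L, 3L, 5.0)))
    val sourceId: VertexId = 1L
    val graph = Graph.fromEdges(edges, 0.0)
      .mapVertices((id, _) => if (id == sourceId) 0.0 else Double.PositiveInfinity)

    val result = Pregel(graph, Double.PositiveInfinity,
      maxIterations = 10, activeDirection = EdgeDirection.Out)(
      // vprog: after the initial message, runs only on vertices that received a message.
      (_, dist, msg) => math.min(dist, msg),
      // sendMsg: with EdgeDirection.Out it runs only on out-edges of vertices
      // that received a message in the previous round.
      triplet => {
        val candidate = triplet.srcAttr + triplet.attr
        if (candidate < triplet.dstAttr) Iterator((triplet.dstId, candidate))
        else Iterator.empty // no message: the destination is not activated by this edge
      },
      // mergeMsg: keep the shortest candidate distance.
      (a, b) => math.min(a, b)
    )
    result.vertices.collect().toMap
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("active-direction-sketch").setMaster("local[*]"))
    try shortestPaths(sc).toSeq.sortBy(_._1).foreach(println)
    finally sc.stop()
  }
}
```

Leaving activeDirection at its default, EdgeDirection.Either, gives the behaviour described above, where sendMsg runs on every edge with at least one adjacent vertex that received a message.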

There is also an open pull request to make active vertex tracking more explicit by allowing vertices to vote to halt directly [2].

Ankur

[1] http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.graphx.Pregel$
[2] https://github.com/apache/spark/pull/1217
