Posted to issues@spark.apache.org by "Zhaokang Wang (JIRA)" <ji...@apache.org> on 2016/03/13 14:08:33 UTC

[jira] [Comment Edited] (SPARK-6378) srcAttr in graph.triplets don't update when the size of graph is huge

    [ https://issues.apache.org/jira/browse/SPARK-6378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15192332#comment-15192332 ] 

Zhaokang Wang edited comment on SPARK-6378 at 3/13/16 1:08 PM:
---------------------------------------------------------------

I have run into a similar problem with triplet updates in GraphX.
I have a code demo that reproduces this issue on a small toy graph with only 3 vertices. The demo code is attached as [^TripletsViewDonotUpdate.scala].

The key code is as follows:

{code}
    // purGraph is a toy graph with edges: 2->1, 3->1, 2->3. 
    val purGraph = Graph(dataVertex, dataEdge).persist() 
    purGraph.triplets.count() // this operation will cause the bug.
    val inNeighborGraph = purGraph.collectNeighbors(EdgeDirection.In) 
    val dataGraph = purGraph.outerJoinVertices(inNeighborGraph)((vid, property, inNeighborList) => {... }) 
    
    // dataGraph's vertices view and triplets view will be inconsistent on vertex 3's inNeighbor attribute. 
    dataGraph.vertices.foreach {...} 
    dataGraph.triplets.foreach {...}
{code}

We can see from the output that the two views are inconsistent on *vertex 3*'s {{inNeighbor}} property:

{quote}
----> dataGraph.vertices
vid: 1, inNeighbor:2,3
vid: 3, inNeighbor:2
vid: 2, inNeighbor:
----> dataGraph.triplets.srcAttr
vid: 2, inNeighbor:
vid: 2, inNeighbor:
vid: 3, inNeighbor:
{quote}
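As a cross-check of the output above, the expected views can be derived by hand from the three edges. The plain-Scala sketch below does not use Spark. Plain collections stand in for {{collectNeighbors(EdgeDirection.In)}} followed by {{outerJoinVertices}}, and the names {{ExpectedViews}}, {{inNeighbors}}, {{expectedSrcAttrs}} and {{inconsistentSrcIds}} are hypothetical. The sketch computes each vertex's in-neighbor list and compares it with the {{srcAttr}} values reported in the triplets output:

```scala
// Hypothetical helper object: plain Scala collections model the toy graph (no Spark involved).
object ExpectedViews {
  // Edges of the toy graph from the comment, as (srcId, dstId): 2->1, 3->1, 2->3.
  val edges: Seq[(Long, Long)] = Seq((2L, 1L), (3L, 1L), (2L, 3L))
  val vertexIds: Seq[Long] = Seq(1L, 2L, 3L)

  // Stand-in for collectNeighbors(EdgeDirection.In) + outerJoinVertices:
  // every vertex gets the sorted ids of its in-neighbors, or an empty list if it has none.
  def inNeighbors(edges: Seq[(Long, Long)], vids: Seq[Long]): Map[Long, Seq[Long]] = {
    val byDst = edges.groupBy(_._2).map { case (dst, es) => dst -> es.map(_._1).sorted }
    vids.map(v => v -> byDst.getOrElse(v, Seq.empty[Long])).toMap
  }

  // In a consistent graph, each triplet's srcAttr equals the vertex-view attribute of its srcId.
  def expectedSrcAttrs(edges: Seq[(Long, Long)], attrs: Map[Long, Seq[Long]]): Seq[(Long, Seq[Long])] =
    edges.map { case (src, _) => src -> attrs(src) }

  // Returns the srcIds whose observed srcAttr differs from the vertex view.
  def inconsistentSrcIds(observed: Seq[(Long, Seq[Long])], attrs: Map[Long, Seq[Long]]): Set[Long] =
    observed.collect { case (src, attr) if attr != attrs(src) => src }.toSet

  def main(args: Array[String]): Unit = {
    val attrs = inNeighbors(edges, vertexIds)
    // Expected vertex view, printed in the same format as the comment's output.
    attrs.toSeq.sortBy(_._1).foreach { case (v, ns) =>
      println(s"vid: $v, inNeighbor:${ns.mkString(",")}")
    }
    // srcAttr values copied from the reported dataGraph.triplets output (all empty).
    val observed = Seq(2L -> Seq.empty[Long], 2L -> Seq.empty[Long], 3L -> Seq.empty[Long])
    println(s"expected srcAttr: ${expectedSrcAttrs(edges, attrs)}")
    println(s"inconsistent srcIds: ${inconsistentSrcIds(observed, attrs)}")
  }
}
```

Under these assumptions, only vertex 3 disagrees. Its vertex view holds 2, so the triplet with srcId 3 should also carry 2 in {{srcAttr}}, but the reported triplets view shows it empty.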

If we comment out the {{purGraph.triplets.count()}} statement in the code, the bug disappears:
{code}
    val purGraph = Graph(dataVertex, dataEdge).persist()
    // purGraph.triplets.count() // !!!comment this
    val inNeighborGraph = purGraph.collectNeighbors(EdgeDirection.In)
    // Now join the in neighbor vertex id list to every vertex's property
    val dataGraph = purGraph.outerJoinVertices(inNeighborGraph)((vid, property, inNeighborList) => {...})
{code}

It seems that the triplets view and the vertex view of the same graph may be inconsistent in some situations.


was (Author: bsidb):
I have run into a similar problem with triplet updates in GraphX.
I have a code demo that reproduces this issue on a small toy graph with only 3 vertices. The demo code is attached as [^TripletsViewDonotUpdate.scala].

Let me describe the steps to reproduce the issue:
1. We construct a small graph ({{purGraph}} in the code) with only 3 vertices and the edges 2->1, 3->1, 2->3.
2. We run the collect-neighbors operation to get the {{inNeighborGraph}} of {{purGraph}}.
3. We outer join the {{inNeighborGraph}} vertices onto {{purGraph}} to get {{dataGraph}}. In {{dataGraph}}, each vertex stores an ArrayBuffer holding the vertex ids of its in-neighbors.
4. Now we can examine the {{inNeighbor}} attribute in the {{dataGraph.vertices}} view and the {{dataGraph.triplets}} view. The output shows that the two views are inconsistent on vertex 3's {{inNeighbor}} property:

{quote}
----> dataGraph.vertices
vid: 1, inNeighbor:2,3
vid: 3, inNeighbor:2
vid: 2, inNeighbor:
----> dataGraph.triplets.srcAttr
vid: 2, inNeighbor:
vid: 2, inNeighbor:
vid: 3, inNeighbor:
{quote}

5. If we comment out the {{purGraph.triplets.count()}} statement in the code, the bug disappears:
{code}
    val purGraph = Graph(dataVertex, dataEdge).persist()
    // purGraph.triplets.count() // !!!comment this
    val inNeighborGraph = purGraph.collectNeighbors(EdgeDirection.In)
    // Now join the in neighbor vertex id list to every vertex's property
    val dataGraph = purGraph.outerJoinVertices(inNeighborGraph)((vid, property, inNeighborList) => {
      val inNeighborVertexIds = inNeighborList.getOrElse(Array[(VertexId, VertexProperty)]()).map(t => t._1)
      property.inNeighbor ++= inNeighborVertexIds.toBuffer
      property
    })
{code}

It seems that the triplets view and the vertex view of the same graph may be inconsistent in some situations.
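Here is one possible explanation. It is only an assumption and is not confirmed in this thread. The join function above mutates {{property}} in place ({{property.inNeighbor ++= ...}}) and returns the same object. GraphX may decide which vertex attributes to re-ship to the edge partitions by comparing old and new attributes for equality. If so, an attribute mutated in place always compares equal to itself, so the change could be missed while the copy shipped earlier by {{purGraph.triplets.count()}} stays stale.

The plain-Scala sketch below does not use Spark. {{VertexProperty}} is a hypothetical stand-in for the class in the attachment, and {{joinInPlace}}, {{joinCopy}} and {{looksChanged}} are hypothetical helpers. The sketch only demonstrates the equality behavior, contrasting in-place mutation with returning a fresh copy:

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for the attachment's VertexProperty, which holds a mutable inNeighbor buffer.
final case class VertexProperty(inNeighbor: ArrayBuffer[Long])

object MutationVsCopy {
  // Mirrors the comment's join function: mutates the existing attribute and returns the same object.
  def joinInPlace(property: VertexProperty, ids: Seq[Long]): VertexProperty = {
    property.inNeighbor ++= ids
    property
  }

  // Alternative: leaves the input untouched and returns a fresh attribute with a new buffer.
  def joinCopy(property: VertexProperty, ids: Seq[Long]): VertexProperty =
    property.copy(inNeighbor = property.inNeighbor ++ ids)

  // Assumption (not confirmed in this thread): a "ship only changed vertices" step
  // would rely on an equality check like this one.
  def looksChanged(before: VertexProperty, after: VertexProperty): Boolean = before != after

  def main(args: Array[String]): Unit = {
    val a = VertexProperty(ArrayBuffer.empty[Long])
    val a2 = joinInPlace(a, Seq(2L))
    // Same object, so the equality check cannot see the update.
    println(s"in-place: same object = ${a eq a2}, looks changed = ${looksChanged(a, a2)}")

    val b = VertexProperty(ArrayBuffer.empty[Long])
    val b2 = joinCopy(b, Seq(2L))
    // Distinct objects with different contents, so the update is visible.
    println(s"copy: same object = ${b eq b2}, looks changed = ${looksChanged(b, b2)}")
  }
}
```

If this explanation holds, a join function that returns a new attribute (as {{joinCopy}} does) instead of mutating its input would keep the change visible to an equality-based check.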

> srcAttr in graph.triplets don't update when the size of graph is huge
> ---------------------------------------------------------------------
>
>                 Key: SPARK-6378
>                 URL: https://issues.apache.org/jira/browse/SPARK-6378
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>    Affects Versions: 1.2.1
>            Reporter: zhangzhenyue
>         Attachments: TripletsViewDonotUpdate.scala
>
>
> When the graph is huge (0.2 billion vertices, 6 billion edges), the srcAttr and dstAttr in graph.triplets are not updated when Graph.outerJoinVertices is used to change the vertex data.
> The code and the log are as follows:
> {code}
> g = graph.outerJoinVertices()...
> g.vertices.count()
> g.edges.count()
> println("example edge " + g.triplets.filter(e => e.srcId == 5000000001L).collect()
>   .map(e => (e.srcId + ":" + e.srcAttr + ", " + e.dstId + ":" + e.dstAttr)).mkString("\n"))
> println("example vertex " + g.vertices.filter(e => e._1 == 5000000001L).collect()
>   .map(e => (e._1 + "," + e._2)).mkString("\n"))
> {code}
> The result:
> {quote}
> example edge 5000000001:0, 2467451620:61
> 5000000001:0, 1962741310:83 // attr of vertex 5000000001 is 0 in Graph.triplets
> example vertex 5000000001,2 // attr of vertex 5000000001 is 2 in Graph.vertices
> {quote}
> When the graph is smaller (10 million vertices), the code works as expected: the triplets are updated when the vertex data changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
