You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Ohad Raviv (JIRA)" <ji...@apache.org> on 2016/03/17 17:17:33 UTC

[jira] [Commented] (SPARK-13313) Strongly connected components doesn't find all strongly connected components

    [ https://issues.apache.org/jira/browse/SPARK-13313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15199811#comment-15199811 ] 

Ohad Raviv commented on SPARK-13313:
------------------------------------

Hi,
I am trying to use graphx's SCC and was very concerned with this issue, so I have taken this dataset and ran it with python's networkx strongly_connected_components function and got exactly the same results of 519 SCCs with maximal size = 4051.
So although I don't know what is the real result, the fact that both algorithms agree make me believe that they are correct.
I have also looked at the code and it looks fine to me, I don't agree that you should change the edge direction on line 89.

> Strongly connected components doesn't find all strongly connected components
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-13313
>                 URL: https://issues.apache.org/jira/browse/SPARK-13313
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>    Affects Versions: 1.6.0
>            Reporter: Petar Zecevic
>
> Strongly connected components algorithm doesn't find all strongly connected components. I was using Wikispeedia dataset (http://snap.stanford.edu/data/wikispeedia.html) and the algorithm found 519 SCCs and one of them had 4051 vertices, which in reality don't have any edges between them. 
> I think the problem could be on line 89 of StronglyConnectedComponents.scala file where EdgeDirection.In should be changed to EdgeDirection.Out. I believe the second Pregel call should use Out edge direction, the same as the first call because the direction is reversed in the provided sendMsg function (message is sent to source vertex and not destination vertex).
> If that is changed (line 89), the algorithm starts finding much more SCCs, but eventually stack overflow exception occurs. I believe graph objects that are changed through iterations should not be cached, but checkpointed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org