You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "kaywei (JIRA)" <ji...@apache.org> on 2015/08/04 07:43:04 UTC

[jira] [Commented] (SPARK-8497) Graph Clique(Complete Connected Sub-graph) Discovery Algorithm

    [ https://issues.apache.org/jira/browse/SPARK-8497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653101#comment-14653101 ] 

kaywei commented on SPARK-8497:
-------------------------------

In general, it follows algorithm from "https://en.wikipedia.org/wiki/Bron%E2%80%93Kerbosch_algorithm" with pivot, however, we made two major enhancements, one is to reorganize the data to meet the parallel computing requirement. The other one is to solve the 40 choose 20 problem, the algorithm will go deep along the most possible path, then filters all other redundant pathes by leveraging the information from the combination of "seed set + neighbor set" to prune all the other pathes at the earliest possible time. we had tested it with clique size of 200, it works fine.

> Graph Clique(Complete Connected Sub-graph) Discovery Algorithm
> --------------------------------------------------------------
>
>                 Key: SPARK-8497
>                 URL: https://issues.apache.org/jira/browse/SPARK-8497
>             Project: Spark
>          Issue Type: New Feature
>          Components: GraphX, ML, MLlib, Spark Core
>            Reporter: Fan Jiang
>            Assignee: Fan Jiang
>              Labels: features
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> In recent years, social network industry has high demand on Complete Connected Sub-Graph Discoveries, so does Telecom. Similar as the graph connection from Twitter, the calls and other activities from telecoms world form a huge social graph, and due to the nature of communication method, it shows the strongest inter-person relationship, the graph based analysis will reveal tremendous value from telecoms connections. 
> We need an algorithm in Spark to figure out ALL the strongest completely connected sub-graph (so called Clique here) for EVERY person in the network which will be one of the start point for understanding user's social behaviour. 
> In Huawei, we have many real-world use cases that invovle telecom social graph of tens billion edges and hundreds million vertices, and the cliques will be also in tens million level. The graph will be a fast changing one which means we need to analyse the graph pattern very often (one result per day/week for moving time window which spans multiple months). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org