You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@giraph.apache.org by "Sebastian Schelter (Created) (JIRA)" <ji...@apache.org> on 2011/12/24 10:22:30 UTC
[jira] [Created] (GIRAPH-115) Port of the HCC algorithm for
identifying all connected components of a graph
Port of the HCC algorithm for identifying all connected components of a graph
-----------------------------------------------------------------------------
Key: GIRAPH-115
URL: https://issues.apache.org/jira/browse/GIRAPH-115
Project: Giraph
Issue Type: New Feature
Affects Versions: 0.70.0
Reporter: Sebastian Schelter
Port of the HCC algorithm that identifies connected components and assigns a componented id (the smallest vertex id in the component) to each vertex.
The idea behind the algorithm is very simple: propagate the smallest vertex id along the edges to all vertices of a connected component until convergence. The number of supersteps necessary is equal to the length of the maximum diameter of all components + 1
The original Hadoop-based variant of this algorithm was proposed by Kang, Charalampos, Tsourakakis and Faloutsos in "PEGASUS: Mining Peta-Scale Graphs", 2010
http://www.cs.cmu.edu/~ukang/papers/PegasusKAIS.pdf
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-115) Port of the HCC algorithm for
identifying all connected components of a graph
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/GIRAPH-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175814#comment-13175814 ]
jiraposter@reviews.apache.org commented on GIRAPH-115:
------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3313/
-----------------------------------------------------------
(Updated 2011-12-25 09:36:39.177630)
Review request for giraph.
Changes
-------
Reworked code to reflect Avery's comments regarding the code conventions.
Summary
-------
Port of the HCC algorithm to Giraph. Each vertex needs to find the smallest vertex id in its component.
I created a very memory-efficient abstract vertex in org.apache.giraph.graph.IntIntNullIntVertex and had org.apache.giraph.examples.ConnectedComponentsVertex extend that. org.apache.giraph.examples.ConnectedComponentsVertexTest contains an "integration" test on toy data.
I had to patch org.apache.giraph.utils.InternalVertexRunner to allow the use of combiners and to shutdown() the local zookeeper instance in the tests.
Local and pseudo-distributed unit tests were passed. I also tested the algorithm on a 6-machine hadoop cluster using the wikipedia pagelink graph (5.7M vertices, 130M edges).
This addresses bug GIRAPH-115.
https://issues.apache.org/jira/browse/GIRAPH-115
Diffs (updated)
-----
/trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java PRE-CREATION
/trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java PRE-CREATION
/trunk/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java PRE-CREATION
/trunk/src/main/java/org/apache/giraph/examples/VertexWithComponentTextOutputFormat.java PRE-CREATION
/trunk/src/main/java/org/apache/giraph/graph/IntIntNullIntVertex.java PRE-CREATION
/trunk/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 1222837
/trunk/src/main/java/org/apache/giraph/utils/UnmodifiableIntArrayIterator.java PRE-CREATION
/trunk/src/test/java/org/apache/giraph/examples/ConnectedComponentsVertexTest.java PRE-CREATION
/trunk/src/test/java/org/apache/giraph/examples/MinimumIntCombinerTest.java PRE-CREATION
/trunk/src/test/java/org/apache/giraph/examples/SimpleShortestPathVertexTest.java 1222837
Diff: https://reviews.apache.org/r/3313/diff
Testing
-------
Thanks,
Sebastian
> Port of the HCC algorithm for identifying all connected components of a graph
> -----------------------------------------------------------------------------
>
> Key: GIRAPH-115
> URL: https://issues.apache.org/jira/browse/GIRAPH-115
> Project: Giraph
> Issue Type: New Feature
> Affects Versions: 0.70.0
> Reporter: Sebastian Schelter
>
> Port of the HCC algorithm that identifies connected components and assigns a componented id (the smallest vertex id in the component) to each vertex.
> The idea behind the algorithm is very simple: propagate the smallest vertex id along the edges to all vertices of a connected component until convergence. The number of supersteps necessary is equal to the length of the maximum diameter of all components + 1
> The original Hadoop-based variant of this algorithm was proposed by Kang, Charalampos, Tsourakakis and Faloutsos in "PEGASUS: Mining Peta-Scale Graphs", 2010
> http://www.cs.cmu.edu/~ukang/papers/PegasusKAIS.pdf
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-115) Port of the HCC algorithm for
identifying all connected components of a graph
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/GIRAPH-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175700#comment-13175700 ]
jiraposter@reviews.apache.org commented on GIRAPH-115:
------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3313/
-----------------------------------------------------------
Review request for giraph.
Summary
-------
Port of the HCC algorithm to Giraph. Each vertex needs to find the smallest vertex id in its component.
I created a very memory-efficient abstract vertex in org.apache.giraph.graph.IntIntNullIntVertex and had org.apache.giraph.examples.ConnectedComponentsVertex extend that. org.apache.giraph.examples.ConnectedComponentsVertexTest contains an "integration" test on toy data.
I had to patch org.apache.giraph.utils.InternalVertexRunner to allow the use of combiners and to shutdown() the local zookeeper instance in the tests.
Local and pseudo-distributed unit tests were passed. I also tested the algorithm on a 6-machine hadoop cluster using the wikipedia pagelink graph (5.7M vertices, 130M edges).
This addresses bug GIRAPH-115.
https://issues.apache.org/jira/browse/GIRAPH-115
Diffs
-----
/trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java PRE-CREATION
/trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java PRE-CREATION
/trunk/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java PRE-CREATION
/trunk/src/main/java/org/apache/giraph/examples/VertexWithComponentTextOutputFormat.java PRE-CREATION
/trunk/src/main/java/org/apache/giraph/graph/IntIntNullIntVertex.java PRE-CREATION
/trunk/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 1222837
/trunk/src/main/java/org/apache/giraph/utils/UnmodifiableIntArrayIterator.java PRE-CREATION
/trunk/src/test/java/org/apache/giraph/examples/ConnectedComponentsVertexTest.java PRE-CREATION
/trunk/src/test/java/org/apache/giraph/examples/MinimumIntCombinerTest.java PRE-CREATION
Diff: https://reviews.apache.org/r/3313/diff
Testing
-------
Thanks,
Sebastian
> Port of the HCC algorithm for identifying all connected components of a graph
> -----------------------------------------------------------------------------
>
> Key: GIRAPH-115
> URL: https://issues.apache.org/jira/browse/GIRAPH-115
> Project: Giraph
> Issue Type: New Feature
> Affects Versions: 0.70.0
> Reporter: Sebastian Schelter
>
> Port of the HCC algorithm that identifies connected components and assigns a componented id (the smallest vertex id in the component) to each vertex.
> The idea behind the algorithm is very simple: propagate the smallest vertex id along the edges to all vertices of a connected component until convergence. The number of supersteps necessary is equal to the length of the maximum diameter of all components + 1
> The original Hadoop-based variant of this algorithm was proposed by Kang, Charalampos, Tsourakakis and Faloutsos in "PEGASUS: Mining Peta-Scale Graphs", 2010
> http://www.cs.cmu.edu/~ukang/papers/PegasusKAIS.pdf
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (GIRAPH-115) Port of the HCC algorithm for
identifying all connected components of a graph
Posted by "Avery Ching (Resolved) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/GIRAPH-115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Avery Ching resolved GIRAPH-115.
--------------------------------
Resolution: Fixed
I think work like this will really lower the bar for folks to try out Giraph. =)
> Port of the HCC algorithm for identifying all connected components of a graph
> -----------------------------------------------------------------------------
>
> Key: GIRAPH-115
> URL: https://issues.apache.org/jira/browse/GIRAPH-115
> Project: Giraph
> Issue Type: New Feature
> Affects Versions: 0.70.0
> Reporter: Sebastian Schelter
>
> Port of the HCC algorithm that identifies connected components and assigns a componented id (the smallest vertex id in the component) to each vertex.
> The idea behind the algorithm is very simple: propagate the smallest vertex id along the edges to all vertices of a connected component until convergence. The number of supersteps necessary is equal to the length of the maximum diameter of all components + 1
> The original Hadoop-based variant of this algorithm was proposed by Kang, Charalampos, Tsourakakis and Faloutsos in "PEGASUS: Mining Peta-Scale Graphs", 2010
> http://www.cs.cmu.edu/~ukang/papers/PegasusKAIS.pdf
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-115) Port of the HCC algorithm for
identifying all connected components of a graph
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/GIRAPH-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175888#comment-13175888 ]
jiraposter@reviews.apache.org commented on GIRAPH-115:
------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3313/#review4117
-----------------------------------------------------------
Ship it!
Looks great. I'll commit on your behalf.
/trunk/src/test/java/org/apache/giraph/examples/ConnectedComponentsVertexTest.java
<https://reviews.apache.org/r/3313/#comment9277>
I'll fix this one for you. =) Remove */.
- Avery
On 2011-12-25 09:36:39, Sebastian Schelter wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/3313/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-12-25 09:36:39)
bq.
bq.
bq. Review request for giraph.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. Port of the HCC algorithm to Giraph. Each vertex needs to find the smallest vertex id in its component.
bq.
bq. I created a very memory-efficient abstract vertex in org.apache.giraph.graph.IntIntNullIntVertex and had org.apache.giraph.examples.ConnectedComponentsVertex extend that. org.apache.giraph.examples.ConnectedComponentsVertexTest contains an "integration" test on toy data.
bq.
bq. I had to patch org.apache.giraph.utils.InternalVertexRunner to allow the use of combiners and to shutdown() the local zookeeper instance in the tests.
bq.
bq. Local and pseudo-distributed unit tests were passed. I also tested the algorithm on a 6-machine hadoop cluster using the wikipedia pagelink graph (5.7M vertices, 130M edges).
bq.
bq.
bq. This addresses bug GIRAPH-115.
bq. https://issues.apache.org/jira/browse/GIRAPH-115
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. /trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java PRE-CREATION
bq. /trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java PRE-CREATION
bq. /trunk/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java PRE-CREATION
bq. /trunk/src/main/java/org/apache/giraph/examples/VertexWithComponentTextOutputFormat.java PRE-CREATION
bq. /trunk/src/main/java/org/apache/giraph/graph/IntIntNullIntVertex.java PRE-CREATION
bq. /trunk/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 1222837
bq. /trunk/src/main/java/org/apache/giraph/utils/UnmodifiableIntArrayIterator.java PRE-CREATION
bq. /trunk/src/test/java/org/apache/giraph/examples/ConnectedComponentsVertexTest.java PRE-CREATION
bq. /trunk/src/test/java/org/apache/giraph/examples/MinimumIntCombinerTest.java PRE-CREATION
bq. /trunk/src/test/java/org/apache/giraph/examples/SimpleShortestPathVertexTest.java 1222837
bq.
bq. Diff: https://reviews.apache.org/r/3313/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq.
bq. Thanks,
bq.
bq. Sebastian
bq.
bq.
> Port of the HCC algorithm for identifying all connected components of a graph
> -----------------------------------------------------------------------------
>
> Key: GIRAPH-115
> URL: https://issues.apache.org/jira/browse/GIRAPH-115
> Project: Giraph
> Issue Type: New Feature
> Affects Versions: 0.70.0
> Reporter: Sebastian Schelter
>
> Port of the HCC algorithm that identifies connected components and assigns a componented id (the smallest vertex id in the component) to each vertex.
> The idea behind the algorithm is very simple: propagate the smallest vertex id along the edges to all vertices of a connected component until convergence. The number of supersteps necessary is equal to the length of the maximum diameter of all components + 1
> The original Hadoop-based variant of this algorithm was proposed by Kang, Charalampos, Tsourakakis and Faloutsos in "PEGASUS: Mining Peta-Scale Graphs", 2010
> http://www.cs.cmu.edu/~ukang/papers/PegasusKAIS.pdf
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (GIRAPH-115) Port of the HCC algorithm for
identifying all connected components of a graph
Posted by "Avery Ching (Assigned) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/GIRAPH-115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Avery Ching reassigned GIRAPH-115:
----------------------------------
Assignee: Sebastian Schelter
> Port of the HCC algorithm for identifying all connected components of a graph
> -----------------------------------------------------------------------------
>
> Key: GIRAPH-115
> URL: https://issues.apache.org/jira/browse/GIRAPH-115
> Project: Giraph
> Issue Type: New Feature
> Affects Versions: 0.70.0
> Reporter: Sebastian Schelter
> Assignee: Sebastian Schelter
>
> Port of the HCC algorithm that identifies connected components and assigns a componented id (the smallest vertex id in the component) to each vertex.
> The idea behind the algorithm is very simple: propagate the smallest vertex id along the edges to all vertices of a connected component until convergence. The number of supersteps necessary is equal to the length of the maximum diameter of all components + 1
> The original Hadoop-based variant of this algorithm was proposed by Kang, Charalampos, Tsourakakis and Faloutsos in "PEGASUS: Mining Peta-Scale Graphs", 2010
> http://www.cs.cmu.edu/~ukang/papers/PegasusKAIS.pdf
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-115) Port of the HCC algorithm for
identifying all connected components of a graph
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/GIRAPH-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175889#comment-13175889 ]
Hudson commented on GIRAPH-115:
-------------------------------
Integrated in Giraph-trunk-Commit #59 (See [https://builds.apache.org/job/Giraph-trunk-Commit/59/])
GIRAPH-115: Port of the HCC algorithm for identifying all connected
components of a graph. (ssc via aching)
aching : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1224678
Files :
* /incubator/giraph/trunk/CHANGELOG
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/VertexWithComponentTextOutputFormat.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/IntIntNullIntVertex.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/utils/UnmodifiableIntArrayIterator.java
* /incubator/giraph/trunk/src/test/java/org/apache/giraph/examples/ConnectedComponentsVertexTest.java
* /incubator/giraph/trunk/src/test/java/org/apache/giraph/examples/MinimumIntCombinerTest.java
* /incubator/giraph/trunk/src/test/java/org/apache/giraph/examples/SimpleShortestPathVertexTest.java
> Port of the HCC algorithm for identifying all connected components of a graph
> -----------------------------------------------------------------------------
>
> Key: GIRAPH-115
> URL: https://issues.apache.org/jira/browse/GIRAPH-115
> Project: Giraph
> Issue Type: New Feature
> Affects Versions: 0.70.0
> Reporter: Sebastian Schelter
>
> Port of the HCC algorithm that identifies connected components and assigns a componented id (the smallest vertex id in the component) to each vertex.
> The idea behind the algorithm is very simple: propagate the smallest vertex id along the edges to all vertices of a connected component until convergence. The number of supersteps necessary is equal to the length of the maximum diameter of all components + 1
> The original Hadoop-based variant of this algorithm was proposed by Kang, Charalampos, Tsourakakis and Faloutsos in "PEGASUS: Mining Peta-Scale Graphs", 2010
> http://www.cs.cmu.edu/~ukang/papers/PegasusKAIS.pdf
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-115) Port of the HCC algorithm for
identifying all connected components of a graph
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/GIRAPH-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175780#comment-13175780 ]
jiraposter@reviews.apache.org commented on GIRAPH-115:
------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3313/#review4114
-----------------------------------------------------------
Sebastian, this is really awesome work, thanks for sharing it! While I didn't read the paper, your code looks good and the compute() code is pretty straightforward. IntIntNullIntVertex.java is a good example of how to make a very compact vertex.
I only have a few minor formatting requests.
In the CODE_CONVENTIONS, comments should be:
- All classes, members, and member methods should have Javadoc in the following
style. C-style comments for javadoc and // comments for non-javadoc. Also,
the comment block should have a line break that separates the comment
section and the @ section.
While not in the CODE_CONVENTIONS, but should be, Giraph follows spaces in the <>, i.e. <I, V, E, M>. Can you please add spaces i.e. <IntWritable, IntWritable, NullWritable, IntWritable>?
I marked as many as I could find, please correct any others I have missed.
/trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java
<https://reviews.apache.org/r/3313/#comment9254>
CODE_CONVENTIONS comment suggestion.
/trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java
<https://reviews.apache.org/r/3313/#comment9253>
You could shorten this a tad with the foreach pattern instead.
/trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java
<https://reviews.apache.org/r/3313/#comment9255>
CODE_CONVENTIONS comment suggestion.
/trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java
<https://reviews.apache.org/r/3313/#comment9256>
Could be foreach (again).
/trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java
<https://reviews.apache.org/r/3313/#comment9257>
CODE_CONVENTIONS
/trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java
<https://reviews.apache.org/r/3313/#comment9258>
CODE_CONVENTIONS
/trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java
<https://reviews.apache.org/r/3313/#comment9259>
CODE_CONVENTIONS
/trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java
<https://reviews.apache.org/r/3313/#comment9260>
CODE_CONVENTIONS
/trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java
<https://reviews.apache.org/r/3313/#comment9261>
CODE_CONVENTIONS
/trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java
<https://reviews.apache.org/r/3313/#comment9262>
CODE_CONVENTIONS
/trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java
<https://reviews.apache.org/r/3313/#comment9263>
CODE_CONVENTIONS
/trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java
<https://reviews.apache.org/r/3313/#comment9264>
CODE_CONVENTIONS
/trunk/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java
<https://reviews.apache.org/r/3313/#comment9265>
CODE_CONVENTIONS
/trunk/src/main/java/org/apache/giraph/examples/VertexWithComponentTextOutputFormat.java
<https://reviews.apache.org/r/3313/#comment9266>
Capital 'text'
/trunk/src/main/java/org/apache/giraph/graph/IntIntNullIntVertex.java
<https://reviews.apache.org/r/3313/#comment9267>
NullWritable,IntWritable
should be
NullWritable, IntWritable
/trunk/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java
<https://reviews.apache.org/r/3313/#comment9268>
String,String -> String, String
- Avery
On 2011-12-24 09:32:15, Sebastian Schelter wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/3313/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-12-24 09:32:15)
bq.
bq.
bq. Review request for giraph.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. Port of the HCC algorithm to Giraph. Each vertex needs to find the smallest vertex id in its component.
bq.
bq. I created a very memory-efficient abstract vertex in org.apache.giraph.graph.IntIntNullIntVertex and had org.apache.giraph.examples.ConnectedComponentsVertex extend that. org.apache.giraph.examples.ConnectedComponentsVertexTest contains an "integration" test on toy data.
bq.
bq. I had to patch org.apache.giraph.utils.InternalVertexRunner to allow the use of combiners and to shutdown() the local zookeeper instance in the tests.
bq.
bq. Local and pseudo-distributed unit tests were passed. I also tested the algorithm on a 6-machine hadoop cluster using the wikipedia pagelink graph (5.7M vertices, 130M edges).
bq.
bq.
bq. This addresses bug GIRAPH-115.
bq. https://issues.apache.org/jira/browse/GIRAPH-115
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. /trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java PRE-CREATION
bq. /trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java PRE-CREATION
bq. /trunk/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java PRE-CREATION
bq. /trunk/src/main/java/org/apache/giraph/examples/VertexWithComponentTextOutputFormat.java PRE-CREATION
bq. /trunk/src/main/java/org/apache/giraph/graph/IntIntNullIntVertex.java PRE-CREATION
bq. /trunk/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 1222837
bq. /trunk/src/main/java/org/apache/giraph/utils/UnmodifiableIntArrayIterator.java PRE-CREATION
bq. /trunk/src/test/java/org/apache/giraph/examples/ConnectedComponentsVertexTest.java PRE-CREATION
bq. /trunk/src/test/java/org/apache/giraph/examples/MinimumIntCombinerTest.java PRE-CREATION
bq.
bq. Diff: https://reviews.apache.org/r/3313/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq.
bq. Thanks,
bq.
bq. Sebastian
bq.
bq.
> Port of the HCC algorithm for identifying all connected components of a graph
> -----------------------------------------------------------------------------
>
> Key: GIRAPH-115
> URL: https://issues.apache.org/jira/browse/GIRAPH-115
> Project: Giraph
> Issue Type: New Feature
> Affects Versions: 0.70.0
> Reporter: Sebastian Schelter
>
> Port of the HCC algorithm that identifies connected components and assigns a componented id (the smallest vertex id in the component) to each vertex.
> The idea behind the algorithm is very simple: propagate the smallest vertex id along the edges to all vertices of a connected component until convergence. The number of supersteps necessary is equal to the length of the maximum diameter of all components + 1
> The original Hadoop-based variant of this algorithm was proposed by Kang, Charalampos, Tsourakakis and Faloutsos in "PEGASUS: Mining Peta-Scale Graphs", 2010
> http://www.cs.cmu.edu/~ukang/papers/PegasusKAIS.pdf
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira