Posted to dev@giraph.apache.org by "Nitay Joffe (JIRA)" <ji...@apache.org> on 2012/11/06 02:04:12 UTC

[jira] [Created] (GIRAPH-409) Refactor / cleanups

Nitay Joffe created GIRAPH-409:
----------------------------------

             Summary: Refactor / cleanups
                 Key: GIRAPH-409
                 URL: https://issues.apache.org/jira/browse/GIRAPH-409
             Project: Giraph
          Issue Type: Improvement
            Reporter: Nitay Joffe
            Assignee: Nitay Joffe
            Priority: Minor


Some general thoughts I've jotted down while going through the code. Writing them here to start tracking progress for them.

1. Refactor giraph.graph to giraph.master, giraph.worker. The whole giraph.graph package name is bad in general I think.
2. Cleanup giraph.utils. For example move timers stuff to giraph.time.
3. Change module names to be more maven-esque, that is something like giraph-root, giraph-core, giraph-formats.
4. Remove WorkerClientServer. Is this needed anymore?
5. Cleanup MasterThread#run: long convoluted method.
6. Cleanup BspService#process: lots of duplication. Use a vector of events or something.
7. Cleanup Vertex class: seems to me it has too many methods and should be a simpler interface (maybe even eventually an actual interface, not an abstract class!). Add something like a Vertexes/Vertices class with helper methods that users can use.
8. {Master,Worker}Observer. Discussed elsewhere already. Allow users to easily plug in code at various points in the system. Essentially a cleaner implementation of e.g. WorkerContext.
9. Cleanup GraphMapper. I don't see why we even call a map() method seeing as we are overriding run(). We are clearly not particularly "mapreduce-y", so we should make our entry point clearer than a map(). Also I think we should have something like a WorkerThread similar to MasterThread, and clean up all of this to just create threads for whichever roles the node is assigned.
10. Move examples and anything else not needed for a giraph library out into its own package (like giraph-examples)?
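
As a rough illustration of item 6, the "vector of events" idea could look something like the sketch below. This is only a guess at the shape, not the actual BspService code: the class names (EventDispatchSketch, PathEvent), the register/process signatures, and the use of a CountDownLatch as the signal are all assumptions made for the example, and it deliberately avoids any ZooKeeper types. The point is that each event becomes one entry in a list instead of one hand-written branch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

/** Sketch only: every name here is hypothetical, not Giraph's actual code. */
final class EventDispatchSketch {

  /** Pairs a path fragment with the latch released when a changed path contains it. */
  static final class PathEvent {
    final String name;
    final String pathFragment;
    final CountDownLatch signal = new CountDownLatch(1);

    PathEvent(String name, String pathFragment) {
      this.name = name;
      this.pathFragment = pathFragment;
    }
  }

  private final List<PathEvent> events = new ArrayList<>();

  /** Registers an event; stands in for one hand-written if-branch per event. */
  PathEvent register(String name, String pathFragment) {
    PathEvent event = new PathEvent(name, pathFragment);
    events.add(event);
    return event;
  }

  /** Signals each registered event whose fragment occurs in the path and returns their names. */
  List<String> process(String changedPath) {
    List<String> fired = new ArrayList<>();
    for (PathEvent event : events) {
      if (changedPath.contains(event.pathFragment)) {
        event.signal.countDown();
        fired.add(event.name);
      }
    }
    return fired;
  }
}
```

With this shape, adding a new event means one more register(...) call, and a thread waiting on it would block on the returned PathEvent's signal. The path fragments passed to register would be whatever the real watcher currently matches on; they are not spelled out here.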


If someone +1s the ideas I'll work up some patches. Feel free to add other cleanup things here as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-409) Refactor / cleanups

Posted by "Maja Kabiljo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491670#comment-13491670 ] 

Maja Kabiljo commented on GIRAPH-409:
-------------------------------------

Great ideas, Nitay! 

1,2 - Definitely, our packaging is very bad. Maybe graph could be split by functionality, i.e. vertices/aggregators/input etc.
4 - Not used. We can also remove MasterClientServer and just use separate classes on the master as well.
5,6 - Agreed, it should be cleaned up.
7 - Looking forward to seeing what you have in mind for this.
10 - Moving test classes was mentioned some time ago (http://mail-archives.apache.org/mod_mbox/giraph-dev/201208.mbox/%3C5018EFD7.7060301@apache.org%3E), but I wasn't successful in doing it. Please go ahead!
                

[jira] [Commented] (GIRAPH-409) Refactor / cleanups

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493498#comment-13493498 ] 

Eli Reisman commented on GIRAPH-409:
------------------------------------

Nice, I like this stuff, great work! You know, Jakob also mentioned a while back that his dream for the Vertex API is to eventually have an interface. As we become more flexible about what sort of input data we accept (such as the edge data), letting the data structures under the hood be less vertex-centric than they look on the surface might be a real win.
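
To make the interface idea concrete, here is a minimal sketch of what a slimmed-down vertex contract plus a separate helper class might look like. None of this is Giraph's actual Vertex API: SimpleVertex, Vertices, MapVertex and all of their methods are hypothetical names chosen for the example. The interface keeps only the core accessors, convenience methods live in a static helper class (in the spirit of java.util.Collections), and the backing data structure is an implementation detail that could be swapped out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch only: hypothetical names, not Giraph's actual Vertex API. */
final class VertexSketch {

  /** Minimal vertex contract: id, value, and edges keyed by target id. */
  interface SimpleVertex<I, V, E> {
    I getId();
    V getValue();
    void setValue(V value);
    Map<I, E> getEdges();
  }

  /** Helper methods kept out of the interface so the contract stays small. */
  static final class Vertices {
    private Vertices() {}

    static <I, V, E> int numEdges(SimpleVertex<I, V, E> vertex) {
      return vertex.getEdges().size();
    }

    static <I, V, E> boolean hasEdge(SimpleVertex<I, V, E> vertex, I targetId) {
      return vertex.getEdges().containsKey(targetId);
    }
  }

  /** One possible implementation, backed by an insertion-ordered map. */
  static final class MapVertex<I, V, E> implements SimpleVertex<I, V, E> {
    private final I id;
    private V value;
    private final Map<I, E> edges = new LinkedHashMap<>();

    MapVertex(I id, V value) {
      this.id = id;
      this.value = value;
    }

    @Override public I getId() { return id; }
    @Override public V getValue() { return value; }
    @Override public void setValue(V value) { this.value = value; }
    @Override public Map<I, E> getEdges() { return edges; }
  }
}
```

User code written against SimpleVertex and Vertices would not need to change if MapVertex were replaced by, say, an array-backed or edge-centric implementation.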

Couldn't agree more about changing the directory structure for clarity and for Maven's benefit, as well as improving the website and wiki docs.

I am spread thin right now but looking forward to diving in on some of this stuff as windows of time open up! Thanks for opening this thread to discuss and refine these ideas going forward. It'll save time when we get down to implementing and reviewing the results!

OK, gotta go...Thanks again!
                

[jira] [Commented] (GIRAPH-409) Refactor / cleanups

Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491101#comment-13491101 ] 

Nitay Joffe commented on GIRAPH-409:
------------------------------------

cc [~aching] [~initialcontext]
                