Posted to dev@giraph.apache.org by "Jakob Homan (Created) (JIRA)" <ji...@apache.org> on 2011/11/15 01:04:51 UTC

[jira] [Created] (GIRAPH-83) Is Vertex correct yet?

Is Vertex correct yet?
----------------------

                 Key: GIRAPH-83
                 URL: https://issues.apache.org/jira/browse/GIRAPH-83
             Project: Giraph
          Issue Type: Improvement
            Reporter: Jakob Homan


I'm seeing a number of people run into oddities with Vertex and am thinking we may not have it quite correct yet...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-83) Is Vertex correct yet?

Posted by "Joseph Adler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423938#comment-13423938 ] 

Joseph Adler commented on GIRAPH-83:
------------------------------------

I think Vertex is still very broken. You can't create a new class that extends Vertex outside of the org.apache.giraph package. I'm about to file a jira on that.
                

[jira] [Commented] (GIRAPH-83) Is Vertex correct yet?

Posted by "Jake Mannix (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152373#comment-13152373 ] 

Jake Mannix commented on GIRAPH-83:
-----------------------------------

bq. In that Vertex is responsible for maintaining the destEdgeMap for an implementation of Vertex, rather than implementers having to do this themselves. For each compute invocation, the vertex shouldn't assume anything about its outgoing edges, as they may have been mutated since the last call.

You mean that in the current Vertex class, we have the map of edges right there?  It's not really in the framework, it's in the superclass, but ok, you're saying we *shouldn't* take care of the bookkeeping about this, and leave it always up to the implementations (like the way that LongDoubleFloatDoubleVertex does it with primitives)?  Or that there should be some other structure which handles them?
                

[jira] [Commented] (GIRAPH-83) Is Vertex correct yet?

Posted by "Jake Mannix (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152897#comment-13152897 ] 

Jake Mannix commented on GIRAPH-83:
-----------------------------------

bq. How many different memory efficient implementations of Vertex can we expect to have?

I'm getting deja vu from the early days in Mahout, now.  "How many specialized forms of Vector would we possibly need?  I mean, Dense and Sparse, right?"  And then the discussion continues along the lines of "well, there are vectors which look like maps (have efficient/fast random access), and also other vectors which are even more compact, but don't allow easy random access, but have superfast iterators, then there are vectors which contain only a seed and some offsetting information which tell you how to generate randomized sparse entries on the fly algorithmically, ..."

Avoid premature optimization, they say, but never imagine that you've discovered all of the kinds of crazy optimizations people will come up with for their particular graph algorithms (for instance, neural nets could want a Dense vertex, which has connections to *every* vertex of the 'next layer', and so doesn't even need to keep handles to the target vertex ids, just a big dense array of edge values, and a target layer identifier).
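
To make the dense case above concrete, here is a minimal sketch of what such a vertex could look like. DenseLayerVertex and all of its members are hypothetical names chosen for illustration; nothing like this is claimed to exist in Giraph. The point is only that no target vertex ids are stored: the position in the array stands in for the target.

```java
// Hypothetical sketch: this class is not part of Giraph.
final class DenseLayerVertex {
    private final long id;
    private final int targetLayer;     // identifies the layer every edge points into
    private final float[] edgeValues;  // edgeValues[k] is the weight to the k-th vertex of targetLayer

    DenseLayerVertex(long id, int targetLayer, float[] edgeValues) {
        this.id = id;
        this.targetLayer = targetLayer;
        this.edgeValues = edgeValues.clone(); // defensive copy
    }

    long id() { return id; }

    int targetLayer() { return targetLayer; }

    // One edge per vertex of the target layer, so the array length is the out-degree.
    int numOutEdges() { return edgeValues.length; }

    float edgeValue(int positionInTargetLayer) {
        return edgeValues[positionInTargetLayer];
    }

    // Weighted contribution this vertex would send to each vertex of the next layer.
    float[] outgoingMessages(float activation) {
        float[] out = new float[edgeValues.length];
        for (int k = 0; k < edgeValues.length; k++) {
            out[k] = activation * edgeValues[k];
        }
        return out;
    }
}
```

A map-based representation would need one key object per edge here; the dense form needs only the float array plus a single layer identifier.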
                

[jira] [Commented] (GIRAPH-83) Is Vertex correct yet?

Posted by "Owen O'Malley (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150666#comment-13150666 ] 

Owen O'Malley commented on GIRAPH-83:
-------------------------------------

I think that simplifying the interface is a great goal and it sounds like you're moving in the right direction. 

It seems a bit strange to be defining job wide properties using Vertex: use/registerAggregator. I'd think that an object that defines the job would be more appropriate.

+1 to moving the implementation details out of Vertex.
                

[jira] [Commented] (GIRAPH-83) Is Vertex correct yet?

Posted by "Jakob Homan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422720#comment-13422720 ] 

Jakob Homan commented on GIRAPH-83:
-----------------------------------

I'd rather not.  Even with the new changes, I'm not quite convinced yet.  This should be an ongoing discussion.
                

[jira] [Commented] (GIRAPH-83) Is Vertex correct yet?

Posted by "Jake Mannix (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150680#comment-13150680 ] 

Jake Mannix commented on GIRAPH-83:
-----------------------------------

I agree that simplifying this class is a good idea.  Let's look at the specifics you're talking about:

Basic/Mutable/<nothing>-Vertex.  BasicVertex is essentially an interface, the definition of what it means to be a Vertex.  I think it should probably actually be *called* Vertex, because everything "is a" BasicVertex currently, so it makes sense instead to say everything "is a" Vertex.  At the other end, the class we currently call Vertex is just the generic implementation of everything you can do assuming you represent the edges and messages as a big Map<I, E> and List<M>.  If you want a special representation (i.e., using primitives, or compressed data structures, or even dynamically/algorithmically defined edge-sets where nothing lives in memory), you don't want to use this, and should instead derive directly from BasicVertex.
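
As a rough illustration of that split, the sketch below shows one contract with a generic map-backed implementation and a primitive-array implementation side by side. All type and method names (SketchVertex, MapBackedVertex, PrimitiveLongFloatVertex) are hypothetical and heavily simplified; the actual Giraph classes carry more type parameters and many more methods.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified types; not the real Giraph API.

// The contract: the role BasicVertex plays in the discussion above.
interface SketchVertex<I, E> {
    I id();
    int numOutEdges();
    boolean hasEdge(I target);
    E edgeValue(I target); // null when the edge is absent
}

// Generic implementation: edges kept in a Map<I, E>.
final class MapBackedVertex<I, E> implements SketchVertex<I, E> {
    private final I id;
    private final Map<I, E> edges = new HashMap<>();

    MapBackedVertex(I id) { this.id = id; }

    void addEdge(I target, E value) { edges.put(target, value); }

    public I id() { return id; }
    public int numOutEdges() { return edges.size(); }
    public boolean hasEdge(I target) { return edges.containsKey(target); }
    public E edgeValue(I target) { return edges.get(target); }
}

// Specialized implementation: parallel primitive arrays, looked up by binary search.
final class PrimitiveLongFloatVertex implements SketchVertex<Long, Float> {
    private final long id;
    private final long[] targets; // must be sorted ascending
    private final float[] values; // values[i] belongs to targets[i]

    PrimitiveLongFloatVertex(long id, long[] sortedTargets, float[] values) {
        if (sortedTargets.length != values.length) {
            throw new IllegalArgumentException("targets and values differ in length");
        }
        this.id = id;
        this.targets = sortedTargets.clone();
        this.values = values.clone();
    }

    public Long id() { return id; }
    public int numOutEdges() { return targets.length; }

    public boolean hasEdge(Long target) {
        long key = target;
        return Arrays.binarySearch(targets, key) >= 0;
    }

    public Float edgeValue(Long target) {
        long key = target;
        int i = Arrays.binarySearch(targets, key);
        if (i < 0) {
            return null; // edge absent
        }
        return values[i];
    }
}
```

Code written against SketchVertex works with either class, which is the argument for giving the contract the plain name and treating the map-backed class as just one implementation among several.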

The only one we could reasonably do away with is MutableVertex, and declare that all Vertex impls are mutable.  But I'm wary of doing such a thing, as the idea of having a sub-implementation which simply *does not allow* mutation is a very common use case, and possibly knowing that you have an immutable graph structure can allow some implementations to do fun and speedy multithreaded computation (think: some people will run their Giraph jobs with heap sizes large enough per mapper to take over the whole box they're on, in which case they have lots of CPU to spare, and probably don't need it *all* for RPC).

Second: What belongs in Vertex itself?  We already moved some global stuff out of Vertex into GraphState (which should maybe be called "GlobalGraphState"), and we have the WorkerContext as a separate class, and the Aggregators separate.  

We can factor off lots of stuff into other classes, but the question comes down to how does the user writing their algorithm get access to them?  How is it all wired together?  You want compute() to get passed some state that you have right when you need it, instead of either going with inheritance *or* composition?  That could be nice, I think, as long as we package it all up into a minimal set of *Context-like objects to carry around.

In what way are the out edges of a vertex "managed by the framework" currently?
                

[jira] [Commented] (GIRAPH-83) Is Vertex correct yet?

Posted by "Avery Ching (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150867#comment-13150867 ] 

Avery Ching commented on GIRAPH-83:
-----------------------------------

While you can decouple everything from vertex, I think it's pretty nice to keep the messages and edges associated with it for things like checkpointing.
                

[jira] [Commented] (GIRAPH-83) Is Vertex correct yet?

Posted by "Jakob Homan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152356#comment-13152356 ] 

Jakob Homan commented on GIRAPH-83:
-----------------------------------

bq. I think it should probably actually be called Vertex, because everything "is a" BasicVertex currently, so it makes sense instead to say everything "is a" Vertex.
Absolutely agreed.

bq. We can factor off lots of stuff into other classes, but the question comes down to how does the user writing their algorithm get access to them? How is it all wired together? You want compute() to get passed some state that you have right when you need it, instead of either going with inheritance or composition? That could be nice, I think, as long as we package it all up into a minimal set of *Context-like objects to carry around.
Correct, this is what I'm getting at.

bq. In what way are the out edges of a vertex "managed by the framework" currently?
In that Vertex is responsible for maintaining the destEdgeMap for an implementation of Vertex, rather than implementers having to do this themselves.  For each compute invocation, the vertex shouldn't assume anything about its outgoing edges, as they may have been mutated since the last call.
                

[jira] [Commented] (GIRAPH-83) Is Vertex correct yet?

Posted by "Jakob Homan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152383#comment-13152383 ] 

Jakob Homan commented on GIRAPH-83:
-----------------------------------

I'm saying we should be responsible for maintaining it (since we have to mutate it), but that _maybe_ it shouldn't be in Vertex itself, just to have a cleaner delineation. But Avery makes a good point and I'm not completely sold on this aspect myself.   How many different memory efficient implementations of Vertex can we expect to have?
                

[jira] [Commented] (GIRAPH-83) Is Vertex correct yet?

Posted by "Jakob Homan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150094#comment-13150094 ] 

Jakob Homan commented on GIRAPH-83:
-----------------------------------

Looking at the original Pregel paper, the Vertex instance has eight methods (compute, vertex_id, superstep, GetValue, MutableValue, GetOutEdgeIterator, SendMessageTo and VoteToHalt). Currently, BasicVertex has 24.  There are also three different types of Vertices (Vertex, MutableVertex and BasicVertex) linked via inheritance and exposed to the users.  I'm wondering if this interface is quite right yet.

There are two main concerns.  First, this is the contract users are starting to write applications against and which we'll need to support for a long time, with as few tweaks as possible.  It'd be good to be relatively sure of its limits before we make an initial release.  Second, the use of inheritance to join the user's implementation with the computation's state makes it difficult to test.  How does one mock out the state that's fed into compute and verify compute's result without starting up a cluster (either real or local; see GIRAPH-51)?

Would it be reasonable to strip out as many methods as possible from Vertex, particularly those dealing with state external to the Vertex itself: 
* getSuperStep
* getNumVertices
* getNumEdges
* getMsgList/iterator
* getEdgeValue
* hasEdge
* sendMsg
* sendMsgToAllEdges
* (g|s)etGraphState
* getContext
* getWorkerContext
* registerAggregator
* useAggregator

The outEdges data structures are a bit odd in that they are intrinsic to the vertex itself (in the mathematical sense), but are managed by the framework.  It might be a bit clunky, but it would be structurally more correct to separate these out as well.
  
These methods and the state they manipulate could then be passed in as a Context (a new type of Context, not one of the two others we have running around!) to the compute method.  This moves compute() closer to a functional, testable model of computing across its input state (which can be mocked out for testing and mangled as we evolve its innards).  The Vertex itself could still of course maintain any state it would need, but like a Mapper, shouldn't need much and would be discouraged from holding onto large amounts of data between computations.
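
A minimal sketch of how that could look, to make the testing argument concrete. Every name here (ComputeContext, MaxValueVertex, RecordingContext) is hypothetical and invented for illustration; none of it is existing Giraph API, and the real context would need to carry more (aggregators, edge values, and so on).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical names throughout; none of these types exist in Giraph.

// State handed to compute() for one superstep (assumed shape).
interface ComputeContext {
    long superstep();
    List<Double> messages();     // messages delivered to this vertex
    List<Long> outEdgeTargets(); // current out-edges; may differ between supersteps
    void sendMessage(long targetId, double message);
    void voteToHalt();
}

// User code: holds only its own value; everything else arrives via the context.
final class MaxValueVertex {
    private double value;

    MaxValueVertex(double initialValue) { this.value = initialValue; }

    double value() { return value; }

    void compute(ComputeContext ctx) {
        boolean changed = ctx.superstep() == 0; // always announce in the first superstep
        for (double m : ctx.messages()) {
            if (m > value) {
                value = m;
                changed = true;
            }
        }
        if (changed) {
            for (long target : ctx.outEdgeTargets()) {
                ctx.sendMessage(target, value);
            }
        } else {
            ctx.voteToHalt();
        }
    }
}

// Hand-rolled test double: records what compute() did, no cluster needed.
final class RecordingContext implements ComputeContext {
    private final long superstep;
    private final List<Double> messages;
    private final List<Long> targets;
    final List<String> sent = new ArrayList<>();
    boolean halted = false;

    RecordingContext(long superstep, List<Double> messages, List<Long> targets) {
        this.superstep = superstep;
        this.messages = messages;
        this.targets = targets;
    }

    public long superstep() { return superstep; }
    public List<Double> messages() { return messages; }
    public List<Long> outEdgeTargets() { return targets; }
    public void sendMessage(long targetId, double message) { sent.add(targetId + ":" + message); }
    public void voteToHalt() { halted = true; }
}

final class ContextSketchDemo {
    public static void main(String[] args) {
        MaxValueVertex v = new MaxValueVertex(3.0);
        RecordingContext ctx = new RecordingContext(1, Arrays.asList(7.0, 2.0), Arrays.asList(10L, 11L));
        v.compute(ctx);
        System.out.println(v.value() + " " + ctx.sent + " halted=" + ctx.halted);
    }
}
```

Because compute() only touches the context it is given, a unit test can construct a RecordingContext, call compute() once, and inspect the recorded messages and halt vote directly.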

Thoughts?
                

[jira] [Commented] (GIRAPH-83) Is Vertex correct yet?

Posted by "Alessandro Presta (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422235#comment-13422235 ] 

Alessandro Presta commented on GIRAPH-83:
-----------------------------------------

Should we mark this resolved, given most (all?) of these issues are outdated or have been addressed by the latest API redesign?
                