You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@giraph.apache.org by "Jake Mannix (JIRA)" <ji...@apache.org> on 2011/09/17 01:01:09 UTC

[jira] [Created] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Ensure that subclassing BasicVertex is possible by user apps
------------------------------------------------------------

                 Key: GIRAPH-36
                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
             Project: Giraph
          Issue Type: Improvement
          Components: graph
    Affects Versions: 0.70.0
            Reporter: Jake Mannix
            Assignee: Jake Mannix
            Priority: Blocker
             Fix For: 0.70.0


Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.

Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Re: [jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by Claudio Martella <cl...@gmail.com>.
Hi Jake,

yep, you basically understand it correctly, it is indeed something
like a SSSP, but I'm starting from many vertices at the same time (so
i'd call it a grammar-based ASSP).

To make it more clear, the current code looks like:

    @Override
    public void compute(Iterator<Message> messages)
    throws IOException {

        if (getSuperstep() == 0 && isSource()) {
            processMessage(new Message(this.query, new ResultSet()));
        } else {
            while (messages.hasNext()) {
                processMessage(messages.next());
            }
        }

        voteToHalt();
    }

as you can see I have used a hack and saved the query as VertexValue
(so I at least saved parsing it multiple times from String).

With what I have in mind, I could remove the first if-condition completely.
Also, as not all the vertices are starting vertices, I could obtain
the complete list of starting vertices by running a filter based on
the first element of my stack/query. Suppose my query asks for all the
actors who have acted in a movie which took place in USA. (actor -
acted_in -> movie - takes_place -> USA), I could set as a starting
vertices only those vertices that have an outgoing edge with label
"acted_in". That would save quite a lot. As we didn't have an API to
do this yet, I still haven't figured out how to do this precisely, but
i guess this could be done in the VertexReader.

On Wed, Nov 2, 2011 at 4:51 PM, Jake Mannix <ja...@gmail.com> wrote:
> On Wed, Nov 2, 2011 at 5:07 AM, Claudio Martella <claudio.martella@gmail.com
>> wrote:
>>
>> Not each vertex is a starting vertex for the path traversals, so it
>> could be nice if at the first superstep the starting vertices could
>> already have the messages in their inbox. This way the message
>> processing (and path traversal processing with it) could be fully
>> transparent (vertices just have to iterate throw messages without
>> caring about parsing, start vertices etc etc).
>>
>> Currently, at first superstep, each vertex has to get che query from
>> the conf, parse it into a stack and discard it afterwards, as soon as
>> it discovers it is not a starting vertex for a path traversal.
>> Outside, I know who's a starting vertex and could just set the inbox
>> for those vertices accordingly.
>>
>
> Let me understand what the situation looks like, by basically repeating
> this back to you: in your case, the set of queries which are to be run
> over your graph need to be read/parsed at startup and "given" to the
> starting node who is going to be responsible for initiating their
> propagation
> across the graph, right?
>
> For one thing, storing this batch set of queries in the Configuration
> doesn't seem very scalable - having that live on-disk and be read
> by the VertexReader does seem to make sense, although I wonder how
> easy it would be to coerce your serialized on-disk format to be nice
> and re-use the unchanging graph for multiple queries?  Maybe a
> MultipleInputFormats kind of thing would be needed?
>
> So how is this different from SimpleShortestPathsVertex, in the way
> that one vertex is "special"?  The compute() method of that vertex
> uses the knowledge of whether it is the source or not to set its "distance"
> from the origin, and only when that distance is > the current min does
> it send messages out.
>
> I could see how algorithms like this could possibly be sped up if we
> didn't need to iterate over *all* of the graph on each superstep, as the
> first step only needs to send messages to the vertices directly
> connected to the origin, then only it's 2nd degree, etc.  If we were
> only to process the vertices which had messages to send, for example,
> and iterate over those in a sparse fashion, this algorithm could be
> sped up considerably.
>
> I guess you're saying that similarly, not only if the messages were
> directly associated with their starting nodes, but if during the superstep
> we could only iterate over the nodes with messages, we could be much
> faster.
>
> It seems there might be a bunch of algorithms which would work this
> way, and I wonder if it would be a good idea to modify the system to
> have GraphMapper check to see if the vertex class "isSparse()" in
> some sense, which would mean that in the superstep walk:
>
>            for (Map.Entry<I, VertexRange<I, V, E, M>> entry :
>                serviceWorker.getVertexRangeMap().entrySet()) {
> // ...
>                for (BasicVertex<I, V, E, M> vertex :
>                        entry.getValue().getVertexMap().values()) {
> // ...
>                    if (!vertex.isHalted()) {
>                        Iterator<M> vertexMsgIt =
>                            vertex.getMsgList().iterator();
>                         context.progress();
>                        vertex.compute(vertexMsgIt);
>                    }
>
> we instead would iterate over a sparse data structure which
> only contains the vertices to be processed (this data structure
> would of course change from superstep to superstep as the
> messages propagate across the graph).  This could be a pretty
> significant speedup a lot of the time in practice, as it would
> mean that iterations per superstep would scale in complexity
> as O( |currently visible part of the graph| ) as opposed to
> O( | entire graph| ), even if most of the iteration is just a bunch
> of in-memory boolean and size checks.
>
> Not sure if this is exactly what you're talking about, however.
>
>  -jake
>



-- 
    Claudio Martella
    claudio.martella@gmail.com

Re: [jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by Jake Mannix <ja...@gmail.com>.
On Wed, Nov 2, 2011 at 5:07 AM, Claudio Martella <claudio.martella@gmail.com
> wrote:
>
> Not each vertex is a starting vertex for the path traversals, so it
> could be nice if at the first superstep the starting vertices could
> already have the messages in their inbox. This way the message
> processing (and path traversal processing with it) could be fully
> transparent (vertices just have to iterate throw messages without
> caring about parsing, start vertices etc etc).
>
> Currently, at first superstep, each vertex has to get che query from
> the conf, parse it into a stack and discard it afterwards, as soon as
> it discovers it is not a starting vertex for a path traversal.
> Outside, I know who's a starting vertex and could just set the inbox
> for those vertices accordingly.
>

Let me understand what the situation looks like, by basically repeating
this back to you: in your case, the set of queries which are to be run
over your graph need to be read/parsed at startup and "given" to the
starting node who is going to be responsible for initiating their
propagation
across the graph, right?

For one thing, storing this batch set of queries in the Configuration
doesn't seem very scalable - having that live on-disk and be read
by the VertexReader does seem to make sense, although I wonder how
easy it would be to coerce your serialized on-disk format to be nice
and re-use the unchanging graph for multiple queries?  Maybe a
MultipleInputFormats kind of thing would be needed?

So how is this different from SimpleShortestPathsVertex, in the way
that one vertex is "special"?  The compute() method of that vertex
uses the knowledge of whether it is the source or not to set its "distance"
from the origin, and only when that distance is > the current min does
it send messages out.

I could see how algorithms like this could possibly be sped up if we
didn't need to iterate over *all* of the graph on each superstep, as the
first step only needs to send messages to the vertices directly
connected to the origin, then only it's 2nd degree, etc.  If we were
only to process the vertices which had messages to send, for example,
and iterate over those in a sparse fashion, this algorithm could be
sped up considerably.

I guess you're saying that similarly, not only if the messages were
directly associated with their starting nodes, but if during the superstep
we could only iterate over the nodes with messages, we could be much
faster.

It seems there might be a bunch of algorithms which would work this
way, and I wonder if it would be a good idea to modify the system to
have GraphMapper check to see if the vertex class "isSparse()" in
some sense, which would mean that in the superstep walk:

            for (Map.Entry<I, VertexRange<I, V, E, M>> entry :
                serviceWorker.getVertexRangeMap().entrySet()) {
// ...
                for (BasicVertex<I, V, E, M> vertex :
                        entry.getValue().getVertexMap().values()) {
// ...
                    if (!vertex.isHalted()) {
                        Iterator<M> vertexMsgIt =
                            vertex.getMsgList().iterator();
                         context.progress();
                        vertex.compute(vertexMsgIt);
                    }

we instead would iterate over a sparse data structure which
only contains the vertices to be processed (this data structure
would of course change from superstep to superstep as the
messages propagate across the graph).  This could be a pretty
significant speedup a lot of the time in practice, as it would
mean that iterations per superstep would scale in complexity
as O( |currently visible part of the graph| ) as opposed to
O( | entire graph| ), even if most of the iteration is just a bunch
of in-memory boolean and size checks.

Not sure if this is exactly what you're talking about, however.

  -jake

Re: [jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by Claudio Martella <cl...@gmail.com>.
Great, so I can I base GIRAPH-10 and GIRAPH-47 after this one!

About my use case:

I'm implementing a batch path query engine where the queries are
XPath-like queries. Each query is a stack of conditions, each element
assigned to a depth of the path traversal. A message is a stack.

Not each vertex is a starting vertex for the path traversals, so it
could be nice if at the first superstep the starting vertices could
already have the messages in their inbox. This way the message
processing (and path traversal processing with it) could be fully
transparent (vertices just have to iterate throw messages without
caring about parsing, start vertices etc etc).

Currently, at first superstep, each vertex has to get che query from
the conf, parse it into a stack and discard it afterwards, as soon as
it discovers it is not a starting vertex for a path traversal.
Outside, I know who's a starting vertex and could just set the inbox
for those vertices accordingly.

Hope this helps,
Claudio

On Wed, Nov 2, 2011 at 9:14 AM, Jake Mannix (Commented) (JIRA)
<ji...@apache.org> wrote:
>
>    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141994#comment-13141994 ]
>
> Jake Mannix commented on GIRAPH-36:
> -----------------------------------
>
> Excellent, looks good to me.
> +1 for committing we can iterate from here is the main point.  The sooner this gets in, the fewer other patches out there get broken. ;)
>
>> Ensure that subclassing BasicVertex is possible by user apps
>> ------------------------------------------------------------
>>
>>                 Key: GIRAPH-36
>>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>>             Project: Giraph
>>          Issue Type: Improvement
>>          Components: graph
>>    Affects Versions: 0.70.0
>>            Reporter: Jake Mannix
>>            Assignee: Jake Mannix
>>            Priority: Blocker
>>             Fix For: 0.70.0
>>
>>         Attachments: GIRAPH-36.diff, GIRAPH-36.diff, GIRAPH-36.diff, GIRAPH-36.diff.warnings
>>
>>
>> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
>> Let's make sure the internal APIs allow for BasicVertex to be the base class.
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
>
>



-- 
    Claudio Martella
    claudio.martella@gmail.com

Re: [jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by Avery Ching <ac...@apache.org>.
I'd also like to hear the use case.  Currently we don't dump messages in 
the vertex output format, but maybe there is a similar case to do so?

Avery

On 10/31/11 11:46 AM, Jake Mannix wrote:
> Well I guess that gives us one reason to keep it in the API.  What's the
> reasoning?  Are there
> static data sets which make the most sense to have "initial messages"
> serialized with the
> graph, instead of generating them at start?
>
> I guess if what you're modeling is in some sense a "2nd order"
> difference/differential
> equation, then knowing the state of the graph is not enough information to
> uniquely
> describe the evolution, you also need the "first derivative" of it's state
> (ie the set of
> messages it has at any given time).
>
>    -jake
>
> On Mon, Oct 31, 2011 at 11:38 AM, Claudio Martella<
> claudio.martella@gmail.com>  wrote:
>
>> I actually like the idea of having the messages being inserted at
>> vertex load. Currently I'm actually fighting with this functionality
>> missing and was going to open and issue sooner or later.
>>
>>
>> On Mon, Oct 31, 2011 at 6:19 PM, Jake Mannix (Commented) (JIRA)
>> <ji...@apache.org>  wrote:
>>>     [
>> https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140330#comment-13140330]
>>> Jake Mannix commented on GIRAPH-36:
>>> -----------------------------------
>>>
>>> 1) BspUtils.createVertex(Configuration conf, GraphState<I,V,E,M>
>> graphState) requires access to the GraphState for instantiation, currently.
>>   We could avoid it by taking that setGraphState() away from that method and
>> leaving it in wherever it gets first used (GraphMapper?), but why not be
>> safe, and always set it right after instantiation, so you know that there's
>> no other place where someone decides to do BspUtils.createVertex(), but
>> forgets to then setGraphState() on it.
>>> 2) I really don't know whether it makes sense to be able to instantiate
>> "in-flight" messages with vertices.  I just wanted to future-proof the API
>> a little bit by allowing for the possibility.  I'm fine either way.
>>>> Ensure that subclassing BasicVertex is possible by user apps
>>>> ------------------------------------------------------------
>>>>
>>>>                  Key: GIRAPH-36
>>>>                  URL: https://issues.apache.org/jira/browse/GIRAPH-36
>>>>              Project: Giraph
>>>>           Issue Type: Improvement
>>>>           Components: graph
>>>>     Affects Versions: 0.70.0
>>>>             Reporter: Jake Mannix
>>>>             Assignee: Jake Mannix
>>>>             Priority: Blocker
>>>>              Fix For: 0.70.0
>>>>
>>>>          Attachments: GIRAPH-36.diff
>>>>
>>>>
>>>> Original assumptions in Giraph were that all users would subclass
>> Vertex (which extended MutableVertex extended BasicVertex).  Classes which
>> wish to have application specific data structures (ie. not a TreeMap<I,
>> Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.
>>   Unfortunately VertexRange extends ArrayList<Vertex>, and there are other
>> places where the assumption is that vertex classes are either Vertex, or at
>> least MutableVertex.
>>>> Let's make sure the internal APIs allow for BasicVertex to be the base
>> class.
>>> --
>>> This message is automatically generated by JIRA.
>>> If you think it was sent incorrectly, please contact your JIRA
>> administrators:
>> https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
>>> For more information on JIRA, see:
>> http://www.atlassian.com/software/jira
>>>
>>>
>>
>>
>> --
>>      Claudio Martella
>>      claudio.martella@gmail.com
>>


Re: [jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by Jake Mannix <ja...@gmail.com>.
Well I guess that gives us one reason to keep it in the API.  What's the
reasoning?  Are there
static data sets which make the most sense to have "initial messages"
serialized with the
graph, instead of generating them at start?

I guess if what you're modeling is in some sense a "2nd order"
difference/differential
equation, then knowing the state of the graph is not enough information to
uniquely
describe the evolution, you also need the "first derivative" of it's state
(ie the set of
messages it has at any given time).

  -jake

On Mon, Oct 31, 2011 at 11:38 AM, Claudio Martella <
claudio.martella@gmail.com> wrote:

> I actually like the idea of having the messages being inserted at
> vertex load. Currently I'm actually fighting with this functionality
> missing and was going to open and issue sooner or later.
>
>
> On Mon, Oct 31, 2011 at 6:19 PM, Jake Mannix (Commented) (JIRA)
> <ji...@apache.org> wrote:
> >
> >    [
> https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140330#comment-13140330]
> >
> > Jake Mannix commented on GIRAPH-36:
> > -----------------------------------
> >
> > 1) BspUtils.createVertex(Configuration conf, GraphState<I,V,E,M>
> graphState) requires access to the GraphState for instantiation, currently.
>  We could avoid it by taking that setGraphState() away from that method and
> leaving it in wherever it gets first used (GraphMapper?), but why not be
> safe, and always set it right after instantiation, so you know that there's
> no other place where someone decides to do BspUtils.createVertex(), but
> forgets to then setGraphState() on it.
> >
> > 2) I really don't know whether it makes sense to be able to instantiate
> "in-flight" messages with vertices.  I just wanted to future-proof the API
> a little bit by allowing for the possibility.  I'm fine either way.
> >
> >> Ensure that subclassing BasicVertex is possible by user apps
> >> ------------------------------------------------------------
> >>
> >>                 Key: GIRAPH-36
> >>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
> >>             Project: Giraph
> >>          Issue Type: Improvement
> >>          Components: graph
> >>    Affects Versions: 0.70.0
> >>            Reporter: Jake Mannix
> >>            Assignee: Jake Mannix
> >>            Priority: Blocker
> >>             Fix For: 0.70.0
> >>
> >>         Attachments: GIRAPH-36.diff
> >>
> >>
> >> Original assumptions in Giraph were that all users would subclass
> Vertex (which extended MutableVertex extended BasicVertex).  Classes which
> wish to have application specific data structures (ie. not a TreeMap<I,
> Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.
>  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other
> places where the assumption is that vertex classes are either Vertex, or at
> least MutableVertex.
> >> Let's make sure the internal APIs allow for BasicVertex to be the base
> class.
> >
> > --
> > This message is automatically generated by JIRA.
> > If you think it was sent incorrectly, please contact your JIRA
> administrators:
> https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> > For more information on JIRA, see:
> http://www.atlassian.com/software/jira
> >
> >
> >
>
>
>
> --
>     Claudio Martella
>     claudio.martella@gmail.com
>

Re: [jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by Claudio Martella <cl...@gmail.com>.
I actually like the idea of having the messages being inserted at
vertex load. Currently I'm actually fighting with this functionality
missing and was going to open and issue sooner or later.


On Mon, Oct 31, 2011 at 6:19 PM, Jake Mannix (Commented) (JIRA)
<ji...@apache.org> wrote:
>
>    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140330#comment-13140330 ]
>
> Jake Mannix commented on GIRAPH-36:
> -----------------------------------
>
> 1) BspUtils.createVertex(Configuration conf, GraphState<I,V,E,M> graphState) requires access to the GraphState for instantiation, currently.  We could avoid it by taking that setGraphState() away from that method and leaving it in wherever it gets first used (GraphMapper?), but why not be safe, and always set it right after instantiation, so you know that there's no other place where someone decides to do BspUtils.createVertex(), but forgets to then setGraphState() on it.
>
> 2) I really don't know whether it makes sense to be able to instantiate "in-flight" messages with vertices.  I just wanted to future-proof the API a little bit by allowing for the possibility.  I'm fine either way.
>
>> Ensure that subclassing BasicVertex is possible by user apps
>> ------------------------------------------------------------
>>
>>                 Key: GIRAPH-36
>>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>>             Project: Giraph
>>          Issue Type: Improvement
>>          Components: graph
>>    Affects Versions: 0.70.0
>>            Reporter: Jake Mannix
>>            Assignee: Jake Mannix
>>            Priority: Blocker
>>             Fix For: 0.70.0
>>
>>         Attachments: GIRAPH-36.diff
>>
>>
>> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
>> Let's make sure the internal APIs allow for BasicVertex to be the base class.
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
>
>



-- 
    Claudio Martella
    claudio.martella@gmail.com

Re: [jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by Claudio Martella <cl...@gmail.com>.
patch looks very nice from italy as well :) I'll test it this week on my setup.

On Mon, Oct 31, 2011 at 8:26 AM, Jake Mannix (Commented) (JIRA)
<ji...@apache.org> wrote:
>
>    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13139978#comment-13139978 ]
>
> Jake Mannix commented on GIRAPH-36:
> -----------------------------------
>
> Yeah, sorry it's so bit, Hyunsik - GIRAPH-28 kinda turned into a bit of a yak-shaving affair, I know!
>
>> Ensure that subclassing BasicVertex is possible by user apps
>> ------------------------------------------------------------
>>
>>                 Key: GIRAPH-36
>>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>>             Project: Giraph
>>          Issue Type: Improvement
>>          Components: graph
>>    Affects Versions: 0.70.0
>>            Reporter: Jake Mannix
>>            Assignee: Jake Mannix
>>            Priority: Blocker
>>             Fix For: 0.70.0
>>
>>         Attachments: GIRAPH-36.diff
>>
>>
>> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
>> Let's make sure the internal APIs allow for BasicVertex to be the base class.
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
>
>



-- 
    Claudio Martella
    claudio.martella@gmail.com

[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Jake Mannix (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141338#comment-13141338 ] 

Jake Mannix commented on GIRAPH-36:
-----------------------------------

Hmmm... removing access to GraphState in userland code also makes it harder to unit test, as you have to do a bunch of setAccessible() calls, or convoluted mocking...  oh well.  
                
> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>         Attachments: GIRAPH-36.diff
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Jake Mannix (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13139978#comment-13139978 ] 

Jake Mannix commented on GIRAPH-36:
-----------------------------------

Yeah, sorry it's so bit, Hyunsik - GIRAPH-28 kinda turned into a bit of a yak-shaving affair, I know!
                
> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>         Attachments: GIRAPH-36.diff
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Claudio Martella (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142581#comment-13142581 ] 

Claudio Martella commented on GIRAPH-36:
----------------------------------------

Hi Jakob,

as a matter of fact, I believe, part of your proposal is addressed by GIRAPH-47. 
Now that Jake has committed this work, I'll sync and complete that patch to the current trunk (I was out of office for conference the whole last week).

Can GIRAPH-47 be a start for what you have in mind?
                
> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>         Attachments: GIRAPH-36.diff, GIRAPH-36.diff, GIRAPH-36.diff, GIRAPH-36.diff.warnings
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Hyunsik Choi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13139976#comment-13139976 ] 

Hyunsik Choi commented on GIRAPH-36:
------------------------------------

Looks great!

Actually, I need more time to fully keep up with this patch.
First of all, I have executed unit tests on real hadoop cluster running on local host.
All tests are passed!
                
> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>         Attachments: GIRAPH-36.diff
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Jake Mannix (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13139098#comment-13139098 ] 

Jake Mannix commented on GIRAPH-36:
-----------------------------------

Hmm... on further thought: passing getRecordReader().getCurrentValue() requires the parsing happen in the Vertex implementation.  That work is the primary *job* of the VertexReader, so I think we're going to need something like:

{code}
class BasicVertex<I,V,E,M> {
  abstract void initialize(I id, V val, Map<I,E> edges, Iterable<M> message);
}
{code}

Of course, we could also just use *gasp* a constructor?


                
> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Jakob Homan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142573#comment-13142573 ] 

Jakob Homan commented on GIRAPH-36:
-----------------------------------

Great work, Jake. One thing I notice while reading through Vertex is muddling of the state the Vertex is responsible for on a per-application basis and the state Giraph manages for the vertex.  I think we may being ill-served by inheritance here and should instead rely on composition to hold this state.  I'm thinking that messages in/out, edge state and mutation, facilities for sending messages, current superstep, etc. Would it be better in the long term to move these out to a context object (ala MR)?  This would simplify Vertex significantly, make it much easier to test (by mocking out the dependency) and future proof the evolution of Vertex as there will be fewer moving parts to keep track of.

Does changing compute and {pre|post}{Superstep|Application} took their external state as parameters with sound like a good approach to explore?
                
> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>         Attachments: GIRAPH-36.diff, GIRAPH-36.diff, GIRAPH-36.diff, GIRAPH-36.diff.warnings
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Jake Mannix (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140330#comment-13140330 ] 

Jake Mannix commented on GIRAPH-36:
-----------------------------------

1) BspUtils.createVertex(Configuration conf, GraphState<I,V,E,M> graphState) requires access to the GraphState for instantiation, currently.  We could avoid it by taking that setGraphState() away from that method and leaving it in wherever it gets first used (GraphMapper?), but why not be safe, and always set it right after instantiation, so you know that there's no other place where someone decides to do BspUtils.createVertex(), but forgets to then setGraphState() on it.  

2) I really don't know whether it makes sense to be able to instantiate "in-flight" messages with vertices.  I just wanted to future-proof the API a little bit by allowing for the possibility.  I'm fine either way.
                
> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>         Attachments: GIRAPH-36.diff
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Avery Ching (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107350#comment-13107350 ] 

Avery Ching commented on GIRAPH-36:
-----------------------------------

There is an out-of-core part as well though (checkpointing).  Forcing the types (I, V, E, M) to implement Writable seemed like a nice easy way for Hadoop users to jump into Giraph.  Also, it provides us a lot of reusable objects (IntWritable, FloatWritable, DoubleWritable, etc.).  I am open to other ideas of course.  Let me know your thoughts.

> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Jake Mannix (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107339#comment-13107339 ] 

Jake Mannix commented on GIRAPH-36:
-----------------------------------

Yeah, I having it return the current vertex sounds good, I guess.  There's still something nagging at me about the way Writables are being used: Giraph is *different* from Hadoop: there's a persistent, in-memory data structure being built here, where there *isn't* in Hadoop.  Regardless of how we read the data, or send the data over the wire, or write it to disk, we're also hanging onto it.  I wonder if we need to make the abstraction around that more clear?

Maybe simply solving the title of this JIRA ticket would do the trick, which would at a minimum require that BasicVertex implement Writable, and other than that, it could work with VertexReader API's of either flavor.

I think I can try working on this ticket without monkeying with the VertexReader API, but I won't know until I start unravelling this ball of string a bit.

> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Jake Mannix (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107460#comment-13107460 ] 

Jake Mannix commented on GIRAPH-36:
-----------------------------------

Oh, I'm not saying we shouldn't have I,V,E,M all implement Writable - we should, and that makes it really easy to implement Writable in BasicVertex.

> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Avery Ching (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107286#comment-13107286 ] 

Avery Ching commented on GIRAPH-36:
-----------------------------------

The reason for the current VertexReader API was to match the old Hadoop RecordReader API and make it natural for folks to move to vertices instead of keys and values.  The old Hadoop RecordReader API 

org.apache.hadoop.mapred.RecordReader

boolean next(K key, V value) throws IOException;

and the current VertexReader API is 

boolean next(MutableVertex<I, V, E, ?> vertex)
    throws IOException, InterruptedException;

That being said, the new Hadoop RecordReader API is different:

org.apache.hadoop.mapreduce.RecordReader

boolean nextKeyValue() throws IOException, InterruptedException;
KEYIN getCurrentKey() throws IOException, InterruptedException;
VALUEIN getCurrentValue() throws IOException, InterruptedException;

It's probably easier to follow that (especially regarding your points).  Given it's a user facing API we should get a few more opinions on it though.  I imagine the change would be something closer to:

boolean nextVertex() throws IOException, InterruptedException;
BasicVertex<I, V, E, M> getCurrentVertex() throws IOException, InterruptedException;

As far as the questions about BasicVertex and MutableVertex, the general idea would be that BasicVertex would be a safer interface to use whenever possible.  However, the Vertex class hierarchy has evolved and I wouldn't mind changing it since it's not really as useful as it should be.  In general, we should only provide the interfaces necessary for each method to ensure we (or the users) can't do something stupid.  So probably a(n) (nearly) immutable interface for storage, one for the user to access their methods, etc...



> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Avery Ching (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Avery Ching updated GIRAPH-36:
------------------------------

    Attachment: GIRAPH-36.diff.warnings

Revised version of Jake's patch, addressing the warnings in src/main.  Some minor cleanup.
                
> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>         Attachments: GIRAPH-36.diff, GIRAPH-36.diff, GIRAPH-36.diff, GIRAPH-36.diff.warnings
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Avery Ching (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140974#comment-13140974 ] 

Avery Ching commented on GIRAPH-36:
-----------------------------------

Regarding 1) I think it would be safer for use to take away setGraphState() from createVertex to avoid anyone mucking with the GraphState in the VertexInputFormat.  It's kind of a nasty thing and would be preferable for users to not know about it.  Actually, in our current code, we call setGraphState prior to processing each vertex anyway (see GraphMapper.java), so the graph state not being set shouldn't be an issue.

As far as 2), I really am curious to know what the use case for it is.  I personally think it's a little weird.  Currently, we don't do any processing on messages for superstep -1 (input superstep).  So I guess the messages would be processed in superstep 0?  

Anyway, once we figure out 1), please work on your formatting (i.e. a couple of if(cond) instead of if (cond) and at least keep spaces consistent per file.  Then we can commit it.  Oh, and if there are any warnings, please take care of them.  I picked up a lot of warnings from a recent Giraph commit...


                
> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>         Attachments: GIRAPH-36.diff
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Avery Ching (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13139871#comment-13139871 ] 

Avery Ching commented on GIRAPH-36:
-----------------------------------

Tried out your stuff Jake.  Unittests passed for me.  I took a quick look at it and was happy to see the new input format style interfaces (nextVertex() and getCurrentVertex()).  Also very cool to see LongDoubleFloatDoubleVertex in action as a proof-of-concept non-Vertex implementation.  Couple of questions/comments.

1)  I would personally prefer that GraphState not be exposed to developers/users from VertexReader.  We can always set the GraphState from the infrastructure after the vertices have been read.  

2)  Does VertexReader really need the message type M?  There aren't any messages at that point and complicates things a bit.

3)  I know it's not ready for primetime as you mentioned, but while we currently have two styles of formatting, hopefully we can at least keep files consistent until one of us fixes everything to the new convention.
                
> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>         Attachments: GIRAPH-36.diff
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Jake Mannix (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107281#comment-13107281 ] 

Jake Mannix commented on GIRAPH-36:
-----------------------------------

Initial thoughts:

  VertexReader defines a "next(MutableVertex vertex)" method, which does the sensible thing of filling in the vertex from the HDFS block, and because it takes a vertex object and messes with it, it's natural that the vertex be "required" to be a MutableVertex.

  But of course this implies that *everything* be a MutableVertex, because if you can't be read in by a VertexReader, where do you get instantiated at all?  If BasicVertex implements Writable, we could always readFields() data in, but not allow mutation, but this seems like it would interfere with the way VertexReader allows users to read straight from Text, etc.  This would allow VertexList to extend ArrayList<BasicVertex> instead of ArrayList<Vertex>, at the same time.

Anyone have any thoughts/ideas?  Are we wedded to making VertexReader implementations deal with MutableVertex, or can we swap them to handle Writable BasicVertex?

> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Jake Mannix (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107625#comment-13107625 ] 

Jake Mannix commented on GIRAPH-36:
-----------------------------------

Looking into this a bit further, I think I do like the idea of:

VertexReader:

boolean nextVertex() throws IOException, InterruptedException;
BasicVertex<I, V, E, M> getCurrentVertex() throws IOException, InterruptedException;

The trick is that now VertexReader is the one doing the instantiation of the BasicVertex instances, which means it will need a handle on the GraphState, but that's fine - the only place where the VertexInputFormat is currently instantiated is in BspServiceMaster and BspServiceWorker, where there is a handle on the GraphState anyways.

I'm just going to try and finish coding this up with the patch to GIRAPH-28 also applied, so I have a proven use-case already in hand to verify it makes sense / works on a cluster.

> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Jake Mannix (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141994#comment-13141994 ] 

Jake Mannix commented on GIRAPH-36:
-----------------------------------

Excellent, looks good to me.
+1 for committing we can iterate from here is the main point.  The sooner this gets in, the fewer other patches out there get broken. ;)
                
> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>         Attachments: GIRAPH-36.diff, GIRAPH-36.diff, GIRAPH-36.diff, GIRAPH-36.diff.warnings
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Jake Mannix (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13139962#comment-13139962 ] 

Jake Mannix commented on GIRAPH-36:
-----------------------------------

Regarding 1) - I guess I can see how VertexReader could be an abstract base class which hangs onto the GraphState, and instantiates the BasicVertex instances for its subclasses, using the private GraphState reference, and passing it to an abstract initializeVertex(BasicVertex<I,V,E,M> v) method to be populated via the (newly added in this patch) BasicVertex#initialize(<lotso args>) method.

Thinking further on 2) - am I just not seeing how we could easily make VertexReader act like an Iterator / factory for BasicVertex instances without typing it by the type of the thing it's to instantiate?
                
> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>         Attachments: GIRAPH-36.diff
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Jake Mannix (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jake Mannix updated GIRAPH-36:
------------------------------

    Attachment: GIRAPH-36.diff

New patch.  Nicer formatting, removed some more superfluous references to GraphState
                
> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>         Attachments: GIRAPH-36.diff, GIRAPH-36.diff, GIRAPH-36.diff
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Claudio Martella (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13130548#comment-13130548 ] 

Claudio Martella commented on GIRAPH-36:
----------------------------------------

I understand. well, although I don't see how it could be really optimized on the fact a class doesn't present a subset of the methods (the mutating methods) on performance, I could understand on a principle.

Anyway I think it could make sense to follow your distinction between mutation before and after instantiation, and I believe a good way of doing this could be:

(a) initialize the object through Writable interface readField() at instantiation. This would allow us basically to re-use code we already have (readField() implementation) and not break any underlying assumption about mutation. The bad part is that people would have to provide storage on DataInputStream instead of the current VertexInputFormat.

(b) explicitly create initialize() method to do the "pre instantiation" init code, which would be added to BasicVertex. This code would be called inside of VertexInputFormat.next() for instance by passing the content of RecordReader.getCurrentValue(), i.e. vertex.initialize(getRecordReader().getCurrentValue()). This is probably the cleanest solution as it doesn't require to change the current API and also quite matches with the Factory-based current creation of vertices.
                
> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Jake Mannix (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jake Mannix updated GIRAPH-36:
------------------------------

    Attachment: GIRAPH-36.diff

This patch is gigantic, sorry.  It resolves this current ticket by simultaneously resolving GIRAPH-28 as a proof-of-concept: 

Introduces a primitive-only BasicVertex subclass which is *not* a MutableVertex, let alone a subclass of Vertex.  In particular, to verify nontriviality, it actually replaces the SimplePageRankVertex with a subclass of LongDoubleFloatDoubleVertex (the aforemetioned example primitive vertex).

This required some extensive refactoring, all over the place.  Unit tests currently pass on my box, but I doubt that this patch is really ready for prime-time.

Try it out!

(Note: code style is probably also a mess, again, sorry.  Couldn't find a checksyle set up anywhere with the new guidelines... and it looks like most of the old code still uses the old whitespace rules anyways...)
                
> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>         Attachments: GIRAPH-36.diff
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142221#comment-13142221 ] 

Hudson commented on GIRAPH-36:
------------------------------

Integrated in Giraph-trunk-Commit #22 (See [https://builds.apache.org/job/Giraph-trunk-Commit/22/])
    GIRAPH-36: Ensure that subclassing BasicVertex is possible by user
  apps. (jake.mannix via aching)

aching : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1196639
Files : 
* /incubator/giraph/trunk/CHANGELOG
* /incubator/giraph/trunk/pom.xml
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/PseudoRandomVertexInputFormat.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedService.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/VertexList.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerCommunications.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/GeneratedVertexInputFormat.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/GeneratedVertexReader.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimplePageRankVertex.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleShortestPathsVertex.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleSuperstepVertex.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertex.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertexResolver.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspService.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceMaster.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspUtils.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/Edge.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphState.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/LongDoubleFloatDoubleVertex.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/MutableVertex.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/Vertex.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexInputFormat.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexRange.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexReader.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexResolver.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/lib/AdjacencyListTextVertexOutputFormat.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/lib/AdjacencyListVertexReader.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/lib/JsonBase64VertexInputFormat.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/lib/LongDoubleDoubleAdjacencyListVertexInputFormat.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/lib/SequenceFileVertexInputFormat.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/lib/TextDoubleDoubleAdjacencyListVertexInputFormat.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/lib/TextVertexInputFormat.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/zk/ZooKeeperManager.java
* /incubator/giraph/trunk/src/test/java/org/apache/giraph/TestBspBasic.java
* /incubator/giraph/trunk/src/test/java/org/apache/giraph/lib/TestAdjacencyListTextVertexOutputFormat.java
* /incubator/giraph/trunk/src/test/java/org/apache/giraph/lib/TestLongDoubleDoubleAdjacencyListVertexInputFormat.java
* /incubator/giraph/trunk/src/test/java/org/apache/giraph/lib/TestTextDoubleDoubleAdjacencyListVertexInputFormat.java

                
> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>         Attachments: GIRAPH-36.diff, GIRAPH-36.diff, GIRAPH-36.diff, GIRAPH-36.diff.warnings
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Jake Mannix (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13139979#comment-13139979 ] 

Jake Mannix commented on GIRAPH-36:
-----------------------------------

s/bit/big/
                
> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>         Attachments: GIRAPH-36.diff
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Jake Mannix (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13139923#comment-13139923 ] 

Jake Mannix commented on GIRAPH-36:
-----------------------------------

Hey Avery,

  Glad you could get it to all work (unit-test-wise) for you - did you run the unit tests talking to a real cluster, or just localhost?  I haven't tried the former yet...

  Regarding the comments: I'm fine keeping GraphState hidden to the implementation details, but will it always work in practice?  I mean, all of the state in there used to be global static state in Vertex, and all subclasses had full read/write access to it.  We've segregated it in a non-static object which we're treating as a singleton (but aren't forcing it to be one programmaticly), but can we really hide it?

  For example, BasicVertex#sendMsg() requires the WorkerCommunications, which it gets from the GraphMapper, which it gets from the GraphState.  Do you think we should implement all the things we think the user will need from the GraphMapper directly *in* BasicVertex (or MutableVertex), and remove the getGraphState() method entirely, or just keep it package private for now. 

  In general, I like keeping impl details away from developers too, but this early in the project, we also shouldn't hamstring ourselves by thinking there is a difference from a "casual user" of the system, and someone willing to hack around a bit. :)

  2) If VertexReader is going to return a BasicVertex<I,V,E,M> from getCurrentVertex() to callers who really need it to be fully specified nice and type-safely, what choice do we have?  From a pratical, rather than API standpoint, is it possible we'd someday want to be able to read up a collection of vertices with initial messages ready to be processed?  Not sure if that matters.

  3) I tried, and can clean up as possible, but I really rely on my IDE settings for that.  Not being able to set autoformat on means I'm pretty doomed, but if you see places where I can clean up manually, I can do that, sure.
                
> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>         Attachments: GIRAPH-36.diff
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Claudio Martella (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126432#comment-13126432 ] 

Claudio Martella commented on GIRAPH-36:
----------------------------------------

Hey Jake, did you get any chance to fix this one lately? 

This is quite blocking for me, I'm based on BasicVertex. Can I help out somehow?
                
> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Avery Ching (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141965#comment-13141965 ] 

Avery Ching commented on GIRAPH-36:
-----------------------------------

+1.
Jake, just re-ran with your new patch.  Passed both local and MR unittests.  I noticed some warnings and tried to address them (some aren't your fault, from an earlier patch).  I'm uploading a diff of your diff.  Take a look and if it's fine I'll commit on your behalf.  I didn't address the warnings in src/test/ava/org/apache/giraph/lib with mockito since I don't understand it as well as I should, but we should knock out those warnings when we can.
                
> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>         Attachments: GIRAPH-36.diff, GIRAPH-36.diff, GIRAPH-36.diff
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Jake Mannix (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141553#comment-13141553 ] 

Jake Mannix commented on GIRAPH-36:
-----------------------------------

Ok, got the setGraphState() stuff out, tests pass, with a little messing around.

So formatting:  I tracked down all of the stray "if(" bits, but do you know of any easy way to find files which have mixed spacing?  I couldn't think of a good regex? :P

Also, regarding committing: I never heard back from you or Owen about getting myself actually getting commit privileges.  I never heard from infra@, etc.
                
> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>         Attachments: GIRAPH-36.diff
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Jake Mannix (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140988#comment-13140988 ] 

Jake Mannix commented on GIRAPH-36:
-----------------------------------

setGraphState is called in GraphMapper, yes, but there are other places vertices get instantiated: 

  MutableVertex#instantiateVertex()
  VertexResolver#instantiateVertex()

In both of these classes, we have a handle on GraphState, so I guess we can just set it right after.  I'll try that and see if there are any issues.
                
> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>         Attachments: GIRAPH-36.diff
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Avery Ching (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141573#comment-13141573 ] 

Avery Ching commented on GIRAPH-36:
-----------------------------------

Since this is a somewhat large change, can you also run the tests on a hadoop instance (local is fine).  I.e. 'mvn test -Dprop.mapred.job.tracker=localhost:50300'.  I'll take a look at your diff tonight.
                
> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>         Attachments: GIRAPH-36.diff, GIRAPH-36.diff
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Avery Ching (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140279#comment-13140279 ] 

Avery Ching commented on GIRAPH-36:
-----------------------------------

Jake,

Your suggestion on 1) will work, but couldn't we also just use the infrastructure to set graph state after getCurrentVertex()?  There is no need to use the GraphState state at that point of the application.  It potentially will prevent users from making any mistakes and complicating GraphState, which is owned by the Worker.

Per 2), this is a downside of changing the interface I think.  I guess we will have to live with the additional M type.  It's not a big deal to me.  That being said, we should prevent users from initializing vertices with messages from the vertex reader since I don't think that makes any sense...or does it?
                
> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>         Attachments: GIRAPH-36.diff
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Jake Mannix (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13130028#comment-13130028 ] 

Jake Mannix commented on GIRAPH-36:
-----------------------------------

Hey Claudio,

  The reason why this has been slow going is exactly your last line: MutableVertex is a bit of a beast - it's essentially assumed that all vertex implementations are a subclass of Vertex extends MutableVertex, which is the whole point of this ticket.  The readField() method and the VertexInputFormats can be modified, and that's what I've been working on: they essentially are designed currently to pretend you're reading from raw Text and calling setters on the (Mutable)Vertex you've instantiated.  But the more sensible thing is to have the BasicVertex implement Writable, which lets it read in data in BspServiceWorker, but also implement an interface which lets it read lines of Text, say, and initialize that way, if it's not a MutableVertex.  

  Its a little ugly, maybe there are better ways.

  You were suggesting just force everyone to be MutableVertex?  I think the idea of having a distinction is good - there may be optimizations possible in the case where you know your graph will never change in structure after instantiation (a very common case, for most graph algorithms).  The trick is the "after instantiation" part.
                
> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Jake Mannix (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13139082#comment-13139082 ] 

Jake Mannix commented on GIRAPH-36:
-----------------------------------

Ok, I'm digging into this again, finally... man patches get pretty stale if left unattended in the Incubator. :)

So, the idea with keeping BasicVertex which is *not* mutable is not for performance, but for implementation ease / coding safety: forcing users to implement mutating methods on their graph which doesn't allow such things is unfair to the users of this library, and having them throw UnsupportedOperationException should be avoided because then lots of problems are only seen at runtime, and then when the project moves forward and adds some innocuous thing which passes all unit tests, but ends up calling some mutator method that wasn't called before (maybe as part of a "post-initialization check of correctness, or something"), the users who are throwing exceptions in these methods only find out when they run their jobs which used to succeed, and now die with mysterious UOE getting thrown in new methods nowhere near their code.  Bad news, if it can be avoided.

So I'm in favor of a) over b), in theory.  It's closest to the way things are done in hadoop, and seems the cleanest, from a purity of API standpoint.

However, having dug into the code a bit, it's going to be hairy to get a) to work easily, now that there is a proliferation of VertexInputFormat stuff that Jakob has been committing lately (yay! but boo. :( ) - all of this code would also need to be retrofitted to match.

So I'm going to take a stab at b).  It breaks the API least, and while it appears to allow mutation, it's pretty clear to the user that it's not really, and we can forcibly throw exceptions on multiple use, for example, to keep it from being reused.
                
> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Jake Mannix (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jake Mannix updated GIRAPH-36:
------------------------------

    Attachment: GIRAPH-36.diff

updated with formatting changes and hiding GraphState
                
> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>         Attachments: GIRAPH-36.diff, GIRAPH-36.diff
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Claudio Martella (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13130549#comment-13130549 ] 

Claudio Martella commented on GIRAPH-36:
----------------------------------------

of course i meant either (a) or (b) :)
                
> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Jakob Homan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142615#comment-13142615 ] 

Jakob Homan commented on GIRAPH-36:
-----------------------------------

bq. Can GIRAPH-47 be a start for what you have in mind?
Yeah, that's close to what I'm hoping for.  I'll comment there. 

bq. I can certainly imagine this being very nice and clean.
Inheritance is going to be brittle over the long term, so the cleaner we can make this now, the easier it will be to attract a user base confident in investing in the new platform.  This is particularly true given how much we're at the mercy of Java's poor generic system; users already have to grok things like DoubleDoubleFloat in type names.
                
> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>         Attachments: GIRAPH-36.diff, GIRAPH-36.diff, GIRAPH-36.diff, GIRAPH-36.diff.warnings
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Jake Mannix (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107282#comment-13107282 ] 

Jake Mannix commented on GIRAPH-36:
-----------------------------------

In fact, thinking about VertexReader further, it seems its entire API is a little backwards.  Why are we *passing in* instantiated Vertices, and filling them in?  Shouldn't they effectively be "iterators" over the InputSplit?

> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Claudio Martella (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129777#comment-13129777 ] 

Claudio Martella commented on GIRAPH-36:
----------------------------------------

I've been working on contributing to this patch. I basically substituted everywhere the system expects Vertex a BasicVertex. This does work until I reach BspServiceWorker.loadVertexRange() where vertex.readField() is used to restore a checkpoint. The problem is that Writable is implemented by MutableVertex instead of BasicVertex. This problem is also present in VertexInputFormats where we would call mutating methods while creating the vertex, so we cannot really rely on BasicVertex. If we use MutableVertex on that API, then we require the user apps to extends MutableVertex.

So I guess the idea of this refactor is either to switch to MutableVertex, create a Vertex that is not finalized or just unify the whole thing.

What do you think?


                
> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

Posted by "Jake Mannix (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142589#comment-13142589 ] 

Jake Mannix commented on GIRAPH-36:
-----------------------------------

It's tricky, I think.  The original Vertex class was exactly meant to hide (via private) some of the implementation details of state, yet leave userland APIs which the user is meant to implement.  But then it turned out that the most generic way to hold onto the destination vertices and edge weights is not always the most efficient (hence: primitive-specific subclasses).

But maybe you're just saying that we should pull out most of this "implementation specific" state into other objects, decompose Vertex a bit, and have users be able to pluggably implement the parts which need to be made special, and leave as generic those which are generic?

Some stuff we already have done: GraphState encapsulates all global state (current superstep, number of global edges + vertices, etc).  Local state could be similarly pulled off into a couple of different data structures (edges with weights, messages).

I can certainly imagine this being very nice and clean.

Open a ticket and describe some thoughts on how it could look in practice?  Sounds like it could be another pretty significant refactoring, so let's do it sooner rather than later!
                
> Ensure that subclassing BasicVertex is possible by user apps
> ------------------------------------------------------------
>
>                 Key: GIRAPH-36
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-36
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.70.0
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Blocker
>             Fix For: 0.70.0
>
>         Attachments: GIRAPH-36.diff, GIRAPH-36.diff, GIRAPH-36.diff, GIRAPH-36.diff.warnings
>
>
> Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex extended BasicVertex).  Classes which wish to have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex.  Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira