Posted to dev@giraph.apache.org by Eli Reisman <in...@gmail.com> on 2012/09/01 01:25:58 UTC

Re: [jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper

With 2000 workers, that's 2000 extra connections in the system. We run
Giraph/Netty on the same cluster as existing jobs that use Hadoop RPC, so
network resources are sometimes at a premium. These jobs often run on the
same boxes as our worker mappers, and the scheduling is not under our
control or particularly suited to Giraph. I'm not too familiar with the
aggregator code, but if you have an idea for an implementation that
doesn't use a barrier, I agree with Avery that it doesn't preclude the
tree option in that scenario either.

On the other hand, if you have a specialized use case, maybe the easiest
thing would be to do whatever it takes to make your map aggregator work
however you like, make it command-line optional, and just leave the
existing ZK implementation in place for the rest of the use cases. Have
you had problems with needing more standard aggregators and ZK nodes not
holding enough data, or is this map aggregator driving your need for this
feature? Can I ask what algorithm you're implementing that requires a
globally aggregated map at every superstep? Have you noticed performance
or speed issues with the existing ZK implementation as you add aggregators
to an application?

Anyway, I'm not firmly for or against any of this, just curious. If you
find an implementation that works for you, that sounds great. If it was
optional, with the existing version or the tree available, that would
probably save us some headache here when we share a cluster (which is
almost all the time).

On Fri, Aug 31, 2012 at 12:55 PM, Maja Kabiljo (JIRA) <ji...@apache.org>wrote:

>
>     [
> https://issues.apache.org/jira/browse/GIRAPH-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446301#comment-13446301]
>
> Maja Kabiljo commented on GIRAPH-273:
> -------------------------------------
>
> That's true, we can implement several different approaches and decide
> which one to use based on the current application needs.
>
> > Aggregators shouldn't use Zookeeper
> > -----------------------------------
> >
> >                 Key: GIRAPH-273
> >                 URL: https://issues.apache.org/jira/browse/GIRAPH-273
> >             Project: Giraph
> >          Issue Type: Improvement
> >            Reporter: Maja Kabiljo
> >            Assignee: Maja Kabiljo
> >
> > We use Zookeeper znodes to transfer aggregated values from workers to
> master and back. Zookeeper is supposed to be used for coordination, and it
> also has a memory limit which prevents users from having aggregators with
> large value objects. These are the reasons why we should implement
> aggregator gathering and distribution in a different way.
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA
> administrators
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>

Re: [jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper

Posted by Eli Reisman <in...@gmail.com>.
Hey, if your application is very specific and internal, then never mind, I
don't need to know ;) I'm not entirely familiar with the aggregator code,
but if it is placed in the superstep cycle in a way that avoids a barrier,
that sounds terrific. If ZK is just too small, then some kind of
networking solution sounds better, especially if you've already taken the
time to write it!

We are in a situation where not only are some of our worker tasks on the
same box, but they are sharing resources with other mappers and tasks from
other kinds of MR jobs all the time, so there are a lot of resources in
use per box that any Giraph user here has no control over, and Giraph is
not very tolerant of mid-job changes in cluster activity levels as of now.
We also plan on scaling to as many workers as possible to parallelize the
work and spread the load. So for all these reasons, any way to avoid extra
connections when there's no downside seems good to me.

It isn't the end of the world to make all the new connections, but it
didn't seem necessary for making aggregators work, as I understood the
purpose of aggregators. That's why I asked about the nature of the
algorithm you guys are implementing.

I guess what it all boils down to is the purpose of an aggregator. The
tree and ZK are no good if you're sending large amounts of data. I was
under the assumption this feature was a "reducer": you aggregate a single
result from many data points during the compute cycle, and pass it up the
chain until the results from all workers are aggregated into a single
value at the master for global consumption. If you need to pass a map
through this system, maybe that data needs to be reduced at the
worker.compute() level before traversing the network, or maybe the
aggregator system is not ideal for that use case?
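A minimal sketch of the reducer-style flow I mean (illustrative Java only;
the class and method names are invented, not the actual Giraph API):

```java
import java.util.Arrays;
import java.util.List;

/**
 * Sketch of reducer-style aggregation: each worker folds its own data
 * points into one partial value, and the master combines one partial per
 * worker into a single global value. Illustrative names, not Giraph API.
 */
public class AggregatorSketch {

    /** A "max" aggregator: the combine step is commutative and associative. */
    static long combine(long a, long b) {
        return Math.max(a, b);
    }

    /** Worker side: reduce local data points to one partial value. */
    static long workerPartial(List<Long> localValues) {
        long partial = Long.MIN_VALUE;  // identity element for max
        for (long v : localValues) {
            partial = combine(partial, v);
        }
        return partial;
    }

    /** Master side: combine one partial per worker into the global value. */
    static long masterAggregate(List<Long> partials) {
        long global = Long.MIN_VALUE;
        for (long p : partials) {
            global = combine(global, p);
        }
        return global;
    }

    public static void main(String[] args) {
        long w1 = workerPartial(Arrays.asList(3L, 9L, 4L));
        long w2 = workerPartial(Arrays.asList(7L, 2L));
        System.out.println(masterAggregate(Arrays.asList(w1, w2))); // prints 9
    }
}
```

Because the combine is associative, the same code works whether partials go
straight to the master or up a tree of workers; only one small value
crosses the network per hop.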

If it's the only way to make your application work, then maybe what we're
doing is expanding the contract of what an aggregator is into a shared
global data store at the master? I think that's where I was confused about
needing an all-workers-to-master back channel just for aggregators. If
there's no way to use reduction-style aggregation to make the algorithm
work, then it sounds like this needs to be done.
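For contrast, here is roughly what I mean by the "shared global data
store" reading: each worker ships a partial map and the master
union-merges it per key (again illustrative Java with invented names, not
Giraph API). Unlike a single-value reduce, the payload grows with the key
set, which is exactly where a znode size limit would bite:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of a map-valued aggregator: each worker contributes
 * (key -> count) entries and merging is a per-key reduce, so the master
 * ends up holding a shared global map. Illustrative, not Giraph API.
 */
public class MapAggregatorSketch {

    /** Merge one worker's partial map into the running global map,
     *  reducing per key (here: summing counts for duplicate keys). */
    static void merge(Map<String, Long> global, Map<String, Long> partial) {
        for (Map.Entry<String, Long> e : partial.entrySet()) {
            global.merge(e.getKey(), e.getValue(), Long::sum);
        }
    }

    public static void main(String[] args) {
        Map<String, Long> global = new HashMap<>();

        Map<String, Long> worker1 = new HashMap<>();
        worker1.put("a", 2L);
        worker1.put("b", 1L);

        Map<String, Long> worker2 = new HashMap<>();
        worker2.put("b", 3L);

        merge(global, worker1);
        merge(global, worker2);
        System.out.println(global.get("b")); // prints 4 (1 + 3)
    }
}
```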

On Sat, Sep 1, 2012 at 12:22 AM, Maja Kabiljo <ma...@fb.com> wrote:


Re: [jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper

Posted by Maja Kabiljo <ma...@fb.com>.
In the case you mentioned, you already have a million connections; that's
why I don't see how 2k more make a difference. Maybe I'm missing
something here.

The reason why this can be done without an additional barrier is that
aggregated values which we receive from other workers can be treated in
the same way as the values given in vertex.compute - we can just
aggregate them right away. It should be doable with the tree approach too -
we can send the values as soon as we are done with the computation and
have received the values from our children in the tree, if any.
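The idea can be sketched like this (illustrative Java, invented names, not
the actual implementation): the worker keeps one running aggregate, and
values arriving from other workers are folded in exactly like local ones,
so no separate barrier is needed.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Sketch of barrier-free aggregation: a worker keeps one running "sum"
 * aggregate and folds in values as they arrive, whether from local
 * vertex.compute() calls or from other workers over the network.
 * Illustrative only; not the actual Giraph implementation.
 */
public class RunningAggregate {
    // AtomicLong lets compute threads and network receive threads fold
    // values in concurrently without any extra coordination step.
    private final AtomicLong sum = new AtomicLong(0);

    /** Called from vertex.compute() on this worker. */
    void aggregateLocal(long value) {
        sum.addAndGet(value);
    }

    /** Called when a partial arrives from another worker (e.g. a tree
     *  child); treated exactly like a local value. */
    void aggregateRemote(long partial) {
        sum.addAndGet(partial);
    }

    /** Once computation is done and all expected children have reported,
     *  the worker can forward this partial up the tree immediately. */
    long partialToSend() {
        return sum.get();
    }
}
```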

I guess we can also leave the current implementation as one of the
options; that didn't occur to me, thanks. Since aggregators are written to
the same znode as some other data, that should be the least possible
overhead for cases with just a few simple value aggregators.

I'm not sure whether performance is affected as more aggregators are
added (another guy on the team is working on the application), but I don't
think we can get far enough to notice it because of the ZooKeeper memory
limit. Avery, can you take the question about our application? (I'm not
sure what we are allowed to share publicly and what not :-))



On 9/1/12 12:25 AM, "Eli Reisman" <in...@gmail.com> wrote:
