Posted to dev@gossip.apache.org by Русак Максим <ma...@yandex.ru> on 2017/02/08 18:06:49 UTC

Gossip on Google Summer of Code

Hello,

I'm a student at the Moscow Institute of Physics and Technology. I'm currently working on my undergraduate diploma in Yandex YT (an analog of Hadoop). My work is related to distributed metadata and anomalous node detection, so I'm interested in diving deep into distributed metadata mechanisms like Gossip.
After I discussed possible tasks with Edward Capriolo, he added GOSSIP-51 and GOSSIP-53 to the ASF GSoC 2017 list.
Now we need to add detailed descriptions to these tasks or propose new issues. Good issues have to be big enough and not very urgent, so that they can be completed during the summer.
I'll be glad to hear your opinions.

Regards,
Maxim Rusak.

Re: Gossip on Google Summer of Code

Posted by Edward Capriolo <ed...@gmail.com>.

On Wed, Feb 8, 2017 at 1:40 PM, chandresh pancholi <
chandreshpancholi007@gmail.com> wrote:

> I completely agree with P. We should have features in GSoC which students
> can work on for 2-3 months.
>
> Thanks

For background on GOSSIP-51: currently we have two ActiveGossiper
subclasses.

The first picks a random node and sends the entire topology:
https://github.com/apache/incubator-gossip/blob/master/src/main/java/org/apache/gossip/manager/SimpleActiveGossipper.java

The second divides the cluster into racks and datacenters and gossips at
different rates between those groups:
https://github.com/apache/incubator-gossip/blob/master/src/main/java/org/apache/gossip/manager/DatacenterRackAwareActiveGossiper.java

The flaw with both of these designs is that they send the entire list of
nodes.

The challenge with large cluster sizes is lots of cross talk and the list
becoming very large.

On the list someone mentioned the SWIM protocol. The proposed win is that
the protocol does not have to communicate the entire list and that it
elects peers.
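
To make the "do not send the entire list" idea concrete, here is a rough
sketch of a gossiper that sends only a bounded batch of recent membership
updates to one randomly chosen peer. It is in the spirit of SWIM-style
dissemination but is not an implementation of SWIM (there is no ping/ack
failure detection here). All class and method names (MemberUpdate,
DeltaGossiper, nextPayload, and so on) are made up for illustration and are
not part of the Gossip codebase.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Hypothetical types for illustration; not part of the Apache Gossip codebase.
final class MemberUpdate {
    final String nodeId;
    final long heartbeat;
    int timesSent = 0; // how many payloads have carried this update so far

    MemberUpdate(String nodeId, long heartbeat) {
        this.nodeId = nodeId;
        this.heartbeat = heartbeat;
    }
}

final class DeltaGossiper {
    private final List<MemberUpdate> pending = new ArrayList<>();
    private final int maxPerMessage; // cap on updates per gossip message
    private final int maxResends;    // drop an update after this many sends
    private final Random random;

    DeltaGossiper(int maxPerMessage, int maxResends, Random random) {
        this.maxPerMessage = maxPerMessage;
        this.maxResends = maxResends;
        this.random = random;
    }

    // Queue a membership change (join, heartbeat bump, suspected failure).
    void record(MemberUpdate update) {
        pending.add(update);
    }

    // Choose one live peer uniformly at random for this round.
    String pickPeer(List<String> livePeers) {
        return livePeers.get(random.nextInt(livePeers.size()));
    }

    // Build one bounded payload, preferring the updates sent the fewest times.
    List<MemberUpdate> nextPayload() {
        pending.sort(Comparator.comparingInt((MemberUpdate u) -> u.timesSent));
        int count = Math.min(maxPerMessage, pending.size());
        List<MemberUpdate> payload = new ArrayList<>(pending.subList(0, count));
        for (MemberUpdate u : payload) {
            u.timesSent++;
        }
        // Updates that have been sent often enough are no longer queued.
        pending.removeIf(u -> u.timesSent >= maxResends);
        return payload;
    }

    int pendingCount() {
        return pending.size();
    }
}
```

With this shape the message size is bounded by maxPerMessage regardless of
cluster size; the trade-off is that a change needs several rounds to reach
everyone, and the right value for maxResends is an open tuning question.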

In the end, Gossip could house one or more algorithms. Maybe some are more
chatty but better for small clusters, etc.

For background on GOSSIP-53: currently we can gossip data into two
hashmaps, per_node and shared. Per node is keyed by the node_id and shared
is keyed by the key.

Examples are here:
https://github.com/apache/incubator-gossip/blob/master/src/test/java/org/apache/gossip/DataTest.java

The issue here is that we can only overwrite the entire object.
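
A small sketch of the overwrite problem, using a made-up OverwriteStore
class as a stand-in for the two maps (this is not the Gossip API, and the
nested key inside the per-node map is an assumption made for the example):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the two gossiped maps; not the Apache Gossip API.
final class OverwriteStore {
    // Per-node data: node id -> (key -> value). The inner key is an assumption.
    private final Map<String, Map<String, Object>> perNode = new HashMap<>();
    // Shared data: key -> value.
    private final Map<String, Object> shared = new HashMap<>();

    void putPerNode(String nodeId, String key, Object value) {
        perNode.computeIfAbsent(nodeId, id -> new HashMap<>()).put(key, value);
    }

    // The incoming value replaces whatever was stored under the key.
    void putShared(String key, Object value) {
        shared.put(key, value);
    }

    Object getPerNode(String nodeId, String key) {
        Map<String, Object> data = perNode.get(nodeId);
        return data == null ? null : data.get(key);
    }

    Object getShared(String key) {
        return shared.get(key);
    }
}
```

If node A gossips a list containing "a" under the shared key "tags" and node
B then gossips a list containing "b" under the same key, the store ends up
holding only B's list; A's element is lost rather than combined.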

A CRDT is a data structure that is mergeable. A simple example is a set
that only grows, commonly called a 'gset'.
To implement this we could, for example, detect that some Objects in our
hashmap are instances of CRDT and merge them instead of replacing them.
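
A minimal sketch of that idea, assuming a made-up Crdt interface and a
made-up SharedDataStore wrapper around the shared map (neither exists in the
Gossip codebase under these names): on receipt, CRDT values of the same type
are merged, and everything else is overwritten as before.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interface: a value that can be merged with another replica.
interface Crdt<T extends Crdt<T>> {
    T merge(T other);
}

// Grow-only set: elements are only ever added; merge is set union.
final class GrowOnlySet<E> implements Crdt<GrowOnlySet<E>> {
    private final Set<E> elements;

    GrowOnlySet() {
        this.elements = Collections.emptySet();
    }

    private GrowOnlySet(Set<E> elements) {
        this.elements = Collections.unmodifiableSet(elements);
    }

    // Returns a new set containing the extra element; this one is unchanged.
    GrowOnlySet<E> add(E element) {
        Set<E> copy = new HashSet<>(elements);
        copy.add(element);
        return new GrowOnlySet<>(copy);
    }

    @Override
    public GrowOnlySet<E> merge(GrowOnlySet<E> other) {
        Set<E> union = new HashSet<>(elements);
        union.addAll(other.elements);
        return new GrowOnlySet<>(union);
    }

    Set<E> value() {
        return elements;
    }
}

// Hypothetical stand-in for the shared hashmap.
final class SharedDataStore {
    private final Map<String, Object> shared = new ConcurrentHashMap<>();

    @SuppressWarnings({"unchecked", "rawtypes"})
    void receive(String key, Object incoming) {
        shared.merge(key, incoming, (current, update) -> {
            boolean sameCrdtType = current instanceof Crdt
                    && update instanceof Crdt
                    && current.getClass() == update.getClass();
            if (sameCrdtType) {
                return ((Crdt) current).merge((Crdt) update);
            }
            return update; // non-CRDT values keep the overwrite behaviour
        });
    }

    Object get(String key) {
        return shared.get(key);
    }
}
```

Because set union is commutative, associative, and idempotent, replicas
converge to the same set no matter in which order, or how many times, the
gossiped copies arrive.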

Of course gset is the easiest one :) There are other more complex CRDTs
that support deletion etc. On the far end there are NoSQL databases like
Riak that use vector clocks. I am unsure whether that generalizes to
something that can work in a fully peer-to-peer network. Essentially with
53 the end direction is less clear, but we have discussed recipes like
ephemeral nodes, locks, and leader elections. Gossip may not be the best
place to implement a low-latency lock or leader election, but if we can do
that, it may open up some use cases where someone might avoid a database or
ZooKeeper and instead use Gossip in their application.

The SWIM stuff will be there for the summer; I am happy with the current
accrual system for now. The CRDT work, to be honest (I am kind of hot on
them now), will be started before the summer, although I can imagine things
like that are endless work, e.g. maybe someone has some high-tech fancy
CRDT, etc.

Re: Gossip on Google Summer of Code

Posted by chandresh pancholi <ch...@gmail.com>.

I completely agree with P. We should have features in GSoC which students
can work on for 2-3 months.

Thanks



-- 
Chandresh Pancholi
Senior Software Engineer
Flipkart.com
Email-id:chandresh.pancholi@flipkart.com
Contact:08951803660