Posted to dev@gossip.apache.org by chandresh pancholi <ch...@gmail.com> on 2016/10/11 18:04:54 UTC

[GOSSIP-17] https://issues.apache.org/jira/browse/GOSSIP-17

Hi,

I would like to work on this issue but am not sure where to begin.
Could someone point me to where to start and how to proceed?

For histograms, I see that ActiveThreadGroup and PassiveThreadGroup perform
the inter-node operations.

Where do we track successful and failed requests, so that meter metrics can
be generated from them?

Any help is appreciated.

-- 
Chandresh Pancholi
Senior Software Engineer
Flipkart.com
Email-id:chandresh.pancholi@flipkart.com
Contact:08951803660

Re: [GOSSIP-17] https://issues.apache.org/jira/browse/GOSSIP-17

Posted by chandresh pancholi <ch...@gmail.com>.

Hi,

I have added Histogram metrics in ActiveGossip for sendSharedData,
sendPerNodeData, and sendMembership.

Link: https://github.com/chandresh-pancholi/incubator-gossip

Could someone check out the code and review it before I finish the other
metrics? I want to know whether I am moving in the right direction.





On Tue, Oct 11, 2016 at 11:50 PM, Edward Capriolo <ed...@gmail.com>
wrote:

> I would say a few things. There are a lot of things going on in the
> software that are interesting.
>
> We have several queues and thread pools.
>
> It makes sense to put
> http://metrics.dropwizard.io/3.1.0/getting-started/#gauges around those.
> This will give us visibility as to how close those are to 0 at any given
> time.
>
> We now have per-node data:
>
> https://issues.apache.org/jira/browse/GOSSIP-21
> https://issues.apache.org/jira/browse/GOSSIP-25
>
> It makes sense to use gauges to record the size of these. We should also
> use meters to count how many operations/sec are caused by users adding
> data as well as by the internode process replicating data.
>
> For PassiveGossipThread I could see us counting messages received as a
> meter. We could count corrupt messages separately as a meter. We could
> also capture this data per host:
>
> gossipfrom.node1.goodmessages
> gossipfrom.node1.badmessages
>
> As well as globally
>
> gossipfrom.badmessages
> gossipfrom.goodmessages
>
> For ActiveGossip we could use histograms to track the time to process
>
> sendSharedData
> sendPerNodeData
> sendMembership
>
> We could use a gauge to track the queue size of
> this.scheduledExecutorService = Executors.newScheduledThreadPool(2); and
> of other executors to make sure that the queue is not backing up or
> blocked. Again, you can track this per host and globally.
>
> I am an ex-system administrator so I am generally ok with as many metrics
> as possible as long as we do not clutter the code. There are ways to do
> aspect/annotation driven counters as well so we can always look to refactor
> around those things if we want to.
>
> If you see something that seems like a point of possible contention or
> something that you believe is important to track I would capture that. In
> the long run there is something to consider about tracking metrics from 1k
> node clusters but we are not there yet and metrics is generally lighter
> than the code anyway.
>
> Thanks for taking the time to look at this.
> Edward
>
>
>
>
>
> On Tue, Oct 11, 2016 at 2:04 PM, chandresh pancholi <
> chandreshpancholi007@gmail.com> wrote:
>
> > Hi,
> >
> > I would like to work on this issue but am not sure where to begin.
> > Could someone point me to where to start and how to proceed?
> >
> > For histograms, I see that ActiveThreadGroup and PassiveThreadGroup
> > perform the inter-node operations.
> >
> > Where do we track successful and failed requests, so that meter
> > metrics can be generated from them?
> >
> > Any help is appreciated.
> >
> > --
> > Chandresh Pancholi
> > Senior Software Engineer
> > Flipkart.com
> > Email-id:chandresh.pancholi@flipkart.com
> > Contact:08951803660
> >
>



-- 
Chandresh Pancholi
Senior Software Engineer
Flipkart.com
Email-id:chandresh.pancholi@flipkart.com
Contact:08951803660

Re: [GOSSIP-17] https://issues.apache.org/jira/browse/GOSSIP-17

Posted by Edward Capriolo <ed...@gmail.com>.

I would say a few things. There are a lot of things going on in the
software that are interesting.

We have several queues and thread pools.

It makes sense to put
http://metrics.dropwizard.io/3.1.0/getting-started/#gauges around those.
This will give us visibility as to how close those are to 0 at any given
time.

We now have per-node data:

https://issues.apache.org/jira/browse/GOSSIP-21
https://issues.apache.org/jira/browse/GOSSIP-25

It makes sense to use gauges to record the size of these. We should also
use meters to count how many operations/sec are caused by users adding data
as well as by the internode process replicating data.

For PassiveGossipThread I could see us counting messages received as a
meter. We could count corrupt messages separately as a meter. We could also
capture this data per host:

gossipfrom.node1.goodmessages
gossipfrom.node1.badmessages

As well as globally

gossipfrom.badmessages
gossipfrom.goodmessages
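
The naming scheme above can be sketched roughly as follows. This is a
stdlib-only stand-in, not Dropwizard's API: the class GossipMessageCounters
and its methods are hypothetical names, and a LongAdder only counts, whereas
a real Meter would also track rates. The point is just that each received
message is recorded under both its per-host name and the global name.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical helper: a stdlib stand-in for a metrics registry.
// With Dropwizard Metrics, each LongAdder would be a Meter instead.
class GossipMessageCounters {
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    private void increment(String name) {
        counters.computeIfAbsent(name, k -> new LongAdder()).increment();
    }

    // Record one received message under the per-host and the global name.
    void record(String host, boolean valid) {
        String suffix = valid ? "goodmessages" : "badmessages";
        increment("gossipfrom." + host + "." + suffix);
        increment("gossipfrom." + suffix);
    }

    // Current count for a metric name, or 0 if nothing was recorded.
    long count(String name) {
        LongAdder adder = counters.get(name);
        return adder == null ? 0L : adder.sum();
    }

    public static void main(String[] args) {
        GossipMessageCounters c = new GossipMessageCounters();
        c.record("node1", true);
        c.record("node1", false);
        c.record("node2", true);
        System.out.println(c.count("gossipfrom.node1.goodmessages")); // prints 1
        System.out.println(c.count("gossipfrom.goodmessages"));       // prints 2
    }
}
```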

For ActiveGossip we could use histograms to track the time to process

sendSharedData
sendPerNodeData
sendMembership
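
As a rough illustration of timing those three calls, here is a stdlib-only
sketch. SendDurations and its methods are hypothetical names, not code from
the project and not Dropwizard's Histogram; it keeps every sample in memory
and uses a nearest-rank percentile, which a real histogram would replace
with a bounded reservoir.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: records durations per operation name.
class SendDurations {
    private final Map<String, List<Long>> samples = new ConcurrentHashMap<>();

    // Store one duration (in nanoseconds) for the named operation.
    void record(String operation, long nanos) {
        samples.computeIfAbsent(operation,
                k -> Collections.synchronizedList(new ArrayList<Long>())).add(nanos);
    }

    // Run the body and record how long it took, even if it throws.
    void time(String operation, Runnable body) {
        long start = System.nanoTime();
        try {
            body.run();
        } finally {
            record(operation, System.nanoTime() - start);
        }
    }

    // Nearest-rank percentile, with q in (0, 1].
    long percentile(String operation, double q) {
        List<Long> recorded = samples.get(operation);
        if (recorded == null || recorded.isEmpty()) {
            throw new IllegalArgumentException("no samples for " + operation);
        }
        List<Long> sorted;
        synchronized (recorded) {
            sorted = new ArrayList<>(recorded);
        }
        Collections.sort(sorted);
        int index = (int) Math.ceil(q * sorted.size()) - 1;
        index = Math.max(0, Math.min(sorted.size() - 1, index));
        return sorted.get(index);
    }

    public static void main(String[] args) {
        SendDurations durations = new SendDurations();
        // Stand-in for the real sendSharedData call.
        durations.time("sendSharedData", () -> Thread.yield());
        System.out.println(durations.percentile("sendSharedData", 0.99));
    }
}
```

Wrapping each call site in time(...) keeps the measurement in one place, so
the send methods themselves stay uncluttered.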

We could use a gauge to track the queue size of
this.scheduledExecutorService = Executors.newScheduledThreadPool(2); and of
other executors to make sure that the queue is not backing up or blocked.
Again, you can track this per host and globally.
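
A gauge over an executor queue could look something like the sketch below.
It uses only the JDK: ExecutorQueueGauge is a hypothetical name, and a plain
Supplier stands in for a metrics library's gauge type. It also assumes the
executor is held as a ScheduledThreadPoolExecutor, since the
ScheduledExecutorService interface itself does not expose the queue.

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical helper: exposes the number of queued tasks as a gauge.
class ExecutorQueueGauge {
    // The supplier reads the live queue size each time it is polled.
    static Supplier<Integer> queueSize(ScheduledThreadPoolExecutor executor) {
        return () -> executor.getQueue().size();
    }

    public static void main(String[] args) {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(2);
        Supplier<Integer> gauge = queueSize(executor);

        // Delayed tasks wait in the queue until their delay expires.
        executor.schedule(() -> {}, 1, TimeUnit.HOURS);
        executor.schedule(() -> {}, 1, TimeUnit.HOURS);
        System.out.println(gauge.get()); // prints 2

        executor.shutdownNow(); // drains the queue so the JVM can exit
    }
}
```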

I am an ex-system administrator so I am generally ok with as many metrics
as possible as long as we do not clutter the code. There are ways to do
aspect/annotation driven counters as well so we can always look to refactor
around those things if we want to.

If you see something that seems like a point of possible contention or
something that you believe is important to track, I would capture that. In
the long run we will need to consider the cost of tracking metrics in
1k-node clusters, but we are not there yet, and metrics are generally
lighter than the code anyway.

Thanks for taking the time to look at this.
Edward

On Tue, Oct 11, 2016 at 2:04 PM, chandresh pancholi <
chandreshpancholi007@gmail.com> wrote:

> Hi,
>
> I would like to work on this issue but am not sure where to begin.
> Could someone point me to where to start and how to proceed?
>
> For histograms, I see that ActiveThreadGroup and PassiveThreadGroup
> perform the inter-node operations.
>
> Where do we track successful and failed requests, so that meter metrics
> can be generated from them?
>
> Any help is appreciated.
>
> --
> Chandresh Pancholi
> Senior Software Engineer
> Flipkart.com
> Email-id:chandresh.pancholi@flipkart.com
> Contact:08951803660
>