Posted to dev@accumulo.apache.org by Tushar Dhadiwal <tu...@gmail.com> on 2019/11/14 02:32:34 UTC
Re: Accumulo on Azure - Long Term Monitoring

Hello Gang,

I took a stab at adding initial code for an Accumulo availability monitor
that scans random values across various tablet servers and captures how
long each scan takes. Here is the PR:
https://github.com/apache/accumulo-testing/pull/118
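
For readers who want the general shape of such a probe without opening the
PR, here is a minimal sketch in Python. It is not the code from the PR, and
every name in it (timed_scan, probe_once, fake_scan, max_row) is
hypothetical. The actual scan is passed in as a callable, so fake_scan is
only a stand-in for a real Accumulo scanner.

```python
import random
import time


def timed_scan(scan_fn, row):
    """Run one scan starting at `row`; return (entry_count, elapsed_ms)."""
    start = time.perf_counter()
    count = sum(1 for _ in scan_fn(row))  # drain the scan so all of it is timed
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return count, elapsed_ms


def probe_once(scan_fn, max_row, rng=random):
    """Pick a random row id, scan it, and return one timing record."""
    row = rng.randrange(max_row)
    try:
        count, elapsed_ms = timed_scan(scan_fn, row)
        return {"row": row, "ok": True, "entries": count, "ms": elapsed_ms}
    except Exception as exc:
        # A failed scan is itself an availability signal, so record it.
        return {"row": row, "ok": False, "error": repr(exc), "ms": None}


def fake_scan(row):
    """Hypothetical stand-in for a real scan; yields three fake entries."""
    return iter([(row, "cf", "cq", b"v")] * 3)


if __name__ == "__main__":
    print(probe_once(fake_scan, max_row=1_000_000))
```

Run in a loop with a sleep between probes, each record becomes one log
line. Picking rows at random is what lets the probe touch different tablet
servers over time.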

The timings from this probe were piped through a very lightweight log4j
appender to a Log Analytics service. By querying the logs and plotting read
timings against time, I was able to detect changes in Accumulo cluster
availability (e.g. tablet servers down, poor network connectivity). I can
provide more info and share the related code if anyone is interested.
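
As a rough illustration of how read timings plotted against time can expose
an availability change, the sketch below groups timing samples into fixed
windows and flags windows whose median latency is far above the overall
median. This is an assumed, simplified stand-in for the log query, not the
query that was actually used. The function names, the 60-second window, and
the 3x threshold are all hypothetical choices.

```python
from statistics import median


def bucket_medians(samples, bucket_seconds=60):
    """Group (timestamp_s, elapsed_ms) samples into fixed windows.

    Returns {window_start_s: median_ms}, ordered by window start.
    """
    buckets = {}
    for ts, ms in samples:
        start = int(ts // bucket_seconds) * bucket_seconds
        buckets.setdefault(start, []).append(ms)
    return {start: median(vals) for start, vals in sorted(buckets.items())}


def degraded_windows(samples, bucket_seconds=60, factor=3.0):
    """Return window starts whose median exceeds factor x the overall median."""
    medians = bucket_medians(samples, bucket_seconds)
    if not medians:
        return []
    baseline = median(medians.values())  # median of the per-window medians
    return [start for start, m in medians.items() if m > factor * baseline]


if __name__ == "__main__":
    samples = [(0, 10), (30, 10), (60, 11), (90, 11),
               (120, 200), (150, 200), (180, 9)]
    print(degraded_windows(samples))
```

A median-based baseline is used here so that a single slow window does not
drag the baseline up with it. Failed scans would need to be counted
separately, since they carry no timing.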

Would appreciate your thoughts and feedback on this.

Cheers,
Tushar Dhadiwal


On Thu, Oct 24, 2019 at 5:09 PM Tushar Dhadiwal <tu...@gmail.com>
wrote:

> Hello Everyone,
>
>
> I am a Software Engineer at Microsoft and our team is currently working
> on making the deployment and operations of Accumulo on Azure as seamless as
> possible. As part of this effort, we are attempting to observe / measure
> some standard Accumulo operations (e.g. scan, canary queries, ingest, etc.)
> and how their performance varies over time on long standing Accumulo
> clusters running in Azure. As part of this we’re looking to come up with a
> metric that we can use to evaluate how healthy / available an Accumulo
> cluster is. Over time we intend to use this to understand how underlying
> platform changes in Azure can affect overall health of Accumulo workloads.
>
>
>
> As a starting metric for example, we are thinking of continually doing
> scans of random values across various tablet servers and capturing timing
> information related to how long such scans take. I took a quick look at
> the accumulo-testing repo and didn’t find any tests or probes attempting to
> do something along these lines. Does something like this seem
> reasonable? Has anyone previously attempted something similar? Does
> accumulo-testing seem like a reasonable place for code that attempts to do
> something like this?
>
>
>
> Appreciate your thoughts and feedback.
>
>
>
> Cheers,
>
> Tushar Dhadiwal
>