Posted to user@drill.apache.org by Adam Gilmore <dr...@gmail.com> on 2015/04/22 07:04:24 UTC

New Drillbits joining cluster causes severe performance spike

Hey guys,

I'm troubleshooting some issues with our cluster under some production load
and scaling.

When we add a new drillbit to the cluster, performance degrades severely as
soon as it joins (queries that usually take 1s can take 60s, for example).
After a few minutes, it recovers and everything is back to normal.

What I assume is happening is that the new drillbit is still initializing
or "warming up" but has already made itself available to take work.
This means that queries end up waiting for this drillbit to finish
initializing before they return.

I haven't confirmed this in the profiles yet (we have a fair bit of load,
so I haven't isolated the individual long-running queries), but I'll
keep investigating.

In the meantime, does that theory sound plausible?  And if so, what
initialization/warm-up is the drillbit doing?  Furthermore, could we
delay it joining the cluster for active work until it is completely ready
to take on work?
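
To make the question concrete, here is a rough sketch of the behaviour being
asked about: a node is visible to the cluster but is only handed work once its
warm-up has finished. This is not how Drill is implemented, and none of these
names are Drill APIs; Worker, Cluster, warm_up and schedulable are made up
purely for illustration.

```python
import threading


class Worker:
    """Hypothetical worker node (not a Drill class)."""

    def __init__(self, name, warm_up):
        self.name = name
        self._warm_up = warm_up            # callable that does the slow initialization
        self._ready = threading.Event()    # set only once warm-up has completed

    def start(self):
        # Finish the slow initialization first, then advertise readiness.
        self._warm_up()
        self._ready.set()

    def is_ready(self):
        return self._ready.is_set()


class Cluster:
    """Hypothetical registry that only schedules onto ready workers."""

    def __init__(self):
        self._workers = []

    def join(self, worker):
        # Joining makes the worker known, but not necessarily schedulable.
        self._workers.append(worker)

    def schedulable(self):
        return [w.name for w in self._workers if w.is_ready()]


cluster = Cluster()

old = Worker("bit-1", warm_up=lambda: None)
old.start()
cluster.join(old)

new = Worker("bit-2", warm_up=lambda: None)
cluster.join(new)                  # known to the cluster, not yet warmed up
before = cluster.schedulable()     # only the already-warm worker

new.start()                        # warm-up completes
after = cluster.schedulable()      # both workers can now take work
```

The point of the sketch is only the ordering: readiness is flagged after
initialization, and the scheduler filters on that flag rather than on
membership alone.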

We're considering running some sort of autoscaling to handle varying load,
so this would be really crucial for us!

Any thoughts or pointers in the right direction would be great.

Re: New Drillbits joining cluster causes severe performance spike

Posted by Ted Dunning <te...@gmail.com>.
Adam,

There has been some auto-scaling experimentation done outside the list in
which drillbits stay alive, but don't accept work and don't allocate memory
until they are needed.

That avoids startup transients for the most part.  This scaling work is
still quite immature, but I will encourage those doing the experiments to
report interim results to the list.

