Posted to user@kudu.apache.org by Janne Keskitalo <ja...@paf.com> on 2017/11/01 08:16:39 UTC

Kudu background tasks

Hi

Our Kudu test environment became unresponsive yesterday for an unknown reason.
It has three tablet servers and one master. It's running in AWS on quite small
host machines, so maybe some node ran out of memory or something. It has
happened before with this setup. Anyway, after we restarted the Kudu service,
we couldn't run any selects. From the tablet server UI I could see it was
initializing and bootstrapping tablets. It took many hours until all
tablets were in the RUNNING state.

My question is: where can I find information about these background
operations? I want to understand what happens when a node
is offline and then comes back up after a while. What are tablet
initialization and bootstrapping, for example?

-- 
Br.
Janne Keskitalo,
Database Architect, PAF.COM
For support: dbdsupport@paf.com

Re: Kudu background tasks

Posted by Janne Keskitalo <ja...@paf.com>.
Hi

We're running version: kudu 1.5.0-cdh5.13.0

We had another incident today due to memory running out, and Kudu is now
coming back up slowly. I took a screenshot of the Kudu tablet server UI and
would like to know what actually happens here. I can see more tablets slowly
getting to the "RUNNING" state.


2017-11-01 22:01 GMT+01:00 Todd Lipcon <to...@cloudera.com>:

> Hi Janne,
>
> It's not clear whether the issue was that it was taking a long time to
> restart (i.e., replaying WALs) or whether you also ended up having to
> re-replicate a bunch of tablets from host to host in the cluster. There
> were some bugs in earlier versions of Kudu (e.g., KUDU-2125, KUDU-2020) which
> could make this process rather slow to stabilize.
>
> If this issue happens again, running 'kudu cluster ksck' during the
> unstable period can often yield more information to help understand what is
> happening.
>
> What version are you running?
>
> Todd
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>



-- 
Br.
Janne Keskitalo,

Re: Kudu background tasks

Posted by Todd Lipcon <to...@cloudera.com>.
Hi Janne,

It's not clear whether the issue was that it was taking a long time to
restart (i.e., replaying WALs) or whether you also ended up having to
re-replicate a bunch of tablets from host to host in the cluster. There
were some bugs in earlier versions of Kudu (e.g., KUDU-2125, KUDU-2020) which
could make this process rather slow to stabilize.

If this issue happens again, running 'kudu cluster ksck' during the
unstable period can often yield more information to help understand what is
happening.
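
As a rough illustration of the ksck suggestion, here is a minimal Python sketch that runs 'kudu cluster ksck' repeatedly while the cluster recovers and keeps each run's output. The function names, the polling interval, and the check limit are hypothetical choices for this example, not part of Kudu. It also assumes that the 'kudu' binary is on the PATH and that a non-zero exit code means ksck found problems.

```python
import subprocess
import time
from datetime import datetime, timezone


def build_ksck_command(master_addresses):
    """Return the argv list for 'kudu cluster ksck <master_addresses>'."""
    if not master_addresses:
        raise ValueError("at least one master address is required")
    # ksck takes the master addresses as one comma-separated argument.
    return ["kudu", "cluster", "ksck", ",".join(master_addresses)]


def run_ksck(master_addresses, runner=subprocess.run):
    """Run ksck once and return (exit_code, combined stdout/stderr)."""
    result = runner(
        build_ksck_command(master_addresses),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=False,  # an unhealthy cluster should not raise here
    )
    return result.returncode, result.stdout


def watch_cluster(master_addresses, interval_seconds=60, max_checks=10,
                  runner=subprocess.run, sleep=time.sleep):
    """Run ksck until it exits with 0 or max_checks runs have been made.

    Returns a list of (timestamp, exit_code, output) tuples, one per run.
    The exit-code interpretation is an assumption of this sketch.
    """
    history = []
    for attempt in range(1, max_checks + 1):
        code, output = run_ksck(master_addresses, runner=runner)
        stamp = datetime.now(timezone.utc).isoformat()
        history.append((stamp, code, output))
        print(f"[{stamp}] ksck attempt {attempt}: exit code {code}")
        if code == 0:
            break
        if attempt < max_checks:
            sleep(interval_seconds)
    return history
```

A call such as watch_cluster(["master-host:7051"]) would poll once a minute; "master-host" is a placeholder, and 7051 is the default master RPC port. Saving the returned output from the unstable period makes it easier to compare runs or share them on the list.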

What version are you running?

Todd


-- 
Todd Lipcon
Software Engineer, Cloudera