Posted to user@kudu.apache.org by Faraz Mateen <fm...@an10.io> on 2019/10/04 11:20:18 UTC

"Too many open files" error

Hi all,

I am facing a problem with my Kudu setup: the tablet server crashes with a
"too many open files" error.
The setup consists of a single master and a single tablet server. Each table
is created with 39 partitions, although not every partition holds data.
Yesterday my tserver crashed, and when I try to restart it, it fails with
the following error:

I1004 03:50:39.896301  5669 ts_tablet_manager.cc:1173] T cab85f15f06748d0b59161d9f3da55f7 P ee14d248ac994d0eb60dbb0db4ab3b09: Registered tablet (data state: TABLET_DATA_READY)
W1004 03:50:39.923184  5687 os-util.cc:165] could not read /proc/self/status: IO error: /proc/self/status: Too many open files (error 24)
I1004 03:50:39.939460  5669 ts_tablet_manager.cc:1173] T d8d68ce6f6ea49479c00d29709869f1f P ee14d248ac994d0eb60dbb0db4ab3b09: Registered tablet (data state: TABLET_DATA_READY)

I have already modified ulimit of the machine:

root@vm-3:~# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63923
max locked memory       (kbytes, -l) 16384
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65535
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 65535
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

*Setup details:*
Single master and tserver on a single VM.
4 cores, 550 GB hard disk, 16 GB RAM.
Kudu version 1.8 on Ubuntu, installed from Debian packages.
Before the crash, data was being inserted into Kudu at a very high rate. RAM
usage was around 87% and disk usage was around 84%.

Here is what I have tried so far:
1- Set ulimit -n to 65535.
2- Rebooted the VM to get rid of stale processes.
3- Set block_manager_max_open_files to 32000 in the tserver flag file.
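
One thing worth checking at this point: `ulimit -a` in a root shell reports the limits of that shell, and a daemon started by the init system may run with different ones. The sketch below is only an illustration, assuming Linux and that the tserver's process ID is already known. It reads the limit the process actually has from /proc/<pid>/limits and counts its open descriptors. The function names are made up for this example and are not part of Kudu.

```python
import os


def parse_max_open_files(limits_text):
    """Return the (soft, hard) 'Max open files' values from the text of
    /proc/<pid>/limits, as strings (a value may be 'unlimited').
    Returns None if the row is not present."""
    prefix = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            # Remaining columns are: soft limit, hard limit, units.
            soft, hard = line[len(prefix):].split()[:2]
            return soft, hard
    return None


def read_open_file_limits(pid):
    """Read the effective open-file limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_max_open_files(f.read())


def open_fd_count(pid):
    """Count the file descriptors the process currently holds (Linux only).
    Needs permission to list /proc/<pid>/fd, e.g. root or the same user."""
    return len(os.listdir(f"/proc/{pid}/fd"))


# Example (hypothetical pid): compare the limit with current usage.
#   print(read_open_file_limits(5669), open_fd_count(5669))
```

If the soft limit reported here is lower than the 65535 shown by the shell, the ulimit change has not reached the tserver process.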

What I want to know now is:
1- Why am I hitting this problem? Is it due to low resources on the VM or to
the high number of tablets on a single tserver?
2- How can I get around this problem and recover my data and the Kudu services?

Would really appreciate some help on this.
-- 
Faraz Mateen

Re: "Too many open files" error

Posted by Alexey Serbin <as...@cloudera.com>.
Hi,

I think you could try setting the limit on the number of open files to
unlimited and see how it goes when you start the tablet server.

I think the best way forward is to add tablet servers to the cluster.
Ideally, you want your data replicated: consider creating tables with
replication factor 3 and having at least 4 tablet servers in your cluster.
Once you have added the new tablet servers, don't forget to run the
rebalancer tool (kudu cluster rebalance ...).
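
To make the effect of adding tablet servers concrete, here is a rough back-of-the-envelope sketch. It assumes replicas end up evenly spread across servers (which is what the rebalancer aims for) and takes the number of tables as a parameter, since the thread does not state it. The function name is hypothetical.

```python
def avg_replicas_per_tserver(num_tables, partitions_per_table,
                             replication_factor, num_tservers):
    """Average number of tablet replicas hosted by each tablet server,
    assuming a perfectly even distribution across servers."""
    total_replicas = num_tables * partitions_per_table * replication_factor
    return total_replicas / num_tservers


# Per table, with the 39 partitions mentioned in this thread:
single = avg_replicas_per_tserver(1, 39, replication_factor=1, num_tservers=1)
spread = avg_replicas_per_tserver(1, 39, replication_factor=3, num_tservers=4)
print(single, spread)  # 39.0 29.25
```

So even with three copies of every tablet, four servers would each carry fewer replicas per table than the single server does now, and each replica adds to that server's open file descriptors.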


HTH,

Alexey

In reply to Faraz Mateen <fm...@an10.io>, Mon, Oct 7, 2019 at 2:31 AM.

Re: "Too many open files" error

Posted by Faraz Mateen <fm...@an10.io>.
Alexey,

Thank you for the response. Having too many partitions is exactly the
problem. When I restart the tserver, it tries to open files for each
tablet and eventually crashes.

Is there a way to get around this and recover my data? Is there any config
I can change to get the tserver running? Or can I add a new tablet server and
migrate the existing tablets to it?

In reply to Alexey Serbin <as...@cloudera.com>, Sat, Oct 5, 2019 at 10:05 PM.

-- 
Faraz Mateen

Re: "Too many open files" error

Posted by Alexey Serbin <as...@cloudera.com>.
Hi,

Most likely the issue happened because of the high number of tablet replicas
on the tablet server.  During a spike in the input data rate, increased
compaction activity might require more file descriptors than usual, since
more files are open at once.

How many tablet replicas does that tablet server have?  It's not
recommended to have too many:
https://kudu.apache.org/docs/known_issues.html#_scale
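
One rough way to answer that question from the tablet server log alone is to count the distinct tablet IDs in the "Registered tablet" lines quoted in the original message. The sketch below assumes that the `T <id> P <id>` prefix in those lines carries the tablet ID followed by the server's own ID, which is a reading of the log excerpt in this thread rather than a documented format. The regex and function name are illustrative only.

```python
import re

# Matches lines like the ones in the original post, e.g.
#   ... ts_tablet_manager.cc:1173] T <32 hex chars> P <32 hex chars>: Registered tablet (...)
# Whitespace is matched loosely so mail-wrapped lines are also handled.
REGISTERED_TABLET = re.compile(
    r"\bT\s+([0-9a-f]{32})\s+P\s+([0-9a-f]{32}):\s+Registered tablet"
)


def registered_tablet_ids(log_text):
    """Return the set of distinct tablet IDs seen in 'Registered tablet' lines.
    A set is used so that repeated restarts in one log are not double-counted."""
    return {match.group(1) for match in REGISTERED_TABLET.finditer(log_text)}


# Example: count replicas registered in a log file (the path is a placeholder).
#   with open("kudu-tserver.INFO") as f:
#       print(len(registered_tablet_ids(f.read())))
```

The count is only an estimate: tablets that were registered and later deleted would still be included.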

To understand what has happened, you need to take a look into the logs of
the tablet server.  This might be useful:
https://kudu.apache.org/docs/troubleshooting.html

Overall, if there is only one (?) tablet server in the whole Kudu cluster,
why have 39 partitions per table?  I guess that's some sort of
proof-of-concept/toy setup, but anyway.  Since all the tablet replicas end
up on the same single tablet server, I don't see any benefit from partitioning
in that setup.  For the tablet server, it simply means several times as many
open file descriptors and increased memory usage.


Kind regards,

Alexey

In reply to Faraz Mateen <fm...@an10.io>, Fri, Oct 4, 2019 at 4:21 AM.