Posted to user@hadoop.apache.org by John Lilley <jo...@redpoint.net> on 2015/09/30 14:07:19 UTC
Ubuntu open file limits
Greetings,
We are starting to support Ubuntu 12.04 LTS servers and HDP, but we are hitting the "open file limits" problem. Unfortunately, setting this system-wide on Ubuntu seems difficult: no matter what we try, YARN tasks always show the result of ulimit -n as 1024 (or, if we attempt to override it, 4096). Something is setting a system-wide hard open-file limit of 4096 before the ResourceManager and NodeManagers start, and our tasks inherit that limit. This causes all sorts of problems; as you probably know, Hadoop really wants this limit to be 65536 or more.
What I want is to change the system-wide default open-file limit for everything so that Hadoop services and everything else pick that up. How do we do that?
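For context, there are two different knobs that are easy to conflate here (my framing, an assumption rather than anything from the HDP docs): the kernel-wide fs.file-max total, and the per-process nofile limit that YARN containers actually inherit. A quick way to see both:

```shell
# Kernel-wide cap on open file handles across ALL processes (sysctl fs.file-max):
cat /proc/sys/fs/file-max
# Per-process nofile limit -- this is what "ulimit -n" reports and what
# child processes (including YARN containers) inherit:
ulimit -Sn   # soft limit
ulimit -Hn   # hard limit
```

It is the per-process nofile value, not fs.file-max, that shows up as 1024/4096 in the containers.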
We've tried all of the obvious stuff from Stack Overflow and elsewhere, like:
# vi /etc/security/limits.conf
* soft nofile 65536
* hard nofile 65536
root soft nofile 65536
root hard nofile 65536
But none of this seems to affect the RM/NM limits.
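One possible explanation (an assumption on my part, not verified against this cluster): /etc/security/limits.conf is applied by pam_limits, so it only affects PAM sessions such as SSH logins. Daemons started by upstart or a sysvinit script on Ubuntu 12.04 never go through PAM, so they keep whatever limits init gave them. A rough sketch of where the limit would have to be raised instead:

```shell
# Sketch, assuming Ubuntu 12.04 with upstart; paths are illustrative.
# limits.conf only takes effect if pam_limits is in the PAM session stack:
grep pam_limits /etc/pam.d/common-session || echo "pam_limits not enabled"

# For an upstart-managed service, the limit belongs in the job file,
# e.g. a stanza like this in /etc/init/<service>.conf:
#   limit nofile 65536 65536

# For a sysvinit script, raise the soft limit before starting the daemon
# (fails harmlessly if the hard limit is lower):
ulimit -n 65536 2>/dev/null || echo "hard limit is below 65536; raise it as root"
ulimit -n   # the limit the daemon (and its children) would inherit
```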
Thanks
john
RE: Ubuntu open file limits
Posted by John Lilley <jo...@redpoint.net>.
OK, now it is about to get really interesting. It turns out that the nodes of the cluster are not configured symmetrically.
If I run multiple instances of "ulimit -a" via YARN's distributed shell, so that they are spread around the cluster nodes:
dsjar=/usr/hdp/2.2.8.0-3150/hadoop-yarn/hadoop-yarn-applications-distributedshell.jar
hadoop jar $dsjar org.apache.hadoop.yarn.applications.distributedshell.Client --jar $dsjar --shell_command 'ulimit -a' --num_containers 9
yarn logs -applicationId application_1443767835805_0009 > /tmp/foo
egrep 'Container:|open files' /tmp/foo
Container: container_e03_1443457398740_0223_01_000009 on rpb-ubn-hdin-1.office.datalever.com_45454
open files (-n) 32768
Container: container_e03_1443457398740_0223_01_000006 on rpb-ubn-hdin-1.office.datalever.com_45454
open files (-n) 32768
Container: container_e03_1443457398740_0223_01_000003 on rpb-ubn-hdin-1.office.datalever.com_45454
open files (-n) 32768
Container: container_e03_1443457398740_0223_01_000007 on rpb-ubn-hdin-2.office.datalever.com_45454
open files (-n) 4096
Container: container_e03_1443457398740_0223_01_000010 on rpb-ubn-hdin-2.office.datalever.com_45454
open files (-n) 4096
Container: container_e03_1443457398740_0223_01_000004 on rpb-ubn-hdin-2.office.datalever.com_45454
open files (-n) 4096
Container: container_e03_1443457398740_0223_01_000008 on rpb-ubn-hdin-3.office.datalever.com_45454
open files (-n) 4096
Container: container_e03_1443457398740_0223_01_000005 on rpb-ubn-hdin-3.office.datalever.com_45454
open files (-n) 4096
Container: container_e03_1443457398740_0223_01_000002 on rpb-ubn-hdin-3.office.datalever.com_45454
open files (-n) 4096
Container: container_e03_1443457398740_0223_01_000001 on rpb-ubn-hdin-3.office.datalever.com_45454
Only the first worker node has the higher limit (32768); the rest are at 4096.
I have verified this on two separate clusters now. The same discrepancies are observed by looking at /proc/<PID>/limits for the DataNode processes on each worker node.
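To reproduce that /proc check without going through YARN at all, a minimal sketch (the DataNode process pattern below is an assumption; substitute the actual PID on each node):

```shell
# Print the soft/hard nofile limits of a running process via /proc.
# For the DataNode, get its PID first, e.g.:
#   pgrep -f org.apache.hadoop.hdfs.server.datanode.DataNode
show_nofile() {
  # Fields 4 and 5 of the "Max open files" row are the soft and hard limits.
  awk '/Max open files/ {print "soft=" $4, "hard=" $5}' "/proc/$1/limits"
}
show_nofile $$   # current shell shown as a stand-in for the daemon PID
```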
This is looking like an Ambari issue perhaps?
John Lilley
-----Original Message-----
From: John Lilley
Sent: Thursday, October 1, 2015 10:22 AM
To: Varun Vasudev <vv...@apache.org>
Subject: RE: Ubuntu open file limits
That's the frustrating thing. Apparently on Ubuntu (maybe just 12.04?), services do not get their limits from /etc/security/limits.conf. We put these entries in long ago, but they have no effect:
* hard nofile 65536
* soft nofile 65536
root hard nofile 65536
root soft nofile 65536
John Lilley
-----Original Message-----
From: Varun Vasudev [mailto:vvasudev@apache.org]
Sent: Thursday, October 01, 2015 10:06 AM
To: John Lilley <jo...@redpoint.net>
Subject: Re: Ubuntu open file limits
Ok. I’m not sure why ambari-agent has such low limits. Did you reboot the machine after changing the limits in limits.conf?
-Varun
On 10/1/15, 9:32 PM, "John Lilley" <jo...@redpoint.net> wrote:
>12.04 LTS
>
>BTW it appears that ambari-agent has a hard nofile limit of 4096:
>
>$ sudo service ambari-agent status
>Found ambari-agent PID: 1463
>ambari-agent running.
>
>$ cat /proc/1463/limits
>Limit                     Soft Limit  Hard Limit  Units
>Max cpu time              unlimited   unlimited   seconds
>Max file size             unlimited   unlimited   bytes
>Max data size             unlimited   unlimited   bytes
>Max stack size            8388608     unlimited   bytes
>Max core file size        0           unlimited   bytes
>Max resident set          unlimited   unlimited   bytes
>Max processes             95970       95970       processes
>Max open files            1024        4096        files
>Max locked memory         65536       65536       bytes
>Max address space         unlimited   unlimited   bytes
>Max file locks            unlimited   unlimited   locks
>Max pending signals       95970       95970       signals
>Max msgqueue size         819200      819200      bytes
>Max nice priority         0           0
>Max realtime priority     0           0
>Max realtime timeout      unlimited   unlimited   us
>
>John Lilley
>
>
>-----Original Message-----
>From: Varun Vasudev [mailto:vvasudev@apache.org]
>Sent: Thursday, October 01, 2015 9:59 AM
>To: John Lilley <jo...@redpoint.net>
>Subject: Re: Ubuntu open file limits
>
>Hi John,
>
>Which version of HDP are you running?
>
>-Varun
>
>
>
>
>
>On 10/1/15, 9:26 PM, "John Lilley" <jo...@redpoint.net> wrote:
>
>>Thanks for the suggestion, but no files in that folder contain "nofile".
>>
>>This is the contents of that folder:
>>-rwx------ 1 root root  1052 Apr 13 12:26 ambari-env.sh
>>-rwxr-xr-x 1 root root  1365 Apr 13 12:26 ambari-python-wrap
>>-rwxr-xr-x 1 root root  1361 Apr 13 12:26 ambari-sudo.sh
>>drwxr-xr-x 8 root root  4096 Sep 16 12:51 cache
>>drwxr-xr-x 3 root root 36864 Oct  1 09:54 data
>>-rwx------ 1 root root  3114 Apr 13 12:26 install-helper.sh
>>drwxr-xr-x 2 root root  4096 Apr 13 12:26 keys
>>
>>Is one of these files a candidate for placing a "ulimit -n" command to raise the limit?
>>
>>Thanks,
>>John Lilley
>>
>>
>>-----Original Message-----
>>From: Varun Vasudev [mailto:vvasudev@apache.org]
>>Sent: Thursday, October 01, 2015 9:49 AM
>>To: John Lilley <jo...@redpoint.net>
>>Subject: Re: Ubuntu open file limits
>>
>>Hi John,
>>
>>Run "grep -r yarn_user_nofile_limit /var/lib/ambari-agent/*". It should give some idea about where the 4096 value is coming from.
>>
>>-Varun
>>
>>
>>
>>On 9/30/15, 5:37 PM, "John Lilley" <jo...@redpoint.net> wrote:
>>
>>>Greetings,
>>>
>>>We are starting to support Ubuntu 12.04 LTS servers and HDP, but we are hitting the "open file limits" problem. Unfortunately, setting this system-wide on Ubuntu seems difficult: no matter what we try, YARN tasks always show the result of ulimit -n as 1024 (or, if we attempt to override it, 4096). Something is setting a system-wide hard open-file limit of 4096 before the ResourceManager and NodeManagers start, and our tasks inherit that limit. This causes all sorts of problems; as you probably know, Hadoop really wants this limit to be 65536 or more.
>>>
>>>What I want is to change the system-wide default open-file limit for everything so that Hadoop services and everything else pick that up. How do we do that?
>>>
>>>We've tried all of the obvious stuff from Stack Overflow and elsewhere, like:
>>>
>>>
>>># vi /etc/security/limits.conf
>>>
>>>* soft nofile 65536
>>>
>>>* hard nofile 65536
>>>
>>>root soft nofile 65536
>>>
>>>root hard nofile 65536
>>>
>>>But none of this seems to affect the RM/NM limits.
>>>
>>>Thanks
>>>john
>>>
>>
>