Posted to user@hadoop.apache.org by John Lilley <jo...@redpoint.net> on 2015/09/30 14:07:19 UTC

Ubuntu open file limits

Greetings,

We are starting to support HDP on Ubuntu 12.04 LTS servers, but we are hitting the "open file limits" problem. Unfortunately, setting this system-wide on Ubuntu seems difficult -- no matter what we try, YARN tasks always show the result of ulimit -n as 1024 (or, if we attempt to override it, 4096). Something is setting a system-wide hard open-file limit of 4096 before the ResourceManager and NodeManagers start, and our tasks inherit that limit. This causes all sorts of problems; as you probably know, Hadoop really wants this limit to be 65536 or more.

What I want is to change the system-wide default open-file limit for everything so that Hadoop services and everything else pick that up. How do we do that?

We've tried all of the obvious stuff from Stack Overflow etc., like:


# vi /etc/security/limits.conf

* soft nofile 65536

* hard nofile 65536

root soft nofile 65536

root hard nofile 65536

But none of this seems to affect the RM/NM limits.
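
For reference, this is roughly how we check what the running daemons actually inherited (only a sketch -- the pgrep patterns and the "yarn" account are assumptions, adjust to however your services are launched):

# Effective limits of the running ResourceManager/NodeManager processes
for pid in $(pgrep -f 'ResourceManager|NodeManager'); do
    echo "== PID $pid =="
    grep 'Max open files' /proc/$pid/limits
done

# Compare with what a fresh login shell gets from limits.conf via PAM
su - yarn -c 'ulimit -Sn; ulimit -Hn'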

Thanks
john


RE: Ubuntu open file limits

Posted by John Lilley <jo...@redpoint.net>.
OK, now it gets really interesting. It turns out that the nodes of the cluster are not configured symmetrically. 

If I run the following to launch multiple instances of "ulimit -a" via YARN, so that they are spread across the cluster nodes: 
dsjar=/usr/hdp/2.2.8.0-3150/hadoop-yarn/hadoop-yarn-applications-distributedshell.jar
hadoop jar $dsjar org.apache.hadoop.yarn.applications.distributedshell.Client --jar $dsjar --shell_command 'ulimit -a' --num_containers 9 

yarn logs -applicationId application_1443767835805_0009 > /tmp/foo 

egrep 'Container:|open files' /tmp/foo 
Container: container_e03_1443457398740_0223_01_000009 on rpb-ubn-hdin-1.office.datalever.com_45454 
open files (-n) 32768 
Container: container_e03_1443457398740_0223_01_000006 on rpb-ubn-hdin-1.office.datalever.com_45454 
open files (-n) 32768 
Container: container_e03_1443457398740_0223_01_000003 on rpb-ubn-hdin-1.office.datalever.com_45454 
open files (-n) 32768 
Container: container_e03_1443457398740_0223_01_000007 on rpb-ubn-hdin-2.office.datalever.com_45454 
open files (-n) 4096 
Container: container_e03_1443457398740_0223_01_000010 on rpb-ubn-hdin-2.office.datalever.com_45454 
open files (-n) 4096 
Container: container_e03_1443457398740_0223_01_000004 on rpb-ubn-hdin-2.office.datalever.com_45454 
open files (-n) 4096 
Container: container_e03_1443457398740_0223_01_000008 on rpb-ubn-hdin-3.office.datalever.com_45454 
open files (-n) 4096 
Container: container_e03_1443457398740_0223_01_000005 on rpb-ubn-hdin-3.office.datalever.com_45454 
open files (-n) 4096 
Container: container_e03_1443457398740_0223_01_000002 on rpb-ubn-hdin-3.office.datalever.com_45454 
open files (-n) 4096 
Container: container_e03_1443457398740_0223_01_000001 on rpb-ubn-hdin-3.office.datalever.com_45454

Only the first worker node has the higher file limit.  The rest have the lower one.

I have verified this on two separate clusters now.  The same discrepancies show up when looking at /proc/<PID>/limits for the DataNode processes on each worker node.
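
In case anyone wants to reproduce the per-node check, it amounts to something like this (a rough sketch -- the host names are taken from the output above, and passwordless ssh plus a pgrep match on the DataNode main class are assumptions):

for host in rpb-ubn-hdin-1.office.datalever.com rpb-ubn-hdin-2.office.datalever.com rpb-ubn-hdin-3.office.datalever.com; do
  echo "== $host =="
  ssh "$host" 'pid=$(pgrep -f org.apache.hadoop.hdfs.server.datanode.DataNode); grep "Max open files" /proc/$pid/limits'
done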

This is starting to look like an Ambari issue, perhaps?

John Lilley

-----Original Message-----
From: John Lilley 
Sent: Thursday, October 1, 2015 10:22 AM
To: Varun Vasudev <vv...@apache.org>
Subject: RE: Ubuntu open file limits

That's the frustrating thing.  Apparently on Ubuntu (maybe just 12.04?), services do not get their limits from /etc/security/limits.conf.  We put these entries in long ago but they have no effect:

* hard nofile 65536
* soft nofile 65536
root hard nofile 65536
root soft nofile 65536
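
If the services are being started by upstart rather than through a PAM login, limits.conf is never consulted for them, and the limit has to be raised where the service is launched. Roughly what that looks like (the job file name below is hypothetical, and whether your services are upstart jobs at all is an assumption):

# In an upstart job file, e.g. /etc/init/some-service.conf (name is hypothetical):
limit nofile 65536 65536

# In a classic /etc/init.d script, near the top, before the daemon is started:
ulimit -n 65536

# To confirm whether PAM applies limits.conf to login sessions at all:
grep pam_limits /etc/pam.d/common-session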

John Lilley


-----Original Message-----
From: Varun Vasudev [mailto:vvasudev@apache.org]
Sent: Thursday, October 01, 2015 10:06 AM
To: John Lilley <jo...@redpoint.net>
Subject: Re: Ubuntu open file limits

Ok. I’m not sure why ambari-agent has such low limits. Did you reboot the machine after changing the limits in limits.conf?

-Varun



On 10/1/15, 9:32 PM, "John Lilley" <jo...@redpoint.net> wrote:

>12.04 LTS
>
>BTW it appears that ambari-agent has a hard nofile limit of 4096:
>
>$ sudo service ambari-agent status
>Found ambari-agent PID: 1463
>ambari-agent running.
>
>$ cat /proc/1463/limits
>Limit                     Soft Limit           Hard Limit           Units
>Max cpu time              unlimited            unlimited            seconds
>Max file size             unlimited            unlimited            bytes
>Max data size             unlimited            unlimited            bytes
>Max stack size            8388608              unlimited            bytes
>Max core file size        0                    unlimited            bytes
>Max resident set          unlimited            unlimited            bytes
>Max processes             95970                95970                processes
>Max open files            1024                 4096                 files
>Max locked memory         65536                65536                bytes
>Max address space         unlimited            unlimited            bytes
>Max file locks            unlimited            unlimited            locks
>Max pending signals       95970                95970                signals
>Max msgqueue size         819200               819200               bytes
>Max nice priority         0                    0
>Max realtime priority     0                    0
>Max realtime timeout      unlimited            unlimited            us
>
>John Lilley
>
>
>-----Original Message-----
>From: Varun Vasudev [mailto:vvasudev@apache.org]
>Sent: Thursday, October 01, 2015 9:59 AM
>To: John Lilley <jo...@redpoint.net>
>Subject: Re: Ubuntu open file limits
>
>Hi John,
>
>Which version of HDP are you running?
>
>-Varun
>
>
>
>
>
>On 10/1/15, 9:26 PM, "John Lilley" <jo...@redpoint.net> wrote:
>
>>Thanks for the suggestion, but no files in that folder contain "nofile".
>>
>>This is the contents of that folder:
>>-rwx------ 1 root root  1052 Apr 13 12:26 ambari-env.sh
>>-rwxr-xr-x 1 root root  1365 Apr 13 12:26 ambari-python-wrap
>>-rwxr-xr-x 1 root root  1361 Apr 13 12:26 ambari-sudo.sh
>>drwxr-xr-x 8 root root  4096 Sep 16 12:51 cache
>>drwxr-xr-x 3 root root 36864 Oct  1 09:54 data
>>-rwx------ 1 root root  3114 Apr 13 12:26 install-helper.sh
>>drwxr-xr-x 2 root root  4096 Apr 13 12:26 keys
>>
>>Is one of these files a candidate for placing a "ulimit -n" command to raise the limit?
>>
>>Thanks,
>>John Lilley
>>
>>
>>-----Original Message-----
>>From: Varun Vasudev [mailto:vvasudev@apache.org]
>>Sent: Thursday, October 01, 2015 9:49 AM
>>To: John Lilley <jo...@redpoint.net>
>>Subject: Re: Ubuntu open file limits
>>
>>Hi John,
>>
>>Run "grep -r yarn_user_nofile_limit /var/lib/ambari-agent/*”. It should give some idea about where the 4096 value is coming from.
>>
>>-Varun
>>
>>
>>
>>On 9/30/15, 5:37 PM, "John Lilley" <jo...@redpoint.net> wrote:
>>
>>>Greetings,
>>>
>>>We are starting to support HDP on Ubuntu 12.04 LTS servers, but we are hitting the "open file limits" problem. Unfortunately, setting this system-wide on Ubuntu seems difficult -- no matter what we try, YARN tasks always show the result of ulimit -n as 1024 (or, if we attempt to override it, 4096). Something is setting a system-wide hard open-file limit of 4096 before the ResourceManager and NodeManagers start, and our tasks inherit that limit. This causes all sorts of problems; as you probably know, Hadoop really wants this limit to be 65536 or more.
>>>
>>>What I want is to change the system-wide default open-file limit for everything so that Hadoop services and everything else pick that up. How do we do that?
>>>
>>>We've tried all of the obvious stuff from Stack Overflow etc., like:
>>>
>>>
>>># vi /etc/security/limits.conf
>>>
>>>* soft nofile 65536
>>>
>>>* hard nofile 65536
>>>
>>>root soft nofile 65536
>>>
>>>root hard nofile 65536
>>>
>>>But none of this seems to affect the RM/NM limits.
>>>
>>>Thanks
>>>john
>>>
>>
>

