Posted to dev@ambari.apache.org by Andrew Onischuk <ao...@hortonworks.com> on 2016/01/18 13:15:53 UTC

Review Request 42456: LDAP Requests Via nslcd Take Too Long In Some Organizations

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42456/
-----------------------------------------------------------

Review request for Ambari and Dmitro Lisnichenko.


Bugs: AMBARI-14708
    https://issues.apache.org/jira/browse/AMBARI-14708


Repository: ambari


Description
-------

When performing a restart of a large cluster where LDAP is being used
indirectly by nslcd, the LDAP servers are put under heavy load. This is more
evident in LDAP organizations that are large to begin with.

    connection from pid=12345 uid=0 gid=0
    nslcd_group_all()
    myldap_search(base="cn=groups,cn=accounts,dc=corp,dc=local", filter="(objectClass=posixGroup)")
    ldap_result(): end of results

It turns out that these processes are instances of the before-ANY hook script, which runs when a service is started, like this one I ran locally to reproduce the query patterns:

    /usr/bin/python2.6 /var/lib/ambari-agent/cache/stacks/HDP/2.0.6/hooks/before-ANY/scripts/hook.py ANY /var/lib/ambari-agent/data/command-5950.json /var/lib/ambari-agent/cache/stacks/HDP/2.0.6/hooks/before-ANY /var/lib/ambari-agent/data/structured-out-5950.json INFO /var/lib/ambari-agent/data/tmp

I tracked the issue down to this function in `resource_management/core/providers/accounts.py`:

    @property
    def user_groups(self):
        return [g.gr_name for g in grp.getgrall() if self.resource.username in g.gr_me

This property is referenced at least twice for each user, and the call to `grp.getgrall()` forces a complete enumeration of groups every time.

This means that in a cluster with many nodes, each restarting many processes, many of these full enumeration searches run at the same time. In an enterprise with a large directory this gets very expensive, especially since this type of call is not cached by nscd.

I'm aware that the idiom used here to get the groups is common in Python, but it's actually pretty inefficient. Commands like `id` and `groups` have more efficient ways of discovering this. I'm not aware of an equivalent of these in Python.

A replacement along these lines uses the `groups` command instead:

    @property
    def user_groups(self):
        ret = []
        (rc, output) = shell.checked_call(['groups', self.resource.username], sudo=True)
        if rc == 0:
            ret.extend(output.split(':')[1].lstrip().split())
        return ret

This converts the full LDAP scan for groups to more efficient queries targeted
to the user. The lookups done by the groups command are also 100% cacheable.
Since it's a checked call, the `rc == 0` check is probably not needed.
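
As a rough standalone illustration of the targeted lookup described above, the sketch below avoids Ambari's `shell` module and uses only the Python standard library. The function names are hypothetical, and it assumes the `groups` command prints `user : group1 group2` as GNU coreutils does. The second variant assumes Python 3.3 or later for `os.getgrouplist`, so it would not apply to the python2.6 interpreter shown earlier.

```python
import grp
import os
import pwd
import subprocess


def parse_groups_output(output):
    """Parse `groups <user>` output such as 'hdfs : hadoop hdfs'."""
    # Keep what follows the first colon when there is one; otherwise
    # treat the whole string as the list of group names.
    _, sep, names = output.partition(":")
    return (names if sep else output).split()


def groups_via_command(username):
    """Resolve groups for one user by running the `groups` command."""
    # check_output raises CalledProcessError on a non-zero exit code,
    # so no separate return-code check is needed here.
    output = subprocess.check_output(["groups", username],
                                     universal_newlines=True)
    return parse_groups_output(output)


def groups_via_getgrouplist(username):
    """Resolve groups for one user without spawning a process (Python 3.3+)."""
    primary_gid = pwd.getpwnam(username).pw_gid
    # grp.getgrgid raises KeyError for a gid that has no group entry.
    return [grp.getgrgid(gid).gr_name
            for gid in os.getgrouplist(username, primary_gid)]
```

Note that both variants include the user's primary group, whereas filtering `grp.getgrall()` by membership list typically returns only supplementary groups, so callers that compare the two results may need to account for that difference.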

An unfortunate effect of how usermod and friends work is that they always
invalidate the nscd cache after they run. This means that Ambari could still
be a lot more efficient than it is when LDAP is in play by being pickier about
when it runs commands like useradd/usermod/groupadd/groupmod.
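
One possible way to be pickier, sketched under the assumption that the current and desired supplementary groups of a user are already known, is to build a `usermod` command only when the two sets differ. The helper name `usermod_command` is hypothetical and is not taken from Ambari.

```python
def usermod_command(username, current_groups, desired_groups):
    """Return a usermod argv list, or None when no change is needed."""
    desired = set(desired_groups)
    if set(current_groups) == desired:
        # Nothing to change, so skip usermod and leave the nscd cache alone.
        return None
    # -G sets the user's supplementary groups to a comma-separated list.
    return ["usermod", "-G", ",".join(sorted(desired)), username]
```

A caller would run the returned command only when it is not `None`, so an unchanged user never triggers a cache invalidation.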

We can also probably put a timed cache on the results from `grp.getgrall()` or
`groups` in memory, configurable by the agent config file. This way, we would
only call it once every hour or so.
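
The timed cache idea could look roughly like the sketch below. The class name `TimedCache` and its parameters are assumptions made for illustration, not existing Ambari code, and the cache only helps if it lives in a long-running process rather than in a hook script that is started fresh for every command.

```python
import time


class TimedCache(object):
    """Cache the result of an expensive loader for ttl_seconds."""

    def __init__(self, ttl_seconds, loader, clock=time.time):
        self._ttl = ttl_seconds
        self._loader = loader      # e.g. grp.getgrall
        self._clock = clock        # injectable so tests can fake time
        self._value = None
        self._expires_at = None    # None means nothing has been loaded yet

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # First use or expired entry: reload and restart the timer.
            self._value = self._loader()
            self._expires_at = now + self._ttl
        return self._value
```

With `ttl_seconds=3600` and `grp.getgrall` as the loader, the full enumeration would run at most once per hour per process; the TTL itself could be read from the agent config file.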


Diffs
-----

  ambari-common/src/main/python/resource_management/core/providers/accounts.py e7ef399 
  ambari-common/src/main/python/resource_management/core/resources/accounts.py a75b723 
  ambari-common/src/main/python/resource_management/core/system.py 228474b 
  ambari-common/src/main/python/resource_management/core/utils.py 247f068 
  ambari-server/src/main/resources/stacks/HDP/2.0.6/configuration/cluster-env.xml 805aa29 
  ambari-server/src/main/resources/stacks/HDP/2.0.6/hooks/before-ANY/scripts/params.py 034415a 
  ambari-server/src/main/resources/stacks/HDP/2.0.6/hooks/before-ANY/scripts/shared_initialization.py 10777f9 
  ambari-server/src/test/python/stacks/2.0.6/configs/default.json bc40657 
  ambari-server/src/test/python/stacks/2.0.6/hooks/before-ANY/test_before_any.py 3da58cd 

Diff: https://reviews.apache.org/r/42456/diff/


Testing
-------

mvn clean test


Thanks,

Andrew Onischuk

Re: Review Request 42456: LDAP Requests Via nslcd Take Too Long In Some Organizations

Posted by Dmitro Lisnichenko <dl...@hortonworks.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42456/#review114998
-----------------------------------------------------------

Ship it!



- Dmitro Lisnichenko

