Posted to dev@ambari.apache.org by Dmitro Lisnichenko <dl...@hortonworks.com> on 2015/10/02 12:43:25 UTC

Review Request 38951: ACCUMULO_TRACER START failed after enabling Kerberos

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38951/
-----------------------------------------------------------

Review request for Ambari and Andrew Onischuk.


Bugs: AMBARI-13295
    https://issues.apache.org/jira/browse/AMBARI-13295


Repository: ambari


Description
-------

After enabling Kerberos, ACCUMULO_TRACER START failed on the "Start and Test Services" step.

{code}
"stderr" : "Python script has been killed due to timeout after waiting 180 secs",
{code}

{code}
"stdout" : "2015-09-25 14:42:53,963 - Group['custom-spark'] {}\n2015-09-25 14:42:53,964 - Group['hadoop'] {}\n2015-09-25 14:42:53,965 - Group['custom-users'] {}\n2015-09-25 14:42:53,965 - Group['custom-knox-group'] {}\n2015-09-25 14:42:53,965 - User['custom-sqoop'] {'gid': 'hadoop', 'groups': [u'hadoop']}\n2015-09-25 14:42:53,966 - User['custom-knox'] {'gid': 'hadoop', 'groups': [u'hadoop']}\n2015-09-25 14:42:53,967 - User['custom-hdfs'] {'gid': 'hadoop', 'groups': [u'hadoop']}\n2015-09-25 14:42:53,968 - User['custom-oozie'] {'gid': 'hadoop', 'groups': [u'custom-users']}\n2015-09-25 14:42:53,969 - User['custom-smoke'] {'gid': 'hadoop', 'groups': [u'custom-users']}\n2015-09-25 14:42:53,970 - User['custom-hbase'] {'gid': 'hadoop', 'groups': [u'hadoop']}\n2015-09-25 14:42:53,971 - User['custom-tez'] {'gid': 'hadoop', 'groups': [u'custom-users']}\n2015-09-25 14:42:53,972 - User['custom-hive'] {'gid': 'hadoop', 'groups': [u'hadoop']}\n2015-09-25 14:42:53,973 - User['custom-mr'] {'gid': 'h
 adoop', 'groups': [u'hadoop']}\n2015-09-25 14:42:53,973 - User['custom-accumulo'] {'gid': 'hadoop', 'groups': [u'hadoop']}\n2015-09-25 14:42:53,974 - User['custom-hcat'] {'gid': 'hadoop', 'groups': [u'hadoop']}\n2015-09-25 14:42:53,975 - User['custom-ams'] {'gid': 'hadoop', 'groups': [u'hadoop']}\n2015-09-25 14:42:53,976 - User['custom-yarn'] {'gid': 'hadoop', 'groups': [u'hadoop']}\n2015-09-25 14:42:53,977 - User['custom-falcon'] {'gid': 'hadoop', 'groups': [u'custom-users']}\n2015-09-25 14:42:53,977 - User['custom-spark'] {'gid': 'hadoop', 'groups': [u'hadoop']}\n2015-09-25 14:42:53,978 - User['custom-atlas'] {'gid': 'hadoop', 'groups': [u'hadoop']}\n2015-09-25 14:42:53,979 - User['custom-flume'] {'gid': 'hadoop', 'groups': [u'hadoop']}\n2015-09-25 14:42:53,980 - User['custom-kafka'] {'gid': 'hadoop', 'groups': [u'hadoop']}\n2015-09-25 14:42:53,981 - User['custom-zookeeper'] {'gid': 'hadoop', 'groups': [u'hadoop']}\n2015-09-25 14:42:53,982 - User['custom-mahout'] {'gid': 'hadoop',
  'groups': [u'hadoop']}\n2015-09-25 14:42:53,982 - User['custom-storm'] {'gid': 'hadoop', 'groups': [u'hadoop']}\n2015-09-25 14:42:53,983 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}\n2015-09-25 14:42:53,985 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh custom-smoke /tmp/hadoop-custom-smoke,/tmp/hsperfdata_custom-smoke,/home/custom-smoke,/tmp/custom-smoke,/tmp/sqoop-custom-smoke'] {'not_if': '(test $(id -u custom-smoke) -gt 1000) || (false)'}\n2015-09-25 14:42:53,991 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh custom-smoke /tmp/hadoop-custom-smoke,/tmp/hsperfdata_custom-smoke,/home/custom-smoke,/tmp/custom-smoke,/tmp/sqoop-custom-smoke'] due to not_if\n2015-09-25 14:42:53,991 - Directory['/tmp/hbase-hbase'] {'owner': 'custom-hbase', 'recursive': True, 'mode': 0775, 'cd_access': 'a'}\n2015-09-25 14:42:53,992 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh
 '), 'mode': 0555}\n2015-09-25 14:42:53,993 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh custom-hbase /home/custom-hbase,/tmp/custom-hbase,/usr/bin/custom-hbase,/var/log/custom-hbase,/tmp/hbase-hbase'] {'not_if': '(test $(id -u custom-hbase) -gt 1000) || (false)'}\n2015-09-25 14:42:53,999 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh custom-hbase /home/custom-hbase,/tmp/custom-hbase,/usr/bin/custom-hbase,/var/log/custom-hbase,/tmp/hbase-hbase'] due to not_if\n2015-09-25 14:42:54,000 - Group['custom-hdfs'] {'ignore_failures': False}\n2015-09-25 14:42:54,000 - User['custom-hdfs'] {'ignore_failures': False, 'groups': [u'hadoop', u'custom-hdfs']}\n2015-09-25 14:42:54,001 - Directory['/etc/hadoop'] {'mode': 0755}\n2015-09-25 14:42:54,019 - File['/usr/hdp/current/hadoop-client/conf/hadoop-env.sh'] {'content': InlineTemplate(...), 'owner': 'root', 'group': 'hadoop'}\n2015-09-25 14:42:54,019 - Directory['/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir'] {'owner': 'custom-hd
 fs', 'group': 'hadoop', 'mode': 0777}\n2015-09-25 14:42:54,032 - Execute[('setenforce', '0')] {'not_if': '(! which getenforce ) || (which getenforce && getenforce | grep -q Disabled)', 'sudo': True, 'only_if': 'test -f /selinux/enforce'}\n2015-09-25 14:42:54,039 - Skipping Execute[('setenforce', '0')] due to not_if\n2015-09-25 14:42:54,040 - Directory['/grid/0/log/hadoop'] {'owner': 'root', 'mode': 0775, 'group': 'hadoop', 'recursive': True, 'cd_access': 'a'}\n2015-09-25 14:42:54,043 - Directory['/var/run/hadoop'] {'owner': 'root', 'group': 'root', 'recursive': True, 'cd_access': 'a'}\n2015-09-25 14:42:54,043 - Directory['/tmp/hadoop-custom-hdfs'] {'owner': 'custom-hdfs', 'recursive': True, 'cd_access': 'a'}\n2015-09-25 14:42:54,048 - File['/usr/hdp/current/hadoop-client/conf/commons-logging.properties'] {'content': Template('commons-logging.properties.j2'), 'owner': 'root'}\n2015-09-25 14:42:54,051 - File['/usr/hdp/current/hadoop-client/conf/health_check'] {'content': Template('hea
 lth_check.j2'), 'owner': 'root'}\n2015-09-25 14:42:54,051 - File['/usr/hdp/current/hadoop-client/conf/log4j.properties'] {'content': ..., 'owner': 'custom-hdfs', 'group': 'hadoop', 'mode': 0644}\n2015-09-25 14:42:54,074 - File['/usr/hdp/current/hadoop-client/conf/hadoop-metrics2.properties'] {'content': Template('hadoop-metrics2.properties.j2'), 'owner': 'custom-hdfs'}\n2015-09-25 14:42:54,075 - File['/usr/hdp/current/hadoop-client/conf/task-log4j.properties'] {'content': StaticFile('task-log4j.properties'), 'mode': 0755}\n2015-09-25 14:42:54,076 - File['/usr/hdp/current/hadoop-client/conf/configuration.xsl'] {'owner': 'custom-hdfs', 'group': 'hadoop'}\n2015-09-25 14:42:54,083 - File['/etc/hadoop/conf/topology_mappings.data'] {'owner': 'custom-hdfs', 'content': Template('topology_mappings.data.j2'), 'only_if': 'test -d /etc/hadoop/conf', 'group': 'hadoop'}\n2015-09-25 14:42:54,089 - File['/etc/hadoop/conf/topology_script.py'] {'content': StaticFile('topology_script.py'), 'only_if': 
 'test -d /etc/hadoop/conf', 'mode': 0755}\n2015-09-25 14:42:54,275 - Directory['/usr/hdp/current/accumulo-tracer/conf'] {'owner': 'custom-accumulo', 'group': 'hadoop', 'recursive': True, 'mode': 0755}\n2015-09-25 14:42:54,277 - Directory['/usr/hdp/current/accumulo-tracer/conf/server'] {'owner': 'custom-accumulo', 'group': 'hadoop', 'recursive': True, 'mode': 0700}\n2015-09-25 14:42:54,278 - XmlConfig['accumulo-site.xml'] {'group': 'hadoop', 'conf_dir': '/usr/hdp/current/accumulo-tracer/conf/server', 'mode': 0600, 'configuration_attributes': {}, 'owner': 'custom-accumulo', 'configurations': ...}\n2015-09-25 14:42:54,292 - Generating config: /usr/hdp/current/accumulo-tracer/conf/server/accumulo-site.xml\n2015-09-25 14:42:54,293 - File['/usr/hdp/current/accumulo-tracer/conf/server/accumulo-site.xml'] {'owner': 'custom-accumulo', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': 0600, 'encoding': 'UTF-8'}\n2015-09-25 14:42:54,317 - Directory['/var/run/accumulo'] {'owner': 'cust
 om-accumulo', 'group': 'hadoop', 'recursive': True}\n2015-09-25 14:42:54,318 - Directory['/grid/0/log/accumulo'] {'owner': 'custom-accumulo', 'group': 'hadoop', 'recursive': True}\n2015-09-25 14:42:54,323 - File['/usr/hdp/current/accumulo-tracer/conf/server/accumulo-env.sh'] {'content': InlineTemplate(...), 'owner': 'custom-accumulo', 'group': 'hadoop', 'mode': 0644}\n2015-09-25 14:42:54,324 - PropertiesFile['/usr/hdp/current/accumulo-tracer/conf/server/client.conf'] {'owner': 'custom-accumulo', 'group': 'hadoop', 'properties': {'instance.zookeeper.host': u'ambari-ooziehive-r1-2.novalocal:2181,ambari-ooziehive-r1-3.novalocal:2181,ambari-ooziehive-r1-5.novalocal:2181', 'instance.name': u'hdp-accumulo-instance', 'instance.rpc.sasl.enabled': True, 'instance.zookeeper.timeout': u'30s'}}\n2015-09-25 14:42:54,329 - Generating properties file: /usr/hdp/current/accumulo-tracer/conf/server/client.conf\n2015-09-25 14:42:54,329 - File['/usr/hdp/current/accumulo-tracer/conf/server/client.conf']
  {'owner': 'custom-accumulo', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None}\n2015-09-25 14:42:54,332 - Writing File['/usr/hdp/current/accumulo-tracer/conf/server/client.conf'] because contents don't match\n2015-09-25 14:42:54,333 - File['/usr/hdp/current/accumulo-tracer/conf/server/log4j.properties'] {'content': ..., 'owner': 'custom-accumulo', 'group': 'hadoop', 'mode': 0644}\n2015-09-25 14:42:54,333 - TemplateConfig['/usr/hdp/current/accumulo-tracer/conf/server/auditLog.xml'] {'owner': 'custom-accumulo', 'template_tag': None, 'group': 'hadoop'}\n2015-09-25 14:42:54,337 - File['/usr/hdp/current/accumulo-tracer/conf/server/auditLog.xml'] {'content': Template('auditLog.xml.j2'), 'owner': 'custom-accumulo', 'group': 'hadoop', 'mode': None}\n2015-09-25 14:42:54,337 - TemplateConfig['/usr/hdp/current/accumulo-tracer/conf/server/generic_logger.xml'] {'owner': 'custom-accumulo', 'template_tag': None, 'group': 'hadoop'}\n2015-09-25 14:42:54,341 - File['/usr/hdp/current/a
 ccumulo-tracer/conf/server/generic_logger.xml'] {'content': Template('generic_logger.xml.j2'), 'owner': 'custom-accumulo', 'group': 'hadoop', 'mode': None}\n2015-09-25 14:42:54,342 - TemplateConfig['/usr/hdp/current/accumulo-tracer/conf/server/monitor_logger.xml'] {'owner': 'custom-accumulo', 'template_tag': None, 'group': 'hadoop'}\n2015-09-25 14:42:54,344 - File['/usr/hdp/current/accumulo-tracer/conf/server/monitor_logger.xml'] {'content': Template('monitor_logger.xml.j2'), 'owner': 'custom-accumulo', 'group': 'hadoop', 'mode': None}\n2015-09-25 14:42:54,345 - File['/usr/hdp/current/accumulo-tracer/conf/server/accumulo-metrics.xml'] {'content': StaticFile('accumulo-metrics.xml'), 'owner': 'custom-accumulo', 'group': 'hadoop', 'mode': 0644}\n2015-09-25 14:42:54,346 - TemplateConfig['/usr/hdp/current/accumulo-tracer/conf/server/tracers'] {'owner': 'custom-accumulo', 'template_tag': None, 'group': 'hadoop'}\n2015-09-25 14:42:54,348 - File['/usr/hdp/current/accumulo-tracer/conf/server
 /tracers'] {'content': Template('tracers.j2'), 'owner': 'custom-accumulo', 'group': 'hadoop', 'mode': None}\n2015-09-25 14:42:54,349 - TemplateConfig['/usr/hdp/current/accumulo-tracer/conf/server/gc'] {'owner': 'custom-accumulo', 'template_tag': None, 'group': 'hadoop'}\n2015-09-25 14:42:54,351 - File['/usr/hdp/current/accumulo-tracer/conf/server/gc'] {'content': Template('gc.j2'), 'owner': 'custom-accumulo', 'group': 'hadoop', 'mode': None}\n2015-09-25 14:42:54,352 - TemplateConfig['/usr/hdp/current/accumulo-tracer/conf/server/monitor'] {'owner': 'custom-accumulo', 'template_tag': None, 'group': 'hadoop'}\n2015-09-25 14:42:54,354 - File['/usr/hdp/current/accumulo-tracer/conf/server/monitor'] {'content': Template('monitor.j2'), 'owner': 'custom-accumulo', 'group': 'hadoop', 'mode': None}\n2015-09-25 14:42:54,355 - TemplateConfig['/usr/hdp/current/accumulo-tracer/conf/server/slaves'] {'owner': 'custom-accumulo', 'template_tag': None, 'group': 'hadoop'}\n2015-09-25 14:42:54,357 - File
 ['/usr/hdp/current/accumulo-tracer/conf/server/slaves'] {'content': Template('slaves.j2'), 'owner': 'custom-accumulo', 'group': 'hadoop', 'mode': None}\n2015-09-25 14:42:54,357 - TemplateConfig['/usr/hdp/current/accumulo-tracer/conf/server/masters'] {'owner': 'custom-accumulo', 'template_tag': None, 'group': 'hadoop'}\n2015-09-25 14:42:54,359 - File['/usr/hdp/current/accumulo-tracer/conf/server/masters'] {'content': Template('masters.j2'), 'owner': 'custom-accumulo', 'group': 'hadoop', 'mode': None}\n2015-09-25 14:42:54,360 - TemplateConfig['/usr/hdp/current/accumulo-tracer/conf/server/hadoop-metrics2-accumulo.properties'] {'owner': 'custom-accumulo', 'template_tag': None, 'group': 'hadoop'}\n2015-09-25 14:42:54,368 - File['/usr/hdp/current/accumulo-tracer/conf/server/hadoop-metrics2-accumulo.properties'] {'content': Template('hadoop-metrics2-accumulo.properties.j2'), 'owner': 'custom-accumulo', 'group': 'hadoop', 'mode': None}\n2015-09-25 14:42:54,369 - Execute['/usr/bin/kinit -kt 
 /etc/security/keytabs/accumulo.headless.keytab custom-accumulo@EXAMPLE.COM; ACCUMULO_CONF_DIR=/usr/hdp/current/accumulo-tracer/conf/server /usr/hdp/current/accumulo-client/bin/accumulo init --reset-security --user custom-accumulo@EXAMPLE.COM --password NA >/grid/0/log/accumulo/accumulo-reset.out 2>/grid/0/log/accumulo/accumulo-reset.err'] {'not_if': 'ambari-sudo.sh su custom-accumulo -l -s /bin/bash -c \\'/usr/bin/kinit -kt /etc/security/keytabs/accumulo.headless.keytab custom-accumulo@EXAMPLE.COM; ACCUMULO_CONF_DIR=/usr/hdp/current/accumulo-tracer/conf/server /usr/hdp/current/accumulo-client/bin/accumulo shell -e \"userpermissions -u custom-accumulo@EXAMPLE.COM\" | grep System.CREATE_TABLE\\'', 'user': 'custom-accumulo'}",
{code}
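The "Skipping Execute[...] due to not_if" lines in the stdout above reflect the guard semantics of Ambari's Execute resource: the command runs only when its not_if check exits non-zero. A rough illustrative sketch of that behavior (a hypothetical helper, not the actual resource_management implementation):

```python
import subprocess

def execute(command, not_if=None):
    """Run `command` unless the `not_if` guard command succeeds.

    Returns True if the command ran, False if it was skipped.
    """
    if not_if is not None:
        guard = subprocess.run(not_if, shell=True,
                               stdout=subprocess.DEVNULL,
                               stderr=subprocess.DEVNULL)
        if guard.returncode == 0:
            # Mirrors the "Skipping Execute[...] due to not_if" log lines
            return False
    subprocess.run(command, shell=True, check=True)
    return True
```

In the failing run above, the not_if guard (a kinit plus an accumulo shell permission check) did not succeed, so the accumulo init --reset-security command ran and then hung until the 180-second agent timeout killed the script.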

The tserver log contains the following exceptions:
{code}
2015-09-25 14:29:38,821 [tserver.TabletServer] INFO : Started replication service on ambari-ooziehive-r1-2.novalocal:10002
2015-09-25 14:29:55,489 [server.TThreadPoolServer] ERROR: Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException
	at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
	at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory$1.run(UGIAssumingTransportFactory.java:51)
	at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory$1.run(UGIAssumingTransportFactory.java:48)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:360)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
	at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory.getTransport(UGIAssumingTransportFactory.java:48)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:208)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TTransportException
	at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
	at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
	at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:178)
	at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
	at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
	at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
	at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
	... 11 more
2015-09-25 14:30:01,812 [tserver.TabletServer] INFO : Loading tablet !0<;~
2015-09-25 14:30:01,894 [tserver.TabletServer] INFO : ambari-ooziehive-r1-2.novalocal:9997: got assignment from master: !0<;~
2015-09-25 14:30:02,833 [util.MetadataTableUtil] INFO : Scanning logging entries for !0<;~
2015-09-25 14:30:02,862 [util.MetadataTableUtil] INFO : Scanning metadata for logs used for tablet !0<;~
2015-09-25 14:30:02,924 [util.MetadataTableUtil] INFO : Returning logs [] for extent !0<;~
2015-09-25 14:30:34,637 [server.TThreadPoolServer] ERROR: Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Peer indicated failure: GSS initiate failed
	at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
	at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory$1.run(UGIAssumingTransportFactory.java:51)
	at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory$1.run(UGIAssumingTransportFactory.java:48)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:360)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
	at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory.getTransport(UGIAssumingTransportFactory.java:48)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:208)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TTransportException: Peer indicated failure: GSS initiate failed
	at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:190)
	at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
	at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
	at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
	at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
	... 11 more
{code}

Live cluster (available for another 48 hours) on which the failure occurred:
172.22.90.201	ambari-ooziehive-r1-5.novalocal	ambari-ooziehive-r1-5
172.22.90.200	ambari-ooziehive-r1-2.novalocal	ambari-ooziehive-r1-2
172.22.90.198	ambari-ooziehive-r1-3.novalocal	ambari-ooziehive-r1-3
172.22.90.197	ambari-ooziehive-r1-4.novalocal	ambari-ooziehive-r1-4
172.22.90.199	ambari-ooziehive-r1-1.novalocal	ambari-ooziehive-r1-1


Diffs
-----

  ambari-common/src/main/python/resource_management/libraries/functions/__init__.py 1998f69 
  ambari-common/src/main/python/resource_management/libraries/functions/get_bare_principal.py PRE-CREATION 
  ambari-server/src/main/resources/common-services/ACCUMULO/1.6.1.2.2.0/package/scripts/params.py ca8cebe 
  ambari-server/src/main/resources/common-services/STORM/0.9.1.2.1/package/scripts/params_linux.py 2349a92 
  ambari-server/src/main/resources/stacks/HDP/2.3/services/ACCUMULO/kerberos.json 73aaf3d 

Diff: https://reviews.apache.org/r/38951/diff/
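
The new get_bare_principal.py helper in the diff presumably strips the instance and realm components from a Kerberos principal so that only the primary remains. A minimal illustrative sketch (an assumption about the helper's behavior, not the actual Ambari implementation):

```python
def get_bare_principal(normalized_principal_name):
    """Strip the instance and realm from a Kerberos principal.

    A principal has the form primary[/instance][@REALM], e.g.
    'accumulo/_HOST@EXAMPLE.COM' -> 'accumulo'
    'custom-accumulo@EXAMPLE.COM' -> 'custom-accumulo'
    """
    if normalized_principal_name is None:
        return None
    # Drop the realm first, then any host/instance component.
    return normalized_principal_name.split('@')[0].split('/')[0]
```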


Testing
-------

mvn clean test


Thanks,

Dmitro Lisnichenko


Re: Review Request 38951: ACCUMULO_TRACER START failed after enabling Kerberos

Posted by Andrew Onischuk <ao...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38951/#review101321
-----------------------------------------------------------

Ship it!


Ship It!

- Andrew Onischuk


On Oct. 2, 2015, 10:43 a.m., Dmitro Lisnichenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/38951/
> -----------------------------------------------------------
> 
> (Updated Oct. 2, 2015, 10:43 a.m.)
> 
> 
> Review request for Ambari and Andrew Onischuk.
> 
> 
> Bugs: AMBARI-13295
>     https://issues.apache.org/jira/browse/AMBARI-13295
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> After enabling Kerberos on the "Start and Test Services" step ACCUMULO_TRACER START failed.
> 
> {code}
> "stderr" : "Python script has been killed due to timeout after waiting 180 secs",
> {code}
> 
> {code}
> "stdout" : "2015-09-25 14:42:53,963 - Group['custom-spark'] {}\n2015-09-25 14:42:53,964 - Group['hadoop'] {}\n2015-09-25 14:42:53,965 - Group['custom-users'] {}\n2015-09-25 14:42:53,965 - Group['custom-knox-group'] {}\n2015-09-25 14:42:53,965 - User['custom-sqoop'] {'gid': 'hadoop', 'groups': [u'hadoop']}\n2015-09-25 14:42:53,966 - User['custom-knox'] {'gid': 'hadoop', 'groups': [u'hadoop']}\n2015-09-25 14:42:53,967 - User['custom-hdfs'] {'gid': 'hadoop', 'groups': [u'hadoop']}\n2015-09-25 14:42:53,968 - User['custom-oozie'] {'gid': 'hadoop', 'groups': [u'custom-users']}\n2015-09-25 14:42:53,969 - User['custom-smoke'] {'gid': 'hadoop', 'groups': [u'custom-users']}\n2015-09-25 14:42:53,970 - User['custom-hbase'] {'gid': 'hadoop', 'groups': [u'hadoop']}\n2015-09-25 14:42:53,971 - User['custom-tez'] {'gid': 'hadoop', 'groups': [u'custom-users']}\n2015-09-25 14:42:53,972 - User['custom-hive'] {'gid': 'hadoop', 'groups': [u'hadoop']}\n2015-09-25 14:42:53,973 - User['custom-mr'] {'gid': 
 'hadoop', 'groups': [u'hadoop']}\n2015-09-25 14:42:53,973 - User['custom-accumulo'] {'gid': 'hadoop', 'groups': [u'hadoop']}\n2015-09-25 14:42:53,974 - User['custom-hcat'] {'gid': 'hadoop', 'groups': [u'hadoop']}\n2015-09-25 14:42:53,975 - User['custom-ams'] {'gid': 'hadoop', 'groups': [u'hadoop']}\n2015-09-25 14:42:53,976 - User['custom-yarn'] {'gid': 'hadoop', 'groups': [u'hadoop']}\n2015-09-25 14:42:53,977 - User['custom-falcon'] {'gid': 'hadoop', 'groups': [u'custom-users']}\n2015-09-25 14:42:53,977 - User['custom-spark'] {'gid': 'hadoop', 'groups': [u'hadoop']}\n2015-09-25 14:42:53,978 - User['custom-atlas'] {'gid': 'hadoop', 'groups': [u'hadoop']}\n2015-09-25 14:42:53,979 - User['custom-flume'] {'gid': 'hadoop', 'groups': [u'hadoop']}\n2015-09-25 14:42:53,980 - User['custom-kafka'] {'gid': 'hadoop', 'groups': [u'hadoop']}\n2015-09-25 14:42:53,981 - User['custom-zookeeper'] {'gid': 'hadoop', 'groups': [u'hadoop']}\n2015-09-25 14:42:53,982 - User['custom-mahout'] {'gid': 'hadoop
 ', 'groups': [u'hadoop']}\n2015-09-25 14:42:53,982 - User['custom-storm'] {'gid': 'hadoop', 'groups': [u'hadoop']}\n2015-09-25 14:42:53,983 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}\n2015-09-25 14:42:53,985 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh custom-smoke /tmp/hadoop-custom-smoke,/tmp/hsperfdata_custom-smoke,/home/custom-smoke,/tmp/custom-smoke,/tmp/sqoop-custom-smoke'] {'not_if': '(test $(id -u custom-smoke) -gt 1000) || (false)'}\n2015-09-25 14:42:53,991 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh custom-smoke /tmp/hadoop-custom-smoke,/tmp/hsperfdata_custom-smoke,/home/custom-smoke,/tmp/custom-smoke,/tmp/sqoop-custom-smoke'] due to not_if\n2015-09-25 14:42:53,991 - Directory['/tmp/hbase-hbase'] {'owner': 'custom-hbase', 'recursive': True, 'mode': 0775, 'cd_access': 'a'}\n2015-09-25 14:42:53,992 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.
 sh'), 'mode': 0555}\n2015-09-25 14:42:53,993 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh custom-hbase /home/custom-hbase,/tmp/custom-hbase,/usr/bin/custom-hbase,/var/log/custom-hbase,/tmp/hbase-hbase'] {'not_if': '(test $(id -u custom-hbase) -gt 1000) || (false)'}\n2015-09-25 14:42:53,999 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh custom-hbase /home/custom-hbase,/tmp/custom-hbase,/usr/bin/custom-hbase,/var/log/custom-hbase,/tmp/hbase-hbase'] due to not_if\n2015-09-25 14:42:54,000 - Group['custom-hdfs'] {'ignore_failures': False}\n2015-09-25 14:42:54,000 - User['custom-hdfs'] {'ignore_failures': False, 'groups': [u'hadoop', u'custom-hdfs']}\n2015-09-25 14:42:54,001 - Directory['/etc/hadoop'] {'mode': 0755}\n2015-09-25 14:42:54,019 - File['/usr/hdp/current/hadoop-client/conf/hadoop-env.sh'] {'content': InlineTemplate(...), 'owner': 'root', 'group': 'hadoop'}\n2015-09-25 14:42:54,019 - Directory['/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir'] {'owner': 'custom-
 hdfs', 'group': 'hadoop', 'mode': 0777}\n2015-09-25 14:42:54,032 - Execute[('setenforce', '0')] {'not_if': '(! which getenforce ) || (which getenforce && getenforce | grep -q Disabled)', 'sudo': True, 'only_if': 'test -f /selinux/enforce'}\n2015-09-25 14:42:54,039 - Skipping Execute[('setenforce', '0')] due to not_if\n2015-09-25 14:42:54,040 - Directory['/grid/0/log/hadoop'] {'owner': 'root', 'mode': 0775, 'group': 'hadoop', 'recursive': True, 'cd_access': 'a'}\n2015-09-25 14:42:54,043 - Directory['/var/run/hadoop'] {'owner': 'root', 'group': 'root', 'recursive': True, 'cd_access': 'a'}\n2015-09-25 14:42:54,043 - Directory['/tmp/hadoop-custom-hdfs'] {'owner': 'custom-hdfs', 'recursive': True, 'cd_access': 'a'}\n2015-09-25 14:42:54,048 - File['/usr/hdp/current/hadoop-client/conf/commons-logging.properties'] {'content': Template('commons-logging.properties.j2'), 'owner': 'root'}\n2015-09-25 14:42:54,051 - File['/usr/hdp/current/hadoop-client/conf/health_check'] {'content': Template('h
 ealth_check.j2'), 'owner': 'root'}\n2015-09-25 14:42:54,051 - File['/usr/hdp/current/hadoop-client/conf/log4j.properties'] {'content': ..., 'owner': 'custom-hdfs', 'group': 'hadoop', 'mode': 0644}\n2015-09-25 14:42:54,074 - File['/usr/hdp/current/hadoop-client/conf/hadoop-metrics2.properties'] {'content': Template('hadoop-metrics2.properties.j2'), 'owner': 'custom-hdfs'}\n2015-09-25 14:42:54,075 - File['/usr/hdp/current/hadoop-client/conf/task-log4j.properties'] {'content': StaticFile('task-log4j.properties'), 'mode': 0755}\n2015-09-25 14:42:54,076 - File['/usr/hdp/current/hadoop-client/conf/configuration.xsl'] {'owner': 'custom-hdfs', 'group': 'hadoop'}\n2015-09-25 14:42:54,083 - File['/etc/hadoop/conf/topology_mappings.data'] {'owner': 'custom-hdfs', 'content': Template('topology_mappings.data.j2'), 'only_if': 'test -d /etc/hadoop/conf', 'group': 'hadoop'}\n2015-09-25 14:42:54,089 - File['/etc/hadoop/conf/topology_script.py'] {'content': StaticFile('topology_script.py'), 'only_if'
 : 'test -d /etc/hadoop/conf', 'mode': 0755}\n2015-09-25 14:42:54,275 - Directory['/usr/hdp/current/accumulo-tracer/conf'] {'owner': 'custom-accumulo', 'group': 'hadoop', 'recursive': True, 'mode': 0755}\n2015-09-25 14:42:54,277 - Directory['/usr/hdp/current/accumulo-tracer/conf/server'] {'owner': 'custom-accumulo', 'group': 'hadoop', 'recursive': True, 'mode': 0700}\n2015-09-25 14:42:54,278 - XmlConfig['accumulo-site.xml'] {'group': 'hadoop', 'conf_dir': '/usr/hdp/current/accumulo-tracer/conf/server', 'mode': 0600, 'configuration_attributes': {}, 'owner': 'custom-accumulo', 'configurations': ...}\n2015-09-25 14:42:54,292 - Generating config: /usr/hdp/current/accumulo-tracer/conf/server/accumulo-site.xml\n2015-09-25 14:42:54,293 - File['/usr/hdp/current/accumulo-tracer/conf/server/accumulo-site.xml'] {'owner': 'custom-accumulo', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': 0600, 'encoding': 'UTF-8'}\n2015-09-25 14:42:54,317 - Directory['/var/run/accumulo'] {'owner': 'cu
 stom-accumulo', 'group': 'hadoop', 'recursive': True}\n2015-09-25 14:42:54,318 - Directory['/grid/0/log/accumulo'] {'owner': 'custom-accumulo', 'group': 'hadoop', 'recursive': True}\n2015-09-25 14:42:54,323 - File['/usr/hdp/current/accumulo-tracer/conf/server/accumulo-env.sh'] {'content': InlineTemplate(...), 'owner': 'custom-accumulo', 'group': 'hadoop', 'mode': 0644}\n2015-09-25 14:42:54,324 - PropertiesFile['/usr/hdp/current/accumulo-tracer/conf/server/client.conf'] {'owner': 'custom-accumulo', 'group': 'hadoop', 'properties': {'instance.zookeeper.host': u'ambari-ooziehive-r1-2.novalocal:2181,ambari-ooziehive-r1-3.novalocal:2181,ambari-ooziehive-r1-5.novalocal:2181', 'instance.name': u'hdp-accumulo-instance', 'instance.rpc.sasl.enabled': True, 'instance.zookeeper.timeout': u'30s'}}\n2015-09-25 14:42:54,329 - Generating properties file: /usr/hdp/current/accumulo-tracer/conf/server/client.conf\n2015-09-25 14:42:54,329 - File['/usr/hdp/current/accumulo-tracer/conf/server/client.conf
 '] {'owner': 'custom-accumulo', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None}\n2015-09-25 14:42:54,332 - Writing File['/usr/hdp/current/accumulo-tracer/conf/server/client.conf'] because contents don't match\n2015-09-25 14:42:54,333 - File['/usr/hdp/current/accumulo-tracer/conf/server/log4j.properties'] {'content': ..., 'owner': 'custom-accumulo', 'group': 'hadoop', 'mode': 0644}\n2015-09-25 14:42:54,333 - TemplateConfig['/usr/hdp/current/accumulo-tracer/conf/server/auditLog.xml'] {'owner': 'custom-accumulo', 'template_tag': None, 'group': 'hadoop'}\n2015-09-25 14:42:54,337 - File['/usr/hdp/current/accumulo-tracer/conf/server/auditLog.xml'] {'content': Template('auditLog.xml.j2'), 'owner': 'custom-accumulo', 'group': 'hadoop', 'mode': None}\n2015-09-25 14:42:54,337 - TemplateConfig['/usr/hdp/current/accumulo-tracer/conf/server/generic_logger.xml'] {'owner': 'custom-accumulo', 'template_tag': None, 'group': 'hadoop'}\n2015-09-25 14:42:54,341 - File['/usr/hdp/current
 /accumulo-tracer/conf/server/generic_logger.xml'] {'content': Template('generic_logger.xml.j2'), 'owner': 'custom-accumulo', 'group': 'hadoop', 'mode': None}\n2015-09-25 14:42:54,342 - TemplateConfig['/usr/hdp/current/accumulo-tracer/conf/server/monitor_logger.xml'] {'owner': 'custom-accumulo', 'template_tag': None, 'group': 'hadoop'}\n2015-09-25 14:42:54,344 - File['/usr/hdp/current/accumulo-tracer/conf/server/monitor_logger.xml'] {'content': Template('monitor_logger.xml.j2'), 'owner': 'custom-accumulo', 'group': 'hadoop', 'mode': None}\n2015-09-25 14:42:54,345 - File['/usr/hdp/current/accumulo-tracer/conf/server/accumulo-metrics.xml'] {'content': StaticFile('accumulo-metrics.xml'), 'owner': 'custom-accumulo', 'group': 'hadoop', 'mode': 0644}\n2015-09-25 14:42:54,346 - TemplateConfig['/usr/hdp/current/accumulo-tracer/conf/server/tracers'] {'owner': 'custom-accumulo', 'template_tag': None, 'group': 'hadoop'}\n2015-09-25 14:42:54,348 - File['/usr/hdp/current/accumulo-tracer/conf/serv
 er/tracers'] {'content': Template('tracers.j2'), 'owner': 'custom-accumulo', 'group': 'hadoop', 'mode': None}\n2015-09-25 14:42:54,349 - TemplateConfig['/usr/hdp/current/accumulo-tracer/conf/server/gc'] {'owner': 'custom-accumulo', 'template_tag': None, 'group': 'hadoop'}\n2015-09-25 14:42:54,351 - File['/usr/hdp/current/accumulo-tracer/conf/server/gc'] {'content': Template('gc.j2'), 'owner': 'custom-accumulo', 'group': 'hadoop', 'mode': None}\n2015-09-25 14:42:54,352 - TemplateConfig['/usr/hdp/current/accumulo-tracer/conf/server/monitor'] {'owner': 'custom-accumulo', 'template_tag': None, 'group': 'hadoop'}\n2015-09-25 14:42:54,354 - File['/usr/hdp/current/accumulo-tracer/conf/server/monitor'] {'content': Template('monitor.j2'), 'owner': 'custom-accumulo', 'group': 'hadoop', 'mode': None}\n2015-09-25 14:42:54,355 - TemplateConfig['/usr/hdp/current/accumulo-tracer/conf/server/slaves'] {'owner': 'custom-accumulo', 'template_tag': None, 'group': 'hadoop'}\n2015-09-25 14:42:54,357 - Fi
 le['/usr/hdp/current/accumulo-tracer/conf/server/slaves'] {'content': Template('slaves.j2'), 'owner': 'custom-accumulo', 'group': 'hadoop', 'mode': None}\n2015-09-25 14:42:54,357 - TemplateConfig['/usr/hdp/current/accumulo-tracer/conf/server/masters'] {'owner': 'custom-accumulo', 'template_tag': None, 'group': 'hadoop'}\n2015-09-25 14:42:54,359 - File['/usr/hdp/current/accumulo-tracer/conf/server/masters'] {'content': Template('masters.j2'), 'owner': 'custom-accumulo', 'group': 'hadoop', 'mode': None}\n2015-09-25 14:42:54,360 - TemplateConfig['/usr/hdp/current/accumulo-tracer/conf/server/hadoop-metrics2-accumulo.properties'] {'owner': 'custom-accumulo', 'template_tag': None, 'group': 'hadoop'}\n2015-09-25 14:42:54,368 - File['/usr/hdp/current/accumulo-tracer/conf/server/hadoop-metrics2-accumulo.properties'] {'content': Template('hadoop-metrics2-accumulo.properties.j2'), 'owner': 'custom-accumulo', 'group': 'hadoop', 'mode': None}\n2015-09-25 14:42:54,369 - Execute['/usr/bin/kinit -k
 t /etc/security/keytabs/accumulo.headless.keytab custom-accumulo@EXAMPLE.COM; ACCUMULO_CONF_DIR=/usr/hdp/current/accumulo-tracer/conf/server /usr/hdp/current/accumulo-client/bin/accumulo init --reset-security --user custom-accumulo@EXAMPLE.COM --password NA >/grid/0/log/accumulo/accumulo-reset.out 2>/grid/0/log/accumulo/accumulo-reset.err'] {'not_if': 'ambari-sudo.sh su custom-accumulo -l -s /bin/bash -c \\'/usr/bin/kinit -kt /etc/security/keytabs/accumulo.headless.keytab custom-accumulo@EXAMPLE.COM; ACCUMULO_CONF_DIR=/usr/hdp/current/accumulo-tracer/conf/server /usr/hdp/current/accumulo-client/bin/accumulo shell -e \"userpermissions -u custom-accumulo@EXAMPLE.COM\" | grep System.CREATE_TABLE\\'', 'user': 'custom-accumulo'}",
> {code}
> 
> The tserver log contains the following exceptions:
> {code}
> 2015-09-25 14:29:38,821 [tserver.TabletServer] INFO : Started replication service on ambari-ooziehive-r1-2.novalocal:10002
> 2015-09-25 14:29:55,489 [server.TThreadPoolServer] ERROR: Error occurred during processing of message.
> java.lang.RuntimeException: org.apache.thrift.transport.TTransportException
> 	at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
> 	at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory$1.run(UGIAssumingTransportFactory.java:51)
> 	at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory$1.run(UGIAssumingTransportFactory.java:48)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:360)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
> 	at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory.getTransport(UGIAssumingTransportFactory.java:48)
> 	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:208)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.thrift.transport.TTransportException
> 	at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
> 	at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> 	at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:178)
> 	at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
> 	at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
> 	at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
> 	at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
> 	... 11 more
> 2015-09-25 14:30:01,812 [tserver.TabletServer] INFO : Loading tablet !0<;~
> 2015-09-25 14:30:01,894 [tserver.TabletServer] INFO : ambari-ooziehive-r1-2.novalocal:9997: got assignment from master: !0<;~
> 2015-09-25 14:30:02,833 [util.MetadataTableUtil] INFO : Scanning logging entries for !0<;~
> 2015-09-25 14:30:02,862 [util.MetadataTableUtil] INFO : Scanning metadata for logs used for tablet !0<;~
> 2015-09-25 14:30:02,924 [util.MetadataTableUtil] INFO : Returning logs [] for extent !0<;~
> 2015-09-25 14:30:34,637 [server.TThreadPoolServer] ERROR: Error occurred during processing of message.
> java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Peer indicated failure: GSS initiate failed
> 	at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
> 	at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory$1.run(UGIAssumingTransportFactory.java:51)
> 	at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory$1.run(UGIAssumingTransportFactory.java:48)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:360)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
> 	at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory.getTransport(UGIAssumingTransportFactory.java:48)
> 	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:208)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.thrift.transport.TTransportException: Peer indicated failure: GSS initiate failed
> 	at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:190)
> 	at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
> 	at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
> 	at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
> 	at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
> 	... 11 more
> {code}
> 
> Live cluster (kept up for another 48 hours) on which the failure occurred:
> 172.22.90.201	ambari-ooziehive-r1-5.novalocal	ambari-ooziehive-r1-5
> 172.22.90.200	ambari-ooziehive-r1-2.novalocal	ambari-ooziehive-r1-2
> 172.22.90.198	ambari-ooziehive-r1-3.novalocal	ambari-ooziehive-r1-3
> 172.22.90.197	ambari-ooziehive-r1-4.novalocal	ambari-ooziehive-r1-4
> 172.22.90.199	ambari-ooziehive-r1-1.novalocal	ambari-ooziehive-r1-1
> 
> 
> Diffs
> -----
> 
>   ambari-common/src/main/python/resource_management/libraries/functions/__init__.py 1998f69 
>   ambari-common/src/main/python/resource_management/libraries/functions/get_bare_principal.py PRE-CREATION 
>   ambari-server/src/main/resources/common-services/ACCUMULO/1.6.1.2.2.0/package/scripts/params.py ca8cebe 
>   ambari-server/src/main/resources/common-services/STORM/0.9.1.2.1/package/scripts/params_linux.py 2349a92 
>   ambari-server/src/main/resources/stacks/HDP/2.3/services/ACCUMULO/kerberos.json 73aaf3d 
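
The "GSS initiate failed" errors above usually indicate that the client authenticated with a principal name the server did not expect. The diff pre-creates `get_bare_principal.py` in resource_management, presumably a helper that strips the host and realm components from a fully qualified Kerberos principal. A minimal sketch of what such a helper might look like, assuming the name from the file list above (the actual implementation in the patch may differ):

```python
import re

def get_bare_principal(normalized_principal_name):
    """
    Return the short (bare) name of a Kerberos principal, e.g.
    'accumulo/_HOST@EXAMPLE.COM' -> 'accumulo'. Hypothetical sketch;
    see get_bare_principal.py in the diff for the real version.
    """
    if normalized_principal_name is None:
        return None
    # Keep everything before the first '/' (host part) or '@' (realm).
    return re.split(r'[/@]', normalized_principal_name)[0]
```

With a helper like this, Ambari can configure Accumulo with the bare user name (e.g. `custom-accumulo`) instead of the full principal (`custom-accumulo@EXAMPLE.COM`), avoiding the mismatch seen in the tserver log.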
> 
> Diff: https://reviews.apache.org/r/38951/diff/
> 
> 
> Testing
> -------
> 
> mvn clean test
> 
> 
> Thanks,
> 
> Dmitro Lisnichenko
> 
>