Posted to dev@slider.apache.org by Ophir Etzion <op...@foursquare.com> on 2016/11/30 18:58:13 UTC

failure in the STOP command

Hi,

I hope I'm writing to the correct mailing list. Please direct me elsewhere
if this is not the right place.

I've written a simple custom Slider application, and its STOP script fails
due to what seems to be a Slider issue: PYTHONPATH is not set when the stop
command runs.

I will probably debug CustomServiceOrchestrator to see why it doesn't set
the environment variables there, but I can only do that in a couple of
weeks. Before I look into it further, I wanted to ask whether anyone has
noticed something like this.

In the agent log it looks like this:

INFO 2016-11-30 18:07:03,894 ActionQueue.py:173 - Running command:
{u'roleCommand': u'STOP', u'clusterName':
u'enable-presto-worker_cluster_a', u'componentName': u'NODE', u'hostname':
u'fsak20.prod.foursquare.com', u'hostLevelParams': {u'java_home':
u'/data/loko/infrastructure-jdk8/current/bin/', u'container_id':
u'container_e468_1479830316320_64974_01_000091'}, u'commandType':
u'EXECUTION_COMMAND', u'roleParams': {u'auto_restart': u'false'},
u'serviceName': u'enable-presto-worker_cluster_a', u'role': u'NODE',
u'commandParams': {u'record_config': u'true', u'service_package_folder':
u'${AGENT_WORK_ROOT}/work/app/definition/package', u'script':
u'scripts/enable_presto_worker_component.py', u'schema_version': u'2.0',
u'command_timeout': u'600', u'script_type': u'PYTHON'}, u'taskId': 5,
u'yarnDockerMode': False, u'commandId': '5-1', u'containers': [],
u'configurations': {u'global': {u'security_enabled': u'false',
u'app_container_id': u'container_e468_1479830316320_64974_01_000091',
u'data_dir': u'/data/appdata/enable_presto_worker/data', u'app_name':
u'enable_presto_worker.py', u'app_root': u'${AGENT_WORK_ROOT}/app/install',
u'app_log_dir': u'${AGENT_LOG_ROOT}', u'app_pid_dir':
u'${AGENT_WORK_ROOT}/app/run', u'app_container_tag': u'2', u'pid_file':
u'${AGENT_WORK_ROOT}/app/run/component.pid', u'app_install_dir':
u'${AGENT_WORK_ROOT}/app/install', u'app_input_conf_dir':
u'${AGENT_WORK_ROOT}/propagatedconf', u'state_monitor_port': u'9990'}}}
INFO 2016-11-30 18:07:03,896 CustomServiceOrchestrator.py:329 - Storing
applied config: {u'global': {u'app_container_id':
u'container_e468_1479830316320_64974_01_000091',
             u'app_container_tag': u'2',
             u'app_input_conf_dir':
u'/export/hdj3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/container_e468_1479830316320_64974_01_000091/propagatedconf',
             u'app_install_dir':
u'/export/hdj3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/container_e468_1479830316320_64974_01_000091/app/install',
             u'app_log_dir':
u'/data/log/hadoop-yarn/container/application_1479830316320_64974/container_e468_1479830316320_64974_01_000091',
             u'app_name': u'enable_presto_worker.py',
             u'app_pid_dir':
u'/export/hdj3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/container_e468_1479830316320_64974_01_000091/app/run',
             u'app_root':
u'/export/hdj3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/container_e468_1479830316320_64974_01_000091/app/install',
             u'data_dir': u'/data/appdata/enable_presto_worker/data',
             u'pid_file':
u'/export/hdj3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/container_e468_1479830316320_64974_01_000091/app/run/component.pid',
             u'security_enabled': u'false',
             u'state_monitor_port': u'9990'}}
INFO 2016-11-30 18:07:03,898 PythonExecutor.py:152 - command str:
 /usr/bin/python -S
/export/hdk3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/filecache/11/enable_presto_worker.zip/package/scripts/enable_presto_worker_component.py
STOP
/export/hda3/data/log/hadoop-yarn/container/application_1479830316320_64974/container_e468_1479830316320_64974_01_000091/command-5.json
/export/hdk3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/filecache/11/enable_presto_worker.zip/package
/export/hda3/data/log/hadoop-yarn/container/application_1479830316320_64974/container_e468_1479830316320_64974_01_000091/structured-out-5.json
INFO
/export/hdj3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/container_e468_1479830316320_64974_01_000091
INFO 2016-11-30 18:07:03,919 PythonExecutor.py:97 - stop command output:
 err: Traceback (most recent call last):
  File
"/export/hdk3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/filecache/11/enable_presto_worker.zip/package/scripts/enable_presto_worker_component.py",
line 23, in <module>
    from resource_management import *

Re: failure in the STOP command

Posted by Ophir Etzion <op...@foursquare.com>.
and it doesn't seem to set a PYTHONPATH
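
For readers following the env=env suggestion quoted below, here is a minimal sketch of the general subprocess behaviour it relies on. It is not the Slider agent code. The helper name child_pythonpath and the directory agent_lib are made up for illustration. A child started with subprocess.Popen only sees a PYTHONPATH value if it is already in the parent's environment or is passed in explicitly through env.

```python
import os
import subprocess
import sys


def child_pythonpath(env=None):
    """Run a child interpreter and return the PYTHONPATH it sees ('' if unset)."""
    cmd = [sys.executable, "-S", "-c",
           "import os; print(os.environ.get('PYTHONPATH', ''))"]
    # With env=None the child inherits the parent's environment unchanged.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, env=env,
                            universal_newlines=True)
    out, _ = proc.communicate()
    return out.strip()


# Hypothetical directory standing in for the slider-agent path in the START log.
agent_lib = os.path.join(os.sep, "tmp", "slider-agent-example")

# Copy the parent's environment and add PYTHONPATH for the child only.
child_env = os.environ.copy()
child_env["PYTHONPATH"] = agent_lib

print("inherited:", child_pythonpath())
print("explicit :", child_pythonpath(child_env))
```

If the stop path builds such an env dictionary but never hands it to Popen, the script would start without the agent libraries on its path, which would be consistent with the failing import in the traceback.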

On Mon, Dec 5, 2016 at 5:14 PM, Ophir Etzion <op...@foursquare.com> wrote:

> it still doesn't work
>
> the error becomes:
> INFO 2016-12-05 22:11:10,947 PythonExecutor.py:97 - stop command output:
>  err: shell-init: error retrieving current directory: getcwd: cannot access
> parent directories: No such file or directory
>
> On Fri, Dec 2, 2016 at 2:58 PM, Gour Saha <gs...@hortonworks.com> wrote:
>
>> Billie, this is a good catch.
>>
>> Ophir, I think you should make this small change and try your app stop
>> again to see if it works.
>>
>> -Gour
>>
>> On 12/2/16, 10:13 AM, "Billie Rinaldi" <bi...@gmail.com> wrote:
>>
>> >This subprocess.Popen does appear to be missing an env=env parameter:
>> >https://github.com/apache/incubator-slider/blob/develop/slider-agent/src/main/python/agent/PythonExecutor.py#L153
>> >
>> >On Fri, Dec 2, 2016 at 9:30 AM, Ophir Etzion <op...@foursquare.com> wrote:
>> >
>> >> 1. You can't see the PYTHONPATH issue directly. What you can see is that
>> >> the STOP output has no "Setting env: PYTHONPATH" line like the one in the
>> >> START command.
>> >> 2. Thanks for letting me know about release_timeout_secs, but for my app I
>> >> don't care if the containers die; the stop command sends a UDP packet
>> >> elsewhere.
>> >>
>> >> Here is the output for START, where you can see the PYTHONPATH being set:
>> >> INFO 2016-11-30 17:50:32,361 AgentToggleLogger.py:40 - Running command
>> >> ['/usr/bin/python',
>> >>  '-S',
>> >>  u'/export/hdk3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/filecache/11/enable_presto_worker.zip/package/scripts/enable_presto_worker_component.py',
>> >>  u'START',
>> >>  '/export/hda3/data/log/hadoop-yarn/container/application_1479830316320_64974/container_e468_1479830316320_64974_01_000091/command-4.json',
>> >>  '/export/hdk3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/filecache/11/enable_presto_worker.zip/package',
>> >>  '/export/hda3/data/log/hadoop-yarn/container/application_1479830316320_64974/container_e468_1479830316320_64974_01_000091/structured-out-4.json',
>> >>  'INFO',
>> >>  '/export/hdj3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/container_e468_1479830316320_64974_01_000091']
>> >> INFO 2016-11-30 17:50:32,361 AgentToggleLogger.py:40 - Setting env: PYTHONPATH to /export/hdj3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/filecache/10/slider-agent.tar.gz/slider-agent/jinja2:/export/hdj3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/filecache/10/slider-agent.tar.gz/slider-agent
>> >> INFO 2016-11-30 17:50:32,463 AgentToggleLogger.py:40 - Queue result:
>> >> {'componentStatus': [],
>> >>  'reports': [{'actionId': u'4-1',
>> >>               'clusterName': u'enable-presto-worker_cluster_a',
>> >>               'exitcode': 777,
>> >>               'reportResult': True,
>> >>               'role': u'NODE',
>> >>               'roleCommand': u'START',
>> >>               'serviceName': u'enable-presto-worker_cluster_a',
>> >>               'status': 'IN_PROGRESS',
>> >>               'stderr': '',
>> >>               'stdout': "2016-11-30 17:50:32,455 - Directory['/data/appdata/enable_presto_worker/data/var/run'] {'recursive': True}",
>> >>               'structuredOut': '{}',
>> >>               'taskId': 4}]}
>> >>
>> >> On Fri, Dec 2, 2016 at 11:51 AM, Gour Saha <gs...@hortonworks.com> wrote:
>> >>
>> >> > Also keep in mind - if your application needs to run something useful when
>> >> > the stop cmd is initiated then you need to set an appropriate value for
>> >> > site.global.app_container.release_timeout_secs. Otherwise kill signals are
>> >> > sent to the agent containers via YARN (almost immediately) and the
>> >> > containers don't get time for graceful shutdown.
>> >> >
>> >> > -Gour
>> >> >
>> >> >
>> >> >
>> >> > On 12/2/16, 8:29 AM, "Billie Rinaldi" <bi...@gmail.com> wrote:
>> >> >
>> >> > >It looks like the Traceback stack for the stop command output is truncated
>> >> > >in the logs you pasted. I only see the first line of the Traceback:
>> >> > >INFO 2016-11-30 18:07:03,919 PythonExecutor.py:97 - stop command output:
>> >> > > err: Traceback (most recent call last):
>> >> > >  File
>> >> > >"/export/hdk3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/filecache/11/enable_presto_worker.zip/package/scripts/enable_presto_worker_component.py",
>> >> > >line 23, in <module>
>> >> > >    from resource_management import *
>> >> > >
>> >> > >So I cannot see the PYTHONPATH error you're talking about. If you paste the
>> >> > >entire Traceback that might tell us more.
>> >> > >
>> >> > >Billie
>> >> > >
>> >> > >On Fri, Dec 2, 2016 at 7:19 AM, Ophir Etzion <op...@foursquare.com> wrote:
>> >> > >
>> >> > >> It does implement a STOP command that does something useful.
>> >> > >> It fails because the PYTHONPATH isn't set the way it is for the other
>> >> > >> commands.
>> >> > >>
>> >> > >> On Thu, Dec 1, 2016 at 10:38 PM, Gour Saha <gsaha@hortonworks.com> wrote:
>> >> > >>
>> >> > >> > Does enable_presto_worker_component.py support/implement a STOP command?
>> >> > >> >
>> >> > >> > Does your application need to run something useful when the stop cmd is
>> >> > >> > initiated?
>> >> > >> >
>> >> > >> > -Gour
>> >> > >> >

Re: failure in the STOP command

Posted by Billie Rinaldi <bi...@gmail.com>.
On Mon, Dec 5, 2016 at 2:14 PM, Ophir Etzion <op...@foursquare.com> wrote:

> it still doesn't work
>
> the error becomes:
> INFO 2016-12-05 22:11:10,947 PythonExecutor.py:97 - stop command output:
>  err: shell-init: error retrieving current directory: getcwd: cannot access
> parent directories: No such file or directory
>

This could be the race condition Gour was talking about, where the
container gets cleaned up before the agent has a chance to stop gracefully.
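
As a rough illustration of the setting Gour mentioned for avoiding this race, here is a small Python sketch that adds site.global.app_container.release_timeout_secs to an application config before it is written out as JSON. The property name comes from this thread. The helper name with_release_timeout, the "global" section layout, the site.global.app_root entry, and the value of 30 seconds are assumptions for illustration only; check them against your own appConfig.json.

```python
import json


def with_release_timeout(app_config, seconds):
    """Return a copy of app_config with the container release timeout set."""
    updated = dict(app_config)
    # Copy the 'global' section so the caller's dictionary is left untouched.
    global_section = dict(updated.get("global", {}))
    global_section["site.global.app_container.release_timeout_secs"] = str(seconds)
    updated["global"] = global_section
    return updated


# Hypothetical starting config; the 'global' layout and this key are assumptions.
app_config = {"global": {"site.global.app_root": "${AGENT_WORK_ROOT}/app/install"}}

# 30 is an arbitrary example value, not a recommendation.
new_config = with_release_timeout(app_config, 30)
print(json.dumps(new_config, indent=2, sort_keys=True))
```

The timeout only needs to be long enough for the STOP script to finish, so a script that sends a single packet and exits should need very little.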



Re: failure in the STOP command

Posted by Ophir Etzion <op...@foursquare.com>.
It still doesn't work.

The error is now:
INFO 2016-12-05 22:11:10,947 PythonExecutor.py:97 - stop command output:
 err: shell-init: error retrieving current directory: getcwd: cannot access
parent directories: No such file or directory
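
A note for anyone hitting the same message: the getcwd error is what a process reports when its working directory has been deleted underneath it. The sketch below reproduces that condition in plain Python, independent of Slider and YARN. The function names cwd_exists and demo_removed_cwd are made up for illustration, and the behaviour described is what is expected on Linux.

```python
import os
import shutil
import tempfile


def cwd_exists():
    """Return True if the current working directory can still be resolved."""
    try:
        os.getcwd()
        return True
    except OSError:
        # On Linux this is raised once the directory has been removed.
        return False


def demo_removed_cwd():
    """Enter a scratch directory, delete it, and report cwd validity before and after."""
    original = os.getcwd()
    scratch = tempfile.mkdtemp()
    try:
        os.chdir(scratch)
        before = cwd_exists()
        shutil.rmtree(scratch)  # simulate the directory being cleaned up
        after = cwd_exists()
    finally:
        os.chdir(original)  # always return to a directory that exists
        shutil.rmtree(scratch, ignore_errors=True)
    return before, after


before, after = demo_removed_cwd()
print("cwd valid before removal:", before)
print("cwd valid after removal :", after)
```

If the container work directory is removed while the stop command is still being launched, a shell started from that directory would be in the same situation.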

On Fri, Dec 2, 2016 at 2:58 PM, Gour Saha <gs...@hortonworks.com> wrote:

> Billie, this is a good catch.
>
> Ophir, I think you should make this small change and try your app stop
> again to see if it works.
>
> -Gour
>
> On 12/2/16, 10:13 AM, "Billie Rinaldi" <bi...@gmail.com> wrote:
>
> >This subprocess.Popen does appear to be missing an env=env parameter:
> >https://github.com/apache/incubator-slider/blob/develop/slider-agent/src/main/python/agent/PythonExecutor.py#L153
> >
> >On Fri, Dec 2, 2016 at 9:30 AM, Ophir Etzion <op...@foursquare.com> wrote:
> >
> >> 1. You can't see the PYTHONPATH issue directly. What you can see is that
> >> the STOP output has no "Setting env: PYTHONPATH" line like the one in the
> >> START command.
> >> 2. Thanks for letting me know about release_timeout_secs, but for my app I
> >> don't care if the containers die; the stop command sends a UDP packet
> >> elsewhere.
> >>
> >> Here is the output for START, where you can see the PYTHONPATH being set:
> >> INFO 2016-11-30 17:50:32,361 AgentToggleLogger.py:40 - Running command
> >> ['/usr/bin/python',
> >>  '-S',
> >>  u'/export/hdk3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/filecache/11/enable_presto_worker.zip/package/scripts/enable_presto_worker_component.py',
> >>  u'START',
> >>  '/export/hda3/data/log/hadoop-yarn/container/application_1479830316320_64974/container_e468_1479830316320_64974_01_000091/command-4.json',
> >>  '/export/hdk3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/filecache/11/enable_presto_worker.zip/package',
> >>  '/export/hda3/data/log/hadoop-yarn/container/application_1479830316320_64974/container_e468_1479830316320_64974_01_000091/structured-out-4.json',
> >>  'INFO',
> >>  '/export/hdj3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/container_e468_1479830316320_64974_01_000091']
> >> INFO 2016-11-30 17:50:32,361 AgentToggleLogger.py:40 - Setting env: PYTHONPATH to /export/hdj3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/filecache/10/slider-agent.tar.gz/slider-agent/jinja2:/export/hdj3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/filecache/10/slider-agent.tar.gz/slider-agent
> >> INFO 2016-11-30 17:50:32,463 AgentToggleLogger.py:40 - Queue result:
> >> {'componentStatus': [],
> >>  'reports': [{'actionId': u'4-1',
> >>               'clusterName': u'enable-presto-worker_cluster_a',
> >>               'exitcode': 777,
> >>               'reportResult': True,
> >>               'role': u'NODE',
> >>               'roleCommand': u'START',
> >>               'serviceName': u'enable-presto-worker_cluster_a',
> >>               'status': 'IN_PROGRESS',
> >>               'stderr': '',
> >>               'stdout': "2016-11-30 17:50:32,455 - Directory['/data/appdata/enable_presto_worker/data/var/run'] {'recursive': True}",
> >>               'structuredOut': '{}',
> >>               'taskId': 4}]}
> >>
> >> On Fri, Dec 2, 2016 at 11:51 AM, Gour Saha <gs...@hortonworks.com>
> >>wrote:
> >>
> >> > Also keep in mind - if your application needs to run something useful
> >> when
> >> > the stop cmd is initiated then you need to set an appropriate value to
> >> > site.global.app_container.release_timeout_secs. Otherwise kill
> signals
> >> are
> >> > sent to the agent containers via YARN (almost immediately) and the
> >> > containers don't get time for graceful shutdown.
> >> >
> >> > -Gour
> >> >
> >> >
> >> >
> >> > On 12/2/16, 8:29 AM, "Billie Rinaldi" <bi...@gmail.com>
> >>wrote:
> >> >
> >> > >It looks like the Traceback stack for the stop command output is
> >> truncated
> >> > >in the logs you pasted. I only see the first line of the Traceback:
> >> > >INFO 2016-11-30 18:07:03,919 PythonExecutor.py:97 - stop command
> >>output:
> >> > > err: Traceback (most recent call last):
> >> > >  File
> >> > >"/export/hdk3/yarn/nm/usercache/hive/appcache/
> >> application_1479830316320_
> >> > >64974/filecache/11/enable_presto_worker.zip/package/
> >> > >scripts/enable_presto_worker_component.py",
> >> > >line 23, in <module>
> >> > >    from resource_management import *
> >> > >
> >> > >So I cannot see the PYTHONPATH error you're talking about. If you
> >>paste
> >> > >the
> >> > >entire Traceback that might tell us more.
> >> > >
> >> > >Billie
> >> > >
> >> > >On Fri, Dec 2, 2016 at 7:19 AM, Ophir Etzion <op...@foursquare.com>
> >> > wrote:
> >> > >
> >> > >> it does implement a STOP command that does something useful.
> >> > >> it fails because the PYTHONPATH isn't set like it is in different
> >> > >>commands.
> >> > >>
> >> > >> On Thu, Dec 1, 2016 at 10:38 PM, Gour Saha <gs...@hortonworks.com>
> >> > >>wrote:
> >> > >>
> >> > >> > Does enable_presto_worker_component.py support/implement a STOP
> >> > >>command?
> >> > >> >
> >> > >> > Does your application need to run something useful when the stop
> >>cmd
> >> > >>is
> >> > >> > initiated?
> >> > >> >
> >> > >> > -Gour
> >> > >> >
> >> > >> > On 11/30/16, 10:58 AM, "Ophir Etzion" <op...@foursquare.com>
> >>wrote:
> >> > >> >
> >> > >> > >Hi,
> >> > >> > >
> >> > >> > >I hope I'm writing to the correct mailing list. please direct me
> >> > >> elsewhere
> >> > >> > >if this is not the correct place to write to.
> >> > >> > >
> >> > >> > >I've written a simple custom slider application and the STOP
> >>script
> >> > >> fails
> >> > >> > >due to what seems like a slider issue of not setting the
> >>PYTHONPATH
> >> > >>when
> >> > >> > >running the stop command.
> >> > >> > >
> >> > >> > >I will probably debug to see what goes on in
> >> > >>CustomServiceOrchestrator
> >> > >> and
> >> > >> > >why it doesn't set the env variables there but I'll only do it
> >>in a
> >> > >> couple
> >> > >> > >of weeks.
> >> > >> > >I wanted to ask if anyone noticed something like this before I
> >>look
> >> > >>into
> >> > >> > >it
> >> > >> > >further.
> >> > >> > >
> >> > >> > >in the agent log it looks like this:
> >> > >> > >
> >> > >> > >INFO 2016-11-30 18:07:03,894 ActionQueue.py:173 - Running
> >>command:
> >> > >> > >{u'roleCommand': u'STOP', u'clusterName':
> >> > >> > >u'enable-presto-worker_cluster_a', u'componentName': u'NODE',
> >> > >> > u'hostname':
> >> > >> > >u'fsak20.prod.foursquare.com', u'hostLevelParams':
> >>{u'java_home':
> >> > >> > >u'/data/loko/infrastructure-jdk8/current/bin/',
> u'container_id':
> >> > >> > >u'container_e468_1479830316320_64974_01_000091'},
> >>u'commandType':
> >> > >> > >u'EXECUTION_COMMAND', u'roleParams': {u'auto_restart':
> >>u'false'},
> >> > >> > >u'serviceName': u'enable-presto-worker_cluster_a', u'role':
> >> u'NODE',
> >> > >> > >u'commandParams': {u'record_config': u'true',
> >> > >>u'service_package_folder':
> >> > >> > >u'${AGENT_WORK_ROOT}/work/app/definition/package', u'script':
> >> > >> > >u'scripts/enable_presto_worker_component.py',
> u'schema_version':
> >> > >> u'2.0',
> >> > >> > >u'command_timeout': u'600', u'script_type': u'PYTHON'},
> >>u'taskId':
> >> 5,
> >> > >> > >u'yarnDockerMode': False, u'commandId': '5-1', u'containers':
> >>[],
> >> > >> > >u'configurations': {u'global': {u'security_enabled': u'false',
> >> > >> > >u'app_container_id': u'container_e468_
> >> 1479830316320_64974_01_000091'
> >> > ,
> >> > >> > >u'data_dir': u'/data/appdata/enable_presto_worker/data',
> >> > u'app_name':
> >> > >> > >u'enable_presto_worker.py', u'app_root':
> >> > >> > >u'${AGENT_WORK_ROOT}/app/install',
> >> > >> > >u'app_log_dir': u'${AGENT_LOG_ROOT}', u'app_pid_dir':
> >> > >> > >u'${AGENT_WORK_ROOT}/app/run', u'app_container_tag': u'2',
> >> > >>u'pid_file':
> >> > >> > >u'${AGENT_WORK_ROOT}/app/run/component.pid',
> u'app_install_dir':
> >> > >> > >u'${AGENT_WORK_ROOT}/app/install', u'app_input_conf_dir':
> >> > >> > >u'${AGENT_WORK_ROOT}/propagatedconf', u'state_monitor_port':
> >> > >>u'9990'}}}
> >> > >> > >INFO 2016-11-30 18:07:03,896 CustomServiceOrchestrator.py:329 -
> >> > >>Storing
> >> > >> > >applied config: {u'global': {u'app_container_id':
> >> > >> > >u'container_e468_1479830316320_64974_01_000091',
> >> > >> > >             u'app_container_tag': u'2',
> >> > >> > >             u'app_input_conf_dir':
> >> > >> > >u'/export/hdj3/yarn/nm/usercache/hive/appcache/
> >> > >> > application_1479830316320_6
> >> > >> >
> >>>4974/container_e468_1479830316320_64974_01_000091/propagatedconf',
> >> > >> > >             u'app_install_dir':
> >> > >> > >u'/export/hdj3/yarn/nm/usercache/hive/appcache/
> >> > >> > application_1479830316320_6
> >> > >> > >4974/container_e468_1479830316320_64974_01_000091/app/install',
> >> > >> > >             u'app_log_dir':
> >> > >> > >u'/data/log/hadoop-yarn/container/application_
> >> > >> > 1479830316320_64974/containe
> >> > >> > >r_e468_1479830316320_64974_01_000091',
> >> > >> > >             u'app_name': u'enable_presto_worker.py',
> >> > >> > >             u'app_pid_dir':
> >> > >> > >u'/export/hdj3/yarn/nm/usercache/hive/appcache/
> >> > >> > application_1479830316320_6
> >> > >> > >4974/container_e468_1479830316320_64974_01_000091/app/run',
> >> > >> > >             u'app_root':
> >> > >> > >u'/export/hdj3/yarn/nm/usercache/hive/appcache/
> >> > >> > application_1479830316320_6
> >> > >> > >4974/container_e468_1479830316320_64974_01_000091/app/install',
> >> > >> > >             u'data_dir': u'/data/appdata/enable_presto_
> >> > worker/data',
> >> > >> > >             u'pid_file':
> >> > >> > >u'/export/hdj3/yarn/nm/usercache/hive/appcache/
> >> > >> > application_1479830316320_6
> >> > >> > >4974/container_e468_1479830316320_64974_01_000091/
> >> > >> app/run/component.pid',
> >> > >> > >             u'security_enabled': u'false',
> >> > >> > >             u'state_monitor_port': u'9990'}}
> >> > >> > >INFO 2016-11-30 18:07:03,898 PythonExecutor.py:152 - command
> >>str:
> >> > >> > > /usr/bin/python -S
> >> > >> > >/export/hdk3/yarn/nm/usercache/hive/appcache/
> >> > >> > application_1479830316320_649
> >> > >> > >74/filecache/11/enable_presto_worker.zip/package/
> >> > >> > scripts/enable_presto_wor
> >> > >> > >ker_component.py
> >> > >> > >STOP
> >> > >> > >/export/hda3/data/log/hadoop-yarn/container/application_
> >> > >> > 1479830316320_6497
> >> > >> > >4/container_e468_1479830316320_64974_01_000091/command-5.json
> >> > >> > >/export/hdk3/yarn/nm/usercache/hive/appcache/
> >> > >> > application_1479830316320_649
> >> > >> > >74/filecache/11/enable_presto_worker.zip/package
> >> > >> > >/export/hda3/data/log/hadoop-yarn/container/application_
> >> > >> > 1479830316320_6497
> >> > >> > >4/container_e468_1479830316320_64974_01_000091/
> >> structured-out-5.json
> >> > >> > >INFO
> >> > >> > >/export/hdj3/yarn/nm/usercache/hive/appcache/
> >> > >> > application_1479830316320_649
> >> > >> > >74/container_e468_1479830316320_64974_01_000091
> >> > >> > >INFO 2016-11-30 18:07:03,919 PythonExecutor.py:97 - stop command
> >> > >>output:
> >> > >> > > err: Traceback (most recent call last):
> >> > >> > >  File
> >> > >> > >"/export/hdk3/yarn/nm/usercache/hive/appcache/
> >> > >> > application_1479830316320_64
> >> > >> > >974/filecache/11/enable_presto_worker.zip/package/
> >> > >> > scripts/enable_presto_wo
> >> > >> > >rker_component.py",
> >> > >> > >line 23, in <module>
> >> > >> > >    from resource_management import *
> >> > >> >
> >> > >> >
> >> > >>
> >> >
> >> >
> >>
>
>

Re: failure in the STOP command

Posted by Gour Saha <gs...@hortonworks.com>.
Billie, this is a good catch.

Ophir, I think you should make this small change and try your app stop
again to see if it works.

-Gour

On 12/2/16, 10:13 AM, "Billie Rinaldi" <bi...@gmail.com> wrote:

>This subprocess.Popen does appear to be missing an env=env parameter:
>https://github.com/apache/incubator-slider/blob/develop/slider-agent/src/main/python/agent/PythonExecutor.py#L153
>
>On Fri, Dec 2, 2016 at 9:30 AM, Ophir Etzion <op...@foursquare.com> wrote:
>
>> 1. you can't see the PYTHONPATH issue. you can see there is no setting of
>> the PYTHONPATH that you can see in the START command.
>> 2. thanks for letting me know about release_timeout_secs but for my app I
>> don't care if the containers die, the stop command sends a UDP packet
>> elsewhere.
>>
>> here is the output for START where you can see the PYTHONPATH being set:
>> INFO 2016-11-30 17:50:32,361 AgentToggleLogger.py:40 - Running command
>> ['/usr/bin/python',
>>  '-S',
>>  
>>u'/export/hdk3/yarn/nm/usercache/hive/appcache/application_1479830316320_
>> 64974/filecache/11/enable_presto_worker.zip/package/
>> scripts/enable_presto_worker_component.py',
>>  u'START',
>>  '/export/hda3/data/log/hadoop-yarn/container/application_
>> 1479830316320_64974/container_e468_1479830316320_64974_01_
>> 000091/command-4.json',
>>  
>>'/export/hdk3/yarn/nm/usercache/hive/appcache/application_1479830316320_
>> 64974/filecache/11/enable_presto_worker.zip/package',
>>  '/export/hda3/data/log/hadoop-yarn/container/application_
>> 1479830316320_64974/container_e468_1479830316320_64974_01_
>> 000091/structured-out-4.json',
>>  'INFO',
>>  
>>'/export/hdj3/yarn/nm/usercache/hive/appcache/application_1479830316320_
>> 64974/container_e468_1479830316320_64974_01_000091']
>> INFO 2016-11-30 17:50:32,361 AgentToggleLogger.py:40 - Setting env:
>> PYTHONPATH to
>> /export/hdj3/yarn/nm/usercache/hive/appcache/application_1479830316320_
>> 64974/filecache/10/slider-agent.tar.gz/slider-agent/
>> jinja2:/export/hdj3/yarn/nm/usercache/hive/appcache/
>> application_1479830316320_64974/filecache/10/slider-
>> agent.tar.gz/slider-agent
>> INFO 2016-11-30 17:50:32,463 AgentToggleLogger.py:40 - Queue result:
>> {'componentStatus': [],
>>  'reports': [{'actionId': u'4-1',
>>               'clusterName': u'enable-presto-worker_cluster_a',
>>               'exitcode': 777,
>>               'reportResult': True,
>>               'role': u'NODE',
>>               'roleCommand': u'START',
>>               'serviceName': u'enable-presto-worker_cluster_a',
>>               'status': 'IN_PROGRESS',
>>               'stderr': '',
>>               'stdout': "2016-11-30 17:50:32,455 -
>> Directory['/data/appdata/enable_presto_worker/data/var/run']
>>{'recursive':
>> True}",
>>               'structuredOut': '{}',
>>               'taskId': 4}]}
>>
>> On Fri, Dec 2, 2016 at 11:51 AM, Gour Saha <gs...@hortonworks.com>
>>wrote:
>>
>> > Also keep in mind - if your application needs to run something useful
>> when
>> > the stop cmd is initiated then you need to set an appropriate value to
>> > site.global.app_container.release_timeout_secs. Otherwise kill signals
>> are
>> > sent to the agent containers via YARN (almost immediately) and the
>> > containers don't get time for graceful shutdown.
>> >
>> > -Gour
>> >
>> >
>> >
>> > On 12/2/16, 8:29 AM, "Billie Rinaldi" <bi...@gmail.com>
>>wrote:
>> >
>> > >It looks like the Traceback stack for the stop command output is
>> truncated
>> > >in the logs you pasted. I only see the first line of the Traceback:
>> > >INFO 2016-11-30 18:07:03,919 PythonExecutor.py:97 - stop command
>>output:
>> > > err: Traceback (most recent call last):
>> > >  File
>> > >"/export/hdk3/yarn/nm/usercache/hive/appcache/
>> application_1479830316320_
>> > >64974/filecache/11/enable_presto_worker.zip/package/
>> > >scripts/enable_presto_worker_component.py",
>> > >line 23, in <module>
>> > >    from resource_management import *
>> > >
>> > >So I cannot see the PYTHONPATH error you're talking about. If you
>>paste
>> > >the
>> > >entire Traceback that might tell us more.
>> > >
>> > >Billie
>> > >
>> > >On Fri, Dec 2, 2016 at 7:19 AM, Ophir Etzion <op...@foursquare.com>
>> > wrote:
>> > >
>> > >> it does implement a STOP command that does something useful.
>> > >> it fails because the PYTHONPATH isn't set like it is in different
>> > >>commands.
>> > >>
>> > >> On Thu, Dec 1, 2016 at 10:38 PM, Gour Saha <gs...@hortonworks.com>
>> > >>wrote:
>> > >>
>> > >> > Does enable_presto_worker_component.py support/implement a STOP
>> > >>command?
>> > >> >
>> > >> > Does your application need to run something useful when the stop
>>cmd
>> > >>is
>> > >> > initiated?
>> > >> >
>> > >> > -Gour
>> > >> >
>> > >> > On 11/30/16, 10:58 AM, "Ophir Etzion" <op...@foursquare.com>
>>wrote:
>> > >> >
>> > >> > >Hi,
>> > >> > >
>> > >> > >I hope I'm writing to the correct mailing list. please direct me
>> > >> elsewhere
>> > >> > >if this is not the correct place to write to.
>> > >> > >
>> > >> > >I've written a simple custom slider application and the STOP
>>script
>> > >> fails
>> > >> > >due to what seems like a slider issue of not setting the
>>PYTHONPATH
>> > >>when
>> > >> > >running the stop command.
>> > >> > >
>> > >> > >I will probably debug to see what goes on in
>> > >>CustomServiceOrchestrator
>> > >> and
>> > >> > >why it doesn't set the env variables there but I'll only do it
>>in a
>> > >> couple
>> > >> > >of weeks.
>> > >> > >I wanted to ask if anyone noticed something like this before I
>>look
>> > >>into
>> > >> > >it
>> > >> > >further.
>> > >> > >
>> > >> > >in the agent log it looks like this:
>> > >> > >
>> > >> > >INFO 2016-11-30 18:07:03,894 ActionQueue.py:173 - Running
>>command:
>> > >> > >{u'roleCommand': u'STOP', u'clusterName':
>> > >> > >u'enable-presto-worker_cluster_a', u'componentName': u'NODE',
>> > >> > u'hostname':
>> > >> > >u'fsak20.prod.foursquare.com', u'hostLevelParams':
>>{u'java_home':
>> > >> > >u'/data/loko/infrastructure-jdk8/current/bin/', u'container_id':
>> > >> > >u'container_e468_1479830316320_64974_01_000091'},
>>u'commandType':
>> > >> > >u'EXECUTION_COMMAND', u'roleParams': {u'auto_restart':
>>u'false'},
>> > >> > >u'serviceName': u'enable-presto-worker_cluster_a', u'role':
>> u'NODE',
>> > >> > >u'commandParams': {u'record_config': u'true',
>> > >>u'service_package_folder':
>> > >> > >u'${AGENT_WORK_ROOT}/work/app/definition/package', u'script':
>> > >> > >u'scripts/enable_presto_worker_component.py', u'schema_version':
>> > >> u'2.0',
>> > >> > >u'command_timeout': u'600', u'script_type': u'PYTHON'},
>>u'taskId':
>> 5,
>> > >> > >u'yarnDockerMode': False, u'commandId': '5-1', u'containers':
>>[],
>> > >> > >u'configurations': {u'global': {u'security_enabled': u'false',
>> > >> > >u'app_container_id': u'container_e468_
>> 1479830316320_64974_01_000091'
>> > ,
>> > >> > >u'data_dir': u'/data/appdata/enable_presto_worker/data',
>> > u'app_name':
>> > >> > >u'enable_presto_worker.py', u'app_root':
>> > >> > >u'${AGENT_WORK_ROOT}/app/install',
>> > >> > >u'app_log_dir': u'${AGENT_LOG_ROOT}', u'app_pid_dir':
>> > >> > >u'${AGENT_WORK_ROOT}/app/run', u'app_container_tag': u'2',
>> > >>u'pid_file':
>> > >> > >u'${AGENT_WORK_ROOT}/app/run/component.pid', u'app_install_dir':
>> > >> > >u'${AGENT_WORK_ROOT}/app/install', u'app_input_conf_dir':
>> > >> > >u'${AGENT_WORK_ROOT}/propagatedconf', u'state_monitor_port':
>> > >>u'9990'}}}
>> > >> > >INFO 2016-11-30 18:07:03,896 CustomServiceOrchestrator.py:329 -
>> > >>Storing
>> > >> > >applied config: {u'global': {u'app_container_id':
>> > >> > >u'container_e468_1479830316320_64974_01_000091',
>> > >> > >             u'app_container_tag': u'2',
>> > >> > >             u'app_input_conf_dir':
>> > >> > >u'/export/hdj3/yarn/nm/usercache/hive/appcache/
>> > >> > application_1479830316320_6
>> > >> > 
>>>4974/container_e468_1479830316320_64974_01_000091/propagatedconf',
>> > >> > >             u'app_install_dir':
>> > >> > >u'/export/hdj3/yarn/nm/usercache/hive/appcache/
>> > >> > application_1479830316320_6
>> > >> > >4974/container_e468_1479830316320_64974_01_000091/app/install',
>> > >> > >             u'app_log_dir':
>> > >> > >u'/data/log/hadoop-yarn/container/application_
>> > >> > 1479830316320_64974/containe
>> > >> > >r_e468_1479830316320_64974_01_000091',
>> > >> > >             u'app_name': u'enable_presto_worker.py',
>> > >> > >             u'app_pid_dir':
>> > >> > >u'/export/hdj3/yarn/nm/usercache/hive/appcache/
>> > >> > application_1479830316320_6
>> > >> > >4974/container_e468_1479830316320_64974_01_000091/app/run',
>> > >> > >             u'app_root':
>> > >> > >u'/export/hdj3/yarn/nm/usercache/hive/appcache/
>> > >> > application_1479830316320_6
>> > >> > >4974/container_e468_1479830316320_64974_01_000091/app/install',
>> > >> > >             u'data_dir': u'/data/appdata/enable_presto_
>> > worker/data',
>> > >> > >             u'pid_file':
>> > >> > >u'/export/hdj3/yarn/nm/usercache/hive/appcache/
>> > >> > application_1479830316320_6
>> > >> > >4974/container_e468_1479830316320_64974_01_000091/
>> > >> app/run/component.pid',
>> > >> > >             u'security_enabled': u'false',
>> > >> > >             u'state_monitor_port': u'9990'}}
>> > >> > >INFO 2016-11-30 18:07:03,898 PythonExecutor.py:152 - command
>>str:
>> > >> > > /usr/bin/python -S
>> > >> > >/export/hdk3/yarn/nm/usercache/hive/appcache/
>> > >> > application_1479830316320_649
>> > >> > >74/filecache/11/enable_presto_worker.zip/package/
>> > >> > scripts/enable_presto_wor
>> > >> > >ker_component.py
>> > >> > >STOP
>> > >> > >/export/hda3/data/log/hadoop-yarn/container/application_
>> > >> > 1479830316320_6497
>> > >> > >4/container_e468_1479830316320_64974_01_000091/command-5.json
>> > >> > >/export/hdk3/yarn/nm/usercache/hive/appcache/
>> > >> > application_1479830316320_649
>> > >> > >74/filecache/11/enable_presto_worker.zip/package
>> > >> > >/export/hda3/data/log/hadoop-yarn/container/application_
>> > >> > 1479830316320_6497
>> > >> > >4/container_e468_1479830316320_64974_01_000091/
>> structured-out-5.json
>> > >> > >INFO
>> > >> > >/export/hdj3/yarn/nm/usercache/hive/appcache/
>> > >> > application_1479830316320_649
>> > >> > >74/container_e468_1479830316320_64974_01_000091
>> > >> > >INFO 2016-11-30 18:07:03,919 PythonExecutor.py:97 - stop command
>> > >>output:
>> > >> > > err: Traceback (most recent call last):
>> > >> > >  File
>> > >> > >"/export/hdk3/yarn/nm/usercache/hive/appcache/
>> > >> > application_1479830316320_64
>> > >> > >974/filecache/11/enable_presto_worker.zip/package/
>> > >> > scripts/enable_presto_wo
>> > >> > >rker_component.py",
>> > >> > >line 23, in <module>
>> > >> > >    from resource_management import *
>> > >> >
>> > >> >
>> > >>
>> >
>> >
>>


Re: failure in the STOP command

Posted by Billie Rinaldi <bi...@gmail.com>.
This subprocess.Popen does appear to be missing an env=env parameter:
https://github.com/apache/incubator-slider/blob/develop/slider-agent/src/main/python/agent/PythonExecutor.py#L153
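
To illustrate what the env parameter changes, here is a minimal sketch. It is
not the PythonExecutor code: it is written for Python 3, and the names
build_env and child_pythonpath and the /tmp/slider-agent path are made up for
the example. It only shows that a child started with subprocess.Popen sees a
computed PYTHONPATH when env=env is passed, and otherwise just inherits
whatever the parent process has.

```python
import os
import subprocess
import sys


def build_env(agent_dir):
    """Copy the parent environment and add a PYTHONPATH (hypothetical layout)."""
    env = dict(os.environ)
    env["PYTHONPATH"] = os.pathsep.join(
        [os.path.join(agent_dir, "jinja2"), agent_dir]
    )
    return env


def child_pythonpath(pass_env, agent_dir="/tmp/slider-agent"):
    """Start a child interpreter and return the PYTHONPATH it sees."""
    env = build_env(agent_dir)
    # The only difference between the two cases is whether env is handed over.
    kwargs = {"env": env} if pass_env else {}
    proc = subprocess.Popen(
        [sys.executable, "-S", "-c",
         "import os; print(os.environ.get('PYTHONPATH', '<unset>'))"],
        stdout=subprocess.PIPE,
        **kwargs
    )
    out, _ = proc.communicate()
    return out.decode().strip()


if __name__ == "__main__":
    print("with env=env   :", child_pythonpath(True))
    print("without env=env:", child_pythonpath(False))
```

Without env=env the computed dictionary is simply never used, which would
match a STOP log that has no "Setting env: PYTHONPATH" line.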

On Fri, Dec 2, 2016 at 9:30 AM, Ophir Etzion <op...@foursquare.com> wrote:

> 1. you can't see the PYTHONPATH issue. you can see there is no setting of
> the PYTHONPATH that you can see in the START command.
> 2. thanks for letting me know about release_timeout_secs but for my app I
> don't care if the containers die, the stop command sends a UDP packet
> elsewhere.
>
> here is the output for START where you can see the PYTHONPATH being set:
> INFO 2016-11-30 17:50:32,361 AgentToggleLogger.py:40 - Running command
> ['/usr/bin/python',
>  '-S',
>  u'/export/hdk3/yarn/nm/usercache/hive/appcache/application_1479830316320_
> 64974/filecache/11/enable_presto_worker.zip/package/
> scripts/enable_presto_worker_component.py',
>  u'START',
>  '/export/hda3/data/log/hadoop-yarn/container/application_
> 1479830316320_64974/container_e468_1479830316320_64974_01_
> 000091/command-4.json',
>  '/export/hdk3/yarn/nm/usercache/hive/appcache/application_1479830316320_
> 64974/filecache/11/enable_presto_worker.zip/package',
>  '/export/hda3/data/log/hadoop-yarn/container/application_
> 1479830316320_64974/container_e468_1479830316320_64974_01_
> 000091/structured-out-4.json',
>  'INFO',
>  '/export/hdj3/yarn/nm/usercache/hive/appcache/application_1479830316320_
> 64974/container_e468_1479830316320_64974_01_000091']
> INFO 2016-11-30 17:50:32,361 AgentToggleLogger.py:40 - Setting env:
> PYTHONPATH to
> /export/hdj3/yarn/nm/usercache/hive/appcache/application_1479830316320_
> 64974/filecache/10/slider-agent.tar.gz/slider-agent/
> jinja2:/export/hdj3/yarn/nm/usercache/hive/appcache/
> application_1479830316320_64974/filecache/10/slider-
> agent.tar.gz/slider-agent
> INFO 2016-11-30 17:50:32,463 AgentToggleLogger.py:40 - Queue result:
> {'componentStatus': [],
>  'reports': [{'actionId': u'4-1',
>               'clusterName': u'enable-presto-worker_cluster_a',
>               'exitcode': 777,
>               'reportResult': True,
>               'role': u'NODE',
>               'roleCommand': u'START',
>               'serviceName': u'enable-presto-worker_cluster_a',
>               'status': 'IN_PROGRESS',
>               'stderr': '',
>               'stdout': "2016-11-30 17:50:32,455 -
> Directory['/data/appdata/enable_presto_worker/data/var/run'] {'recursive':
> True}",
>               'structuredOut': '{}',
>               'taskId': 4}]}
>
> On Fri, Dec 2, 2016 at 11:51 AM, Gour Saha <gs...@hortonworks.com> wrote:
>
> > Also keep in mind - if your application needs to run something useful
> when
> > the stop cmd is initiated then you need to set an appropriate value to
> > site.global.app_container.release_timeout_secs. Otherwise kill signals
> are
> > sent to the agent containers via YARN (almost immediately) and the
> > containers don't get time for graceful shutdown.
> >
> > -Gour
> >
> >
> >
> > On 12/2/16, 8:29 AM, "Billie Rinaldi" <bi...@gmail.com> wrote:
> >
> > >It looks like the Traceback stack for the stop command output is
> truncated
> > >in the logs you pasted. I only see the first line of the Traceback:
> > >INFO 2016-11-30 18:07:03,919 PythonExecutor.py:97 - stop command output:
> > > err: Traceback (most recent call last):
> > >  File
> > >"/export/hdk3/yarn/nm/usercache/hive/appcache/
> application_1479830316320_
> > >64974/filecache/11/enable_presto_worker.zip/package/
> > >scripts/enable_presto_worker_component.py",
> > >line 23, in <module>
> > >    from resource_management import *
> > >
> > >So I cannot see the PYTHONPATH error you're talking about. If you paste
> > >the
> > >entire Traceback that might tell us more.
> > >
> > >Billie
> > >
> > >On Fri, Dec 2, 2016 at 7:19 AM, Ophir Etzion <op...@foursquare.com>
> > wrote:
> > >
> > >> it does implement a STOP command that does something useful.
> > >> it fails because the PYTHONPATH isn't set like it is in different
> > >>commands.
> > >>
> > >> On Thu, Dec 1, 2016 at 10:38 PM, Gour Saha <gs...@hortonworks.com>
> > >>wrote:
> > >>
> > >> > Does enable_presto_worker_component.py support/implement a STOP
> > >>command?
> > >> >
> > >> > Does your application need to run something useful when the stop cmd
> > >>is
> > >> > initiated?
> > >> >
> > >> > -Gour
> > >> >
> > >> > On 11/30/16, 10:58 AM, "Ophir Etzion" <op...@foursquare.com> wrote:
> > >> >
> > >> > >Hi,
> > >> > >
> > >> > >I hope I'm writing to the correct mailing list. please direct me
> > >> elsewhere
> > >> > >if this is not the correct place to write to.
> > >> > >
> > >> > >I've written a simple custom slider application and the STOP script
> > >> fails
> > >> > >due to what seems like a slider issue of not setting the PYTHONPATH
> > >>when
> > >> > >running the stop command.
> > >> > >
> > >> > >I will probably debug to see what goes on in
> > >>CustomServiceOrchestrator
> > >> and
> > >> > >why it doesn't set the env variables there but I'll only do it in a
> > >> couple
> > >> > >of weeks.
> > >> > >I wanted to ask if anyone noticed something like this before I look
> > >>into
> > >> > >it
> > >> > >further.
> > >> > >
> > >> > >in the agent log it looks like this:
> > >> > >
> > >> > >INFO 2016-11-30 18:07:03,894 ActionQueue.py:173 - Running command:
> > >> > >{u'roleCommand': u'STOP', u'clusterName':
> > >> > >u'enable-presto-worker_cluster_a', u'componentName': u'NODE',
> > >> > u'hostname':
> > >> > >u'fsak20.prod.foursquare.com', u'hostLevelParams': {u'java_home':
> > >> > >u'/data/loko/infrastructure-jdk8/current/bin/', u'container_id':
> > >> > >u'container_e468_1479830316320_64974_01_000091'}, u'commandType':
> > >> > >u'EXECUTION_COMMAND', u'roleParams': {u'auto_restart': u'false'},
> > >> > >u'serviceName': u'enable-presto-worker_cluster_a', u'role':
> u'NODE',
> > >> > >u'commandParams': {u'record_config': u'true',
> > >>u'service_package_folder':
> > >> > >u'${AGENT_WORK_ROOT}/work/app/definition/package', u'script':
> > >> > >u'scripts/enable_presto_worker_component.py', u'schema_version':
> > >> u'2.0',
> > >> > >u'command_timeout': u'600', u'script_type': u'PYTHON'}, u'taskId':
> 5,
> > >> > >u'yarnDockerMode': False, u'commandId': '5-1', u'containers': [],
> > >> > >u'configurations': {u'global': {u'security_enabled': u'false',
> > >> > >u'app_container_id': u'container_e468_
> 1479830316320_64974_01_000091'
> > ,
> > >> > >u'data_dir': u'/data/appdata/enable_presto_worker/data',
> > u'app_name':
> > >> > >u'enable_presto_worker.py', u'app_root':
> > >> > >u'${AGENT_WORK_ROOT}/app/install',
> > >> > >u'app_log_dir': u'${AGENT_LOG_ROOT}', u'app_pid_dir':
> > >> > >u'${AGENT_WORK_ROOT}/app/run', u'app_container_tag': u'2',
> > >>u'pid_file':
> > >> > >u'${AGENT_WORK_ROOT}/app/run/component.pid', u'app_install_dir':
> > >> > >u'${AGENT_WORK_ROOT}/app/install', u'app_input_conf_dir':
> > >> > >u'${AGENT_WORK_ROOT}/propagatedconf', u'state_monitor_port':
> > >>u'9990'}}}
> > >> > >INFO 2016-11-30 18:07:03,896 CustomServiceOrchestrator.py:329 -
> > >>Storing
> > >> > >applied config: {u'global': {u'app_container_id':
> > >> > >u'container_e468_1479830316320_64974_01_000091',
> > >> > >             u'app_container_tag': u'2',
> > >> > >             u'app_input_conf_dir':
> > >> > >u'/export/hdj3/yarn/nm/usercache/hive/appcache/
> > >> > application_1479830316320_6
> > >> > >4974/container_e468_1479830316320_64974_01_000091/propagatedconf',
> > >> > >             u'app_install_dir':
> > >> > >u'/export/hdj3/yarn/nm/usercache/hive/appcache/
> > >> > application_1479830316320_6
> > >> > >4974/container_e468_1479830316320_64974_01_000091/app/install',
> > >> > >             u'app_log_dir':
> > >> > >u'/data/log/hadoop-yarn/container/application_
> > >> > 1479830316320_64974/containe
> > >> > >r_e468_1479830316320_64974_01_000091',
> > >> > >             u'app_name': u'enable_presto_worker.py',
> > >> > >             u'app_pid_dir':
> > >> > >u'/export/hdj3/yarn/nm/usercache/hive/appcache/
> > >> > application_1479830316320_6
> > >> > >4974/container_e468_1479830316320_64974_01_000091/app/run',
> > >> > >             u'app_root':
> > >> > >u'/export/hdj3/yarn/nm/usercache/hive/appcache/
> > >> > application_1479830316320_6
> > >> > >4974/container_e468_1479830316320_64974_01_000091/app/install',
> > >> > >             u'data_dir': u'/data/appdata/enable_presto_
> > worker/data',
> > >> > >             u'pid_file':
> > >> > >u'/export/hdj3/yarn/nm/usercache/hive/appcache/
> > >> > application_1479830316320_6
> > >> > >4974/container_e468_1479830316320_64974_01_000091/
> > >> app/run/component.pid',
> > >> > >             u'security_enabled': u'false',
> > >> > >             u'state_monitor_port': u'9990'}}
> > >> > >INFO 2016-11-30 18:07:03,898 PythonExecutor.py:152 - command str:
> > >> > > /usr/bin/python -S
> > >> > >/export/hdk3/yarn/nm/usercache/hive/appcache/
> > >> > application_1479830316320_649
> > >> > >74/filecache/11/enable_presto_worker.zip/package/
> > >> > scripts/enable_presto_wor
> > >> > >ker_component.py
> > >> > >STOP
> > >> > >/export/hda3/data/log/hadoop-yarn/container/application_
> > >> > 1479830316320_6497
> > >> > >4/container_e468_1479830316320_64974_01_000091/command-5.json
> > >> > >/export/hdk3/yarn/nm/usercache/hive/appcache/
> > >> > application_1479830316320_649
> > >> > >74/filecache/11/enable_presto_worker.zip/package
> > >> > >/export/hda3/data/log/hadoop-yarn/container/application_
> > >> > 1479830316320_6497
> > >> > >4/container_e468_1479830316320_64974_01_000091/
> structured-out-5.json
> > >> > >INFO
> > >> > >/export/hdj3/yarn/nm/usercache/hive/appcache/
> > >> > application_1479830316320_649
> > >> > >74/container_e468_1479830316320_64974_01_000091
> > >> > >INFO 2016-11-30 18:07:03,919 PythonExecutor.py:97 - stop command
> > >>output:
> > >> > > err: Traceback (most recent call last):
> > >> > >  File
> > >> > >"/export/hdk3/yarn/nm/usercache/hive/appcache/
> > >> > application_1479830316320_64
> > >> > >974/filecache/11/enable_presto_worker.zip/package/
> > >> > scripts/enable_presto_wo
> > >> > >rker_component.py",
> > >> > >line 23, in <module>
> > >> > >    from resource_management import *
> > >> >
> > >> >
> > >>
> >
> >
>

Re: failure in the STOP command

Posted by Ophir Etzion <op...@foursquare.com>.
I'm doing import params, but execution never gets that far: the script fails
on `from resource_management import *` at the top of the file because
PYTHONPATH isn't set.
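
A small sketch of that failure mode, under stated assumptions: it is Python 3,
fake_resource_management is a made-up stand-in for the real
resource_management package, and run_script is a hypothetical helper. The
script is run with `python -S`, as in the agent log, once with the library
directory on PYTHONPATH and once without. In the second case the module-level
import raises before stop() is ever called.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Stand-in component script: the star import sits at module level, so it runs
# before any function body (and before any later import inside stop()).
SCRIPT = textwrap.dedent("""
    from fake_resource_management import *

    def stop():
        print("stop reached: " + MARKER)

    stop()
""")


def run_script(with_pythonpath):
    """Run the stand-in script; return (exit code, stdout, stderr)."""
    with tempfile.TemporaryDirectory() as tmp:
        lib_dir = os.path.join(tmp, "lib")
        os.mkdir(lib_dir)
        with open(os.path.join(lib_dir, "fake_resource_management.py"), "w") as f:
            f.write("MARKER = 'ok'\n")
        script = os.path.join(tmp, "component.py")
        with open(script, "w") as f:
            f.write(SCRIPT)

        env = dict(os.environ)
        env.pop("PYTHONPATH", None)  # start without any inherited PYTHONPATH
        if with_pythonpath:
            env["PYTHONPATH"] = lib_dir

        proc = subprocess.Popen(
            [sys.executable, "-S", script],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            env=env,
        )
        out, err = proc.communicate()
        return proc.returncode, out.decode(), err.decode()


if __name__ == "__main__":
    for flag in (True, False):
        code, out, err = run_script(flag)
        print(flag, code, out.strip(), err.strip().splitlines()[-1:] )
```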

On Fri, Dec 2, 2016 at 1:09 PM, Gour Saha <gs...@hortonworks.com> wrote:

> Can you check your package script stop function if it is doing "import
> params" like this -
> https://github.com/apache/incubator-slider/blob/develop/app-packages/hbase/package/scripts/hbase_master.py#L48
>
>
> If yes, then you might have to share your app-package scripts (without the
> app binary/tar), for us to debug further. For that you have to file a bug
> and upload it to the bug. Attaching it to the email to this DL will not
> work.
>
> -Gour
>
> On 12/2/16, 9:30 AM, "Ophir Etzion" <op...@foursquare.com> wrote:
>
> >1. you can't see the PYTHONPATH issue. you can see there is no setting of
> >the PYTHONPATH that you can see in the START command.
> >2. thanks for letting me know about release_timeout_secs but for my app I
> >don't care if the containers die, the stop command sends a UDP packet
> >elsewhere.
> >
> >here is the output for START where you can see the PYTHONPATH being set:
> >INFO 2016-11-30 17:50:32,361 AgentToggleLogger.py:40 - Running command
> >['/usr/bin/python',
> > '-S',
> >
> >u'/export/hdk3/yarn/nm/usercache/hive/appcache/
> application_1479830316320_6
> >4974/filecache/11/enable_presto_worker.zip/package/
> scripts/enable_presto_w
> >orker_component.py',
> > u'START',
> >
> >'/export/hda3/data/log/hadoop-yarn/container/
> application_1479830316320_649
> >74/container_e468_1479830316320_64974_01_000091/command-4.json',
> >
> >'/export/hdk3/yarn/nm/usercache/hive/appcache/
> application_1479830316320_64
> >974/filecache/11/enable_presto_worker.zip/package',
> >
> >'/export/hda3/data/log/hadoop-yarn/container/
> application_1479830316320_649
> >74/container_e468_1479830316320_64974_01_000091/structured-out-4.json',
> > 'INFO',
> >
> >'/export/hdj3/yarn/nm/usercache/hive/appcache/
> application_1479830316320_64
> >974/container_e468_1479830316320_64974_01_000091']
> >INFO 2016-11-30 17:50:32,361 AgentToggleLogger.py:40 - Setting env:
> >PYTHONPATH to
> >/export/hdj3/yarn/nm/usercache/hive/appcache/
> application_1479830316320_649
> >74/filecache/10/slider-agent.tar.gz/slider-agent/jinja2:/
> export/hdj3/yarn/
> >nm/usercache/hive/appcache/application_1479830316320_
> 64974/filecache/10/sl
> >ider-agent.tar.gz/slider-agent
> >INFO 2016-11-30 17:50:32,463 AgentToggleLogger.py:40 - Queue result:
> >{'componentStatus': [],
> > 'reports': [{'actionId': u'4-1',
> >              'clusterName': u'enable-presto-worker_cluster_a',
> >              'exitcode': 777,
> >              'reportResult': True,
> >              'role': u'NODE',
> >              'roleCommand': u'START',
> >              'serviceName': u'enable-presto-worker_cluster_a',
> >              'status': 'IN_PROGRESS',
> >              'stderr': '',
> >              'stdout': "2016-11-30 17:50:32,455 -
> >Directory['/data/appdata/enable_presto_worker/data/var/run']
> {'recursive':
> >True}",
> >              'structuredOut': '{}',
> >              'taskId': 4}]}
> >
> >On Fri, Dec 2, 2016 at 11:51 AM, Gour Saha <gs...@hortonworks.com> wrote:
> >
> >> Also keep in mind - if your application needs to run something useful
> >>when
> >> the stop cmd is initiated then you need to set an appropriate value to
> >> site.global.app_container.release_timeout_secs. Otherwise kill signals
> >>are
> >> sent to the agent containers via YARN (almost immediately) and the
> >> containers don't get time for graceful shutdown.
> >>
> >> -Gour
> >>
> >>
> >>
> >> On 12/2/16, 8:29 AM, "Billie Rinaldi" <bi...@gmail.com> wrote:
> >>
> >> >It looks like the Traceback stack for the stop command output is
> >>truncated
> >> >in the logs you pasted. I only see the first line of the Traceback:
> >> >INFO 2016-11-30 18:07:03,919 PythonExecutor.py:97 - stop command
> >>output:
> >> > err: Traceback (most recent call last):
> >> >  File
> >>
> >>>"/export/hdk3/yarn/nm/usercache/hive/appcache/
> application_1479830316320_
> >> >64974/filecache/11/enable_presto_worker.zip/package/
> >> >scripts/enable_presto_worker_component.py",
> >> >line 23, in <module>
> >> >    from resource_management import *
> >> >
> >> >So I cannot see the PYTHONPATH error you're talking about. If you paste
> >> >the
> >> >entire Traceback that might tell us more.
> >> >
> >> >Billie
> >> >
> >> >On Fri, Dec 2, 2016 at 7:19 AM, Ophir Etzion <op...@foursquare.com>
> >> wrote:
> >> >
> >> >> it does implement a STOP command that does something useful.
> >> >> it fails because the PYTHONPATH isn't set like it is in different
> >> >>commands.
> >> >>
> >> >> On Thu, Dec 1, 2016 at 10:38 PM, Gour Saha <gs...@hortonworks.com>
> >> >>wrote:
> >> >>
> >> >> > Does enable_presto_worker_component.py support/implement a STOP
> >> >>command?
> >> >> >
> >> >> > Does your application need to run something useful when the stop
> >>cmd
> >> >>is
> >> >> > initiated?
> >> >> >
> >> >> > -Gour
> >> >> >
> >> >> > On 11/30/16, 10:58 AM, "Ophir Etzion" <op...@foursquare.com>
> wrote:
> >> >> >
> >> >> > >Hi,
> >> >> > >
> >> >> > >I hope I'm writing to the correct mailing list. please direct me
> >> >> elsewhere
> >> >> > >if this is not the correct place to write to.
> >> >> > >
> >> >> > >I've written a simple custom slider application and the STOP
> >>script
> >> >> fails
> >> >> > >due to what seems like a slider issue of not setting the
> >>PYTHONPATH
> >> >>when
> >> >> > >running the stop command.
> >> >> > >
> >> >> > >I will probably debug to see what goes on in
> >> >>CustomServiceOrchestrator
> >> >> and
> >> >> > >why it doesn't set the env variables there but I'll only do it in
> >>a
> >> >> couple
> >> >> > >of weeks.
> >> >> > >I wanted to ask if anyone noticed something like this before I
> >>look
> >> >>into
> >> >> > >it
> >> >> > >further.
> >> >> > >
> >> >> > >in the agent log it looks like this:
> >> >> > >
> >> >> > >INFO 2016-11-30 18:07:03,894 ActionQueue.py:173 - Running command:
> >> >> > >{u'roleCommand': u'STOP', u'clusterName':
> >> >> > >u'enable-presto-worker_cluster_a', u'componentName': u'NODE',
> >> >> > u'hostname':
> >> >> > >u'fsak20.prod.foursquare.com', u'hostLevelParams': {u'java_home':
> >> >> > >u'/data/loko/infrastructure-jdk8/current/bin/', u'container_id':
> >> >> > >u'container_e468_1479830316320_64974_01_000091'}, u'commandType':
> >> >> > >u'EXECUTION_COMMAND', u'roleParams': {u'auto_restart': u'false'},
> >> >> > >u'serviceName': u'enable-presto-worker_cluster_a', u'role':
> >>u'NODE',
> >> >> > >u'commandParams': {u'record_config': u'true',
> >> >>u'service_package_folder':
> >> >> > >u'${AGENT_WORK_ROOT}/work/app/definition/package', u'script':
> >> >> > >u'scripts/enable_presto_worker_component.py', u'schema_version':
> >> >> u'2.0',
> >> >> > >u'command_timeout': u'600', u'script_type': u'PYTHON'},
> >>u'taskId': 5,
> >> >> > >u'yarnDockerMode': False, u'commandId': '5-1', u'containers': [],
> >> >> > >u'configurations': {u'global': {u'security_enabled': u'false',
> >> >> > >u'app_container_id':
> >>u'container_e468_1479830316320_64974_01_000091'
> >> ,
> >> >> > >u'data_dir': u'/data/appdata/enable_presto_worker/data',
> >> u'app_name':
> >> >> > >u'enable_presto_worker.py', u'app_root':
> >> >> > >u'${AGENT_WORK_ROOT}/app/install',
> >> >> > >u'app_log_dir': u'${AGENT_LOG_ROOT}', u'app_pid_dir':
> >> >> > >u'${AGENT_WORK_ROOT}/app/run', u'app_container_tag': u'2',
> >> >>u'pid_file':
> >> >> > >u'${AGENT_WORK_ROOT}/app/run/component.pid', u'app_install_dir':
> >> >> > >u'${AGENT_WORK_ROOT}/app/install', u'app_input_conf_dir':
> >> >> > >u'${AGENT_WORK_ROOT}/propagatedconf', u'state_monitor_port':
> >> >>u'9990'}}}
> >> >> > >INFO 2016-11-30 18:07:03,896 CustomServiceOrchestrator.py:329 -
> >> >>Storing
> >> >> > >applied config: {u'global': {u'app_container_id':
> >> >> > >u'container_e468_1479830316320_64974_01_000091',
> >> >> > >             u'app_container_tag': u'2',
> >> >> > >             u'app_input_conf_dir':
> >> >> > >u'/export/hdj3/yarn/nm/usercache/hive/appcache/
> >> >> > application_1479830316320_6
> >> >> > >4974/container_e468_1479830316320_64974_01_000091/
> propagatedconf',
> >> >> > >             u'app_install_dir':
> >> >> > >u'/export/hdj3/yarn/nm/usercache/hive/appcache/
> >> >> > application_1479830316320_6
> >> >> > >4974/container_e468_1479830316320_64974_01_000091/app/install',
> >> >> > >             u'app_log_dir':
> >> >> > >u'/data/log/hadoop-yarn/container/application_
> >> >> > 1479830316320_64974/containe
> >> >> > >r_e468_1479830316320_64974_01_000091',
> >> >> > >             u'app_name': u'enable_presto_worker.py',
> >> >> > >             u'app_pid_dir':
> >> >> > >u'/export/hdj3/yarn/nm/usercache/hive/appcache/
> >> >> > application_1479830316320_6
> >> >> > >4974/container_e468_1479830316320_64974_01_000091/app/run',
> >> >> > >             u'app_root':
> >> >> > >u'/export/hdj3/yarn/nm/usercache/hive/appcache/
> >> >> > application_1479830316320_6
> >> >> > >4974/container_e468_1479830316320_64974_01_000091/app/install',
> >> >> > >             u'data_dir': u'/data/appdata/enable_presto_
> >> worker/data',
> >> >> > >             u'pid_file':
> >> >> > >u'/export/hdj3/yarn/nm/usercache/hive/appcache/
> >> >> > application_1479830316320_6
> >> >> > >4974/container_e468_1479830316320_64974_01_000091/
> >> >> app/run/component.pid',
> >> >> > >             u'security_enabled': u'false',
> >> >> > >             u'state_monitor_port': u'9990'}}
> >> >> > >INFO 2016-11-30 18:07:03,898 PythonExecutor.py:152 - command str:
> >> >> > > /usr/bin/python -S
> >> >> > >/export/hdk3/yarn/nm/usercache/hive/appcache/
> >> >> > application_1479830316320_649
> >> >> > >74/filecache/11/enable_presto_worker.zip/package/
> >> >> > scripts/enable_presto_wor
> >> >> > >ker_component.py
> >> >> > >STOP
> >> >> > >/export/hda3/data/log/hadoop-yarn/container/application_
> >> >> > 1479830316320_6497
> >> >> > >4/container_e468_1479830316320_64974_01_000091/command-5.json
> >> >> > >/export/hdk3/yarn/nm/usercache/hive/appcache/
> >> >> > application_1479830316320_649
> >> >> > >74/filecache/11/enable_presto_worker.zip/package
> >> >> > >/export/hda3/data/log/hadoop-yarn/container/application_
> >> >> > 1479830316320_6497
> >> >> >
> >>>4/container_e468_1479830316320_64974_01_000091/structured-out-5.json
> >> >> > >INFO
> >> >> > >/export/hdj3/yarn/nm/usercache/hive/appcache/
> >> >> > application_1479830316320_649
> >> >> > >74/container_e468_1479830316320_64974_01_000091
> >> >> > >INFO 2016-11-30 18:07:03,919 PythonExecutor.py:97 - stop command
> >> >>output:
> >> >> > > err: Traceback (most recent call last):
> >> >> > >  File
> >> >> > >"/export/hdk3/yarn/nm/usercache/hive/appcache/
> >> >> > application_1479830316320_64
> >> >> > >974/filecache/11/enable_presto_worker.zip/package/
> >> >> > scripts/enable_presto_wo
> >> >> > >rker_component.py",
> >> >> > >line 23, in <module>
> >> >> > >    from resource_management import *
> >> >> >
> >> >> >
> >> >>
> >>
> >>
>
>

Re: failure in the STOP command

Posted by Gour Saha <gs...@hortonworks.com>.
Can you check whether the stop function in your package script does "import
params", like this:
https://github.com/apache/incubator-slider/blob/develop/app-packages/hbase/package/scripts/hbase_master.py#L48


If it does, you might have to share your app-package scripts (without the
app binary/tar) so that we can debug further. To do that, file a bug and
upload the scripts to it. Attaching them to an email to this DL will not
work.

-Gour



Re: failure in the STOP command

Posted by Ophir Etzion <op...@foursquare.com>.
1. You can't see the PYTHONPATH issue directly. What you can see is that the
STOP output never sets PYTHONPATH, whereas the START output does.
2. Thanks for letting me know about release_timeout_secs, but for my app I
don't care if the containers die; the stop command sends a UDP packet
elsewhere.

Here is the output for START, where you can see the PYTHONPATH being set:
INFO 2016-11-30 17:50:32,361 AgentToggleLogger.py:40 - Running command
['/usr/bin/python',
 '-S',
 u'/export/hdk3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/filecache/11/enable_presto_worker.zip/package/scripts/enable_presto_worker_component.py',
 u'START',
 '/export/hda3/data/log/hadoop-yarn/container/application_1479830316320_64974/container_e468_1479830316320_64974_01_000091/command-4.json',
 '/export/hdk3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/filecache/11/enable_presto_worker.zip/package',
 '/export/hda3/data/log/hadoop-yarn/container/application_1479830316320_64974/container_e468_1479830316320_64974_01_000091/structured-out-4.json',
 'INFO',
 '/export/hdj3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/container_e468_1479830316320_64974_01_000091']
INFO 2016-11-30 17:50:32,361 AgentToggleLogger.py:40 - Setting env:
PYTHONPATH to
/export/hdj3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/filecache/10/slider-agent.tar.gz/slider-agent/jinja2:/export/hdj3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/filecache/10/slider-agent.tar.gz/slider-agent
INFO 2016-11-30 17:50:32,463 AgentToggleLogger.py:40 - Queue result:
{'componentStatus': [],
 'reports': [{'actionId': u'4-1',
              'clusterName': u'enable-presto-worker_cluster_a',
              'exitcode': 777,
              'reportResult': True,
              'role': u'NODE',
              'roleCommand': u'START',
              'serviceName': u'enable-presto-worker_cluster_a',
              'status': 'IN_PROGRESS',
              'stderr': '',
              'stdout': "2016-11-30 17:50:32,455 - Directory['/data/appdata/enable_presto_worker/data/var/run'] {'recursive': True}",
              'structuredOut': '{}',
              'taskId': 4}]}
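
The PYTHONPATH value in the START output is two directories: the slider-agent directory and its jinja2 subdirectory. As a possible stopgap, and only as a sketch, a script could put the same two entries on sys.path itself before importing resource_management. The helper names below are hypothetical, and how the script would learn the agent directory at STOP time is left open here.

```python
import os
import sys


def agent_python_paths(agent_dir):
    """Return the two directories seen in the START log, in the same order."""
    return [os.path.join(agent_dir, "jinja2"), agent_dir]


def agent_pythonpath(agent_dir):
    """Build the PYTHONPATH string the START log shows for a given agent_dir."""
    return os.pathsep.join(agent_python_paths(agent_dir))


def ensure_on_sys_path(paths):
    """Move the given paths to the front of sys.path, preserving their order."""
    for path in reversed(paths):
        if path in sys.path:
            sys.path.remove(path)
        sys.path.insert(0, path)
```

This only hides the symptom; it does not explain why the agent skips the "Setting env" step for STOP.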


Re: failure in the STOP command

Posted by Gour Saha <gs...@hortonworks.com>.
Also keep in mind: if your application needs to run something useful when
the stop cmd is initiated, then you need to set an appropriate value for
site.global.app_container.release_timeout_secs. Otherwise kill signals are
sent to the agent containers via YARN (almost immediately) and the
containers don't get time for a graceful shutdown.
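
As a rough illustration of that setting, the sketch below adds the key to an app configuration held as a Python dict and prints it as JSON. The key name comes from the text above. Placing it in the "global" section, storing the value as a string, and the value 30 are all assumptions made for the example, so check them against your own appConfig.

```python
import json

# Key name as given above; the value used in the example is only a placeholder.
RELEASE_TIMEOUT_KEY = "site.global.app_container.release_timeout_secs"


def with_release_timeout(app_config, seconds):
    """Return a copy of app_config with the release timeout set under "global"."""
    updated = dict(app_config)
    global_section = dict(updated.get("global", {}))
    # Stored as a string, like the other values in the logged configuration.
    global_section[RELEASE_TIMEOUT_KEY] = str(seconds)
    updated["global"] = global_section
    return updated


if __name__ == "__main__":
    config = with_release_timeout({"global": {}}, 30)
    print(json.dumps(config, indent=2, sort_keys=True))
```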

-Gour





Re: failure in the STOP command

Posted by Billie Rinaldi <bi...@gmail.com>.
It looks like the Traceback stack for the stop command output is truncated
in the logs you pasted. I only see the first line of the Traceback:
INFO 2016-11-30 18:07:03,919 PythonExecutor.py:97 - stop command output:
 err: Traceback (most recent call last):
  File "/export/hdk3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/filecache/11/enable_presto_worker.zip/package/scripts/enable_presto_worker_component.py", line 23, in <module>
    from resource_management import *

So I cannot see the PYTHONPATH error you're talking about. If you paste the
entire Traceback that might tell us more.

Billie


Re: failure in the STOP command

Posted by Ophir Etzion <op...@foursquare.com>.
It does implement a STOP command that does something useful.
It fails because the PYTHONPATH isn't set the way it is for the other commands.
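
One way to check this from inside the script is sketched below. It is only a diagnostic sketch, not part of Slider: the helper name describe_import_environment is made up for illustration. It reports PYTHONPATH and sys.path and then attempts the import that fails at line 23 of the component script. Note that the logged command str runs /usr/bin/python with -S, which skips the site module, so site-packages is not added to sys.path and the import can only succeed if the agent puts the right directory on PYTHONPATH.

```python
import os
import sys
import traceback


def describe_import_environment(module_name):
    """Try to import module_name and return a text report of the outcome.

    Hypothetical helper for debugging only; the name is not from Slider.
    """
    # Record the environment variable and the effective module search path.
    lines = ["PYTHONPATH=%r" % os.environ.get("PYTHONPATH")]
    lines.extend("sys.path[%d]=%r" % (i, p) for i, p in enumerate(sys.path))
    try:
        __import__(module_name)
        lines.append("import %s: OK" % module_name)
    except ImportError:
        # Keep the whole traceback so nothing is lost if the log is cut short.
        lines.append("import %s: FAILED" % module_name)
        lines.append(traceback.format_exc())
    return "\n".join(lines)


if __name__ == "__main__":
    # stderr is what the agent log shows as "err:" for the stop command.
    sys.stderr.write(describe_import_environment("resource_management") + "\n")
```

Running the same check under START and under STOP and comparing the two reports would show whether the search path really differs between the commands.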


Re: failure in the STOP command

Posted by Gour Saha <gs...@hortonworks.com>.
Does enable_presto_worker_component.py support/implement a STOP command?

Does your application need to run something useful when the stop cmd is
initiated?

-Gour
