Posted to user@tez.apache.org by Fabio <an...@gmail.com> on 2014/12/09 10:51:42 UTC

Containers lifespan in session mode

Hi everyone,
I'm currently running Hive on Tez and, in particular, testing session 
mode. I can submit different queries to the same Tez AM, and that works 
fine. But for some reason containers are released very shortly after 
their assigned task ends whenever no new task is pending, which leaves 
no chance for container reuse across different queries. I already tried 
setting tez.am.container.session.delay-allocation-millis=-1 (and, before 
that, 600000), but the behavior persists.
In the logs I see these suspicious lines:

2014-12-09 09:44:23,035 INFO [DelayedContainerManager] 
rm.YarnTaskSchedulerService: Releasing unused container: 
container_1418090991482_0008_01_000002

and about 240 ms later the container is stopped:

2014-12-09 09:44:23,274 INFO [TezChild] task.ContainerReporter: Got 
TaskUpdate: 7439 ms after starting to poll. TaskInfo: shouldDie: true
2014-12-09 09:44:23,276 INFO [main] task.TezChild: ContainerTask 
returned shouldDie=true, Exiting
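
For reference, the gap between the release decision and the container 
exiting can be read off the timestamps above. A minimal Python sketch, 
with the three timestamps copied from those log lines; the helper names 
parse_ts and gap_ms are made up for illustration:

```python
from datetime import datetime

# Timestamps copied from the log lines above (comma before the millis).
RELEASED = "2014-12-09 09:44:23,035"    # AM: "Releasing unused container"
SHOULD_DIE = "2014-12-09 09:44:23,274"  # container: "shouldDie: true"
EXITING = "2014-12-09 09:44:23,276"     # container: "Exiting"


def parse_ts(text):
    """Parse a 'YYYY-MM-DD HH:MM:SS,mmm' timestamp (hypothetical helper)."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")


def gap_ms(start, end):
    """Milliseconds elapsed between two timestamp strings."""
    delta = parse_ts(end) - parse_ts(start)
    return round(delta.total_seconds() * 1000)


print(gap_ms(RELEASED, SHOULD_DIE))
print(gap_ms(RELEASED, EXITING))
```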

It seems to me that the container is released as soon as it is no 
longer needed (regardless of what might happen later). Is that so? 
How can I solve this?

I attach the aggregated log and the swimlanes graph that highlight this 
behavior.

Thanks guys

Fabio

Re: Containers lifespan in session mode

Posted by Fabio <an...@gmail.com>.
I got the list of configuration parameters from here:
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.7/bk_installing_manually_book/content/rpm-chap-tez_configure_tez.html
They probably refer to 0.4.1. Is there an official page documenting 
the config parameters for the latest version?

In any case, I will remember to use those parameters. Thanks again.

Best regards

Fabio



Re: Containers lifespan in session mode

Posted by Hitesh Shah <hi...@apache.org>.
We probably need to fix the docs that refer to "tez.am.container.session.delay-allocation-millis". Can you point me to the doc you are referring to? This setting was removed in 0.5.x in favor of the min/max release timeouts (tez.am.container.idle.release-timeout-min.millis and tez.am.container.idle.release-timeout-max.millis). To achieve the same behavior as tez.am.container.session.delay-allocation-millis, just set the min and max to the same value.
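
To illustrate the last point, here is a rough Python model, not Tez code. It assumes that each idle container gets a hold time picked at random between the min and max timeouts, which is an assumption about how the two settings interact and should be checked against the Tez documentation; pick_idle_timeout_ms is a made-up name, and negative values are not modeled. With min equal to max the range collapses to a single value, so the delay becomes fixed:

```python
import random


def pick_idle_timeout_ms(min_ms, max_ms, rng=random):
    """Toy model: choose how long an idle container is held before release.

    Assumption (not taken from the Tez source): the hold time is drawn
    uniformly from [min_ms, max_ms].
    """
    if min_ms < 0 or max_ms < min_ms:
        raise ValueError("need 0 <= min_ms <= max_ms")
    return rng.randint(min_ms, max_ms)


# Values from the scheduler log in this thread: somewhere between 5 s and 10 s.
print(pick_idle_timeout_ms(5000, 10000))

# min == max: every idle container is held for exactly 10 minutes.
print(pick_idle_timeout_ms(600000, 600000))
```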

— Hitesh 





Re: Containers lifespan in session mode

Posted by Fabio <an...@gmail.com>.
Thanks Rajesh, that really was the problem! Actually, for a moment I 
did think about those parameters, but I assumed they would be ignored 
during a session.
In my opinion, they should not be considered by the system while running 
in session mode, and tez.am.container.session.delay-allocation-millis 
should be the exact delay before releasing a container (at least when 
it is greater than tez.am.container.idle.release-timeout-min.millis). 
Sure, this carries the risk of accumulating containers up to the upper 
limit of the application/queue, if any. Alternatively, the devs could 
consider logging a warning when this condition is met, to alert the 
user that the parameter will have no effect since containers will be 
released long before. What do you think?

Thanks for the help

Fabio



Re: Containers lifespan in session mode

Posted by Rajesh Balamohan <ra...@gmail.com>.
>>>
2014-12-09 09:39:40,314 INFO
[ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerEventHandler]
rm.YarnTaskSchedulerService: TaskScheduler initialized with configuration:
maxRMHeartbeatInterval: 1000, containerReuseEnabled: true, reuseRackLocal:
true, reuseNonLocal: false, localitySchedulingDelay: 250,
idleContainerMinTimeout=5000, idleContainerMaxTimeout=10000,
sessionMinHeldContainers=0
>>>



Can you try the following settings instead?

tez.am.container.idle.release-timeout-min.millis=400000
tez.am.container.idle.release-timeout-max.millis=600000

600000 ms sets the max to 10 minutes.
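
As a quick sanity check on the units (600000 ms is 10 minutes, whereas
60000 ms would only be 1 minute), here is a small Python sketch that pulls
the idle timeouts currently in effect out of the scheduler log line quoted
above and prints them next to the proposed values. The function names are
made up, and the regex assumes the exact format of that log line:

```python
import re

# Scheduler init line quoted at the top of this message, joined onto one line.
LOG_LINE = (
    "rm.YarnTaskSchedulerService: TaskScheduler initialized with configuration: "
    "maxRMHeartbeatInterval: 1000, containerReuseEnabled: true, "
    "reuseRackLocal: true, reuseNonLocal: false, localitySchedulingDelay: 250, "
    "idleContainerMinTimeout=5000, idleContainerMaxTimeout=10000, "
    "sessionMinHeldContainers=0"
)

# Proposed values from this message, in milliseconds.
PROPOSED = {
    "tez.am.container.idle.release-timeout-min.millis": 400000,
    "tez.am.container.idle.release-timeout-max.millis": 600000,
}


def parse_scheduler_config(line):
    """Extract 'key: value' and 'key=value' pairs after 'configuration:'."""
    _, _, tail = line.partition("configuration:")
    return dict(re.findall(r"(\w+)\s*[:=]\s*(\w+)", tail))


def ms_to_minutes(ms):
    """Convert milliseconds to minutes."""
    return ms / 60000


current = parse_scheduler_config(LOG_LINE)
for key in ("idleContainerMinTimeout", "idleContainerMaxTimeout"):
    ms = int(current[key])
    print(f"current  {key}={ms}  # {ms / 1000:.0f} s")
for name, ms in PROPOSED.items():
    print(f"proposed {name}={ms}  # {ms_to_minutes(ms):.2f} min")
```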

~Rajesh.B

