Posted to user@tez.apache.org by "Fabio C." <an...@gmail.com> on 2015/02/20 11:09:39 UTC

Container release time at end of AM

Hi guys,
I was measuring the time it takes for a delayed container (kept for
container reuse) to be released when the Tez application master shuts
down at the end of its life.
I ran the same Hive-on-Tez query 100 times, and as you can see in the
attached plot there is something strange:
- most of the containers (around 80%) are released in almost exactly one
second
- a few containers are released in a time that ranges from a few
milliseconds to approximately the AM-RM heartbeat interval
(suggesting that the AM is the one telling the RM about the end of the
container).
The NM-RM heartbeat interval is 1s, and I define the release interval as
the time between the "Sending a stop request to the NM for ContainerId"
log entry (AM side) and the queue update (RM side).
I could only check a few logs manually, but it seems the second case
happens when the container is actually able to stop before the AM exits,
while if the AM dies first we fall into the first case.
I suspect that if the AM is dead, the RM waits for the NM heartbeat
before it considers the resources available. However, in that case I
would expect a uniform distribution between delta and 1s+delta (with
delta equal to a few ms).
What do you think is really happening here? How can the variance of the
first case be so small?
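
To make the expectation above concrete, here is a minimal sketch in Python.
It is only an illustration and is not taken from Tez or YARN code: it
assumes the AM exits at a moment unrelated to the NM heartbeat phase, that
the release is observed at the next heartbeat, and that a fixed overhead
delta is added. The function names and the 5 ms delta are hypothetical.

```python
import random


def wait_until_next_heartbeat(event_time, period=1.0, phase=0.0):
    """Return the time from event_time to the first heartbeat at or after it.

    Assumption: heartbeats fire at phase + k * period for integer k.
    """
    elapsed = (event_time - phase) % period
    return 0.0 if elapsed == 0 else period - elapsed


def simulate_release_times(runs=100_000, period=1.0, delta=0.005, seed=42):
    """Sample release intervals under the 'wait for next heartbeat' model.

    delta is a hypothetical fixed processing overhead in seconds.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(runs):
        # Assumption: the AM exit time is independent of the heartbeat phase.
        am_exit = rng.uniform(0.0, 1000.0)
        samples.append(wait_until_next_heartbeat(am_exit, period) + delta)
    return samples


if __name__ == "__main__":
    samples = simulate_release_times()
    mean = sum(samples) / len(samples)
    variance = sum((s - mean) ** 2 for s in samples) / len(samples)
    print(f"min={min(samples):.3f}s max={max(samples):.3f}s")
    print(f"mean={mean:.3f}s stddev={variance ** 0.5:.3f}s")
```

Under these assumptions the samples spread over the whole range from delta
to 1s+delta, with a standard deviation close to period/sqrt(12), about
0.29s for a 1s heartbeat. A cluster of values at almost exactly one second
does not fit this simple model.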

Thanks

Fabio

Re: Container release time at end of AM

Posted by "Fabio C." <an...@gmail.com>.
Hi Bikas, no problem.
Thanks for the clear answer.

Regards

Fabio

On Tue, Mar 3, 2015 at 8:21 PM, Bikas Saha <bi...@hortonworks.com> wrote:

> Apologies for the delayed response.
>
> IIRC, the scheduler in the AM does not release containers upon AM exit.
> Perhaps we should do that to make the hand-off to the RM explicit.
>
> The ContainerLauncher, though, does try to send a stop request to the NM
> for the containers it has launched.
>
> You are right, the RM will garbage collect all containers from an
> application after the AM for that application has finished. This would
> happen in the next heartbeat with the NMs after the RM has figured out that
> the application is done.
>
> Bikas

RE: Container release time at end of AM

Posted by Bikas Saha <bi...@hortonworks.com>.
Apologies for the delayed response.

IIRC, the scheduler in the AM does not release containers upon AM exit. Perhaps we should do that to make the hand-off to the RM explicit.
The ContainerLauncher, though, does try to send a stop request to the NM for the containers it has launched.
You are right, the RM will garbage collect all containers from an application after the AM for that application has finished. This would happen in the next heartbeat with the NMs after the RM has figured out that the application is done.

Bikas

