Posted to users@apex.apache.org by "Ganelin, Ilya" <Il...@capitalone.com> on 2017/05/30 17:11:19 UTC

Container failure without relaunch

Hi all – several times now I’ve noticed odd behavior with our app. When running for several days or more, I’ll observe that following an operator failure, the container does not relaunch. I’m not sure what accounts for this; I don’t see any further errors in the log following the initial “stop” + “operator remove”. It’s as if recovery is not working. Any thoughts on what could be causing this?

- Ilya Ganelin
________________________________________________________

The information contained in this e-mail is confidential and/or proprietary to Capital One and/or its affiliates and may only be used solely in performance of work or services for Capital One. The information transmitted herewith is intended only for use by the individual or entity to which it is addressed. If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer.

Re: Container failure without relaunch

Posted by Vlad Rozov <v....@datatorrent.com>.
It may also help to enable DEBUG-level logging for com.datatorrent.*
once the issue is reproduced again, and to check activity in the
application master logs.

Thank you,

Vlad

On 5/30/17 10:41, Sandesh Hegde wrote:
> When that issue happens, please check the free resources (CPU and
> memory) available to YARN.


Re: Container failure without relaunch

Posted by Sandesh Hegde <sa...@datatorrent.com>.
When that issue happens, please check the free resources (CPU and memory)
available to YARN.
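
One way to run this check from a script is to ask the YARN ResourceManager REST API for its cluster metrics and read the free memory and vcores from the response. The sketch below is only an illustration and is not from this thread: the ResourceManager address (RM_URL) and the helper names are assumptions, and it presumes the /ws/v1/cluster/metrics endpoint is reachable without authentication on your cluster.

```python
import json
from urllib.request import urlopen

# Assumption: address of the ResourceManager web UI; replace with your own.
RM_URL = "http://localhost:8088"


def fetch_cluster_metrics(rm_url=RM_URL, timeout=10):
    """GET the cluster metrics document from the ResourceManager REST API."""
    with urlopen(f"{rm_url}/ws/v1/cluster/metrics", timeout=timeout) as resp:
        return json.load(resp)


def free_resources(metrics):
    """Pick the free-capacity fields out of a cluster metrics response."""
    m = metrics["clusterMetrics"]
    return {
        "available_mb": m["availableMB"],
        "available_vcores": m["availableVirtualCores"],
        # Containers requested but not yet allocated; a non-zero value
        # alongside little free capacity suggests YARN cannot place them.
        "pending_containers": m["containersPending"],
    }
```

Calling free_resources(fetch_cluster_metrics()) at the moment the container fails to come back shows whether the cluster had enough memory and vcores to grant a replacement container.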


Re: Container failure without relaunch

Posted by "Ganelin, Ilya" <Il...@capitalone.com>.
I think I checked this and I don’t see any activity whatsoever. No re-launch, just empty tabs. I’ll try to provide a screenshot next time it happens.

- Ilya Ganelin


Re: Container failure without relaunch

Posted by Pramod Immaneni <pr...@datatorrent.com>.
Hi Ilya,

What is the state of the physical containers in the physical tab? Are the
containers dying and continuously restarting?

Thanks
