Posted to dev@apex.apache.org by Isha Arkatkar <is...@datatorrent.com> on 2016/03/02 04:27:16 UTC

Re: Not catching exceptions from down stream operator

Hi,

  I checked the application at https://github.com/chaithu14/AppThreadLocal

  In this example, the exception from the downstream operator is thrown in a
separate thread inside the AbstractReconciler operator, and it is rethrown on
the main operator thread only in handleIdleTime. That function is not
guaranteed to be invoked in every window. With THREAD_LOCAL locality, I
checked that handleIdleTime was never invoked, so the exception was never
rethrown.

  Exceptions thrown from any thread other than the main operator thread are
not caught by the Application Master. We should probably add a note to the
troubleshooting guide recommending a rethrow in the main thread.

  I verified that if the downstream operator throws the exception in the main
thread, it is caught by the Application Master as expected, even in the
THREAD_LOCAL case.
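
  A minimal sketch of such a rethrow, written in plain Java without the Apex
API. The class and method names (RethrowingOperator, setup, endWindow,
awaitWorker) are hypothetical stand-ins, not code from the sample
application, and the sketch assumes an end-of-window callback that runs on
the main operator thread for every window:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for an operator that does work on a helper thread.
// None of these names come from Apex; they only illustrate the pattern.
class RethrowingOperator {
  // Holds the first failure seen on the helper thread.
  private final AtomicReference<Throwable> workerError = new AtomicReference<>();
  private Thread worker;

  // Called on the main operator thread during setup.
  void setup() {
    worker = new Thread(() -> {
      try {
        processInBackground();
      } catch (Throwable t) {
        // Record the failure instead of letting it die with this thread.
        workerError.compareAndSet(null, t);
      }
    }, "helper-thread");
    worker.setDaemon(true);
    worker.start();
  }

  // Simulated background work that fails.
  private void processInBackground() {
    throw new IllegalStateException("failure in helper thread");
  }

  // Assumed to be called on the main operator thread at the end of every
  // window, so a recorded failure surfaces on that thread.
  void endWindow() {
    Throwable t = workerError.get();
    if (t != null) {
      throw new RuntimeException("rethrown on main operator thread", t);
    }
  }

  // Helper for demos and tests: wait for the helper thread to finish.
  void awaitWorker() {
    try {
      worker.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

  Once the helper thread has recorded its failure, the next endWindow() call
throws a RuntimeException whose cause is the original exception. The check
sits in a per-window callback rather than an idle-time callback because the
latter may never run.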

Thanks,
Isha

On Thu, Feb 25, 2016 at 9:57 PM, Chaitanya Chebolu <
chaitanya@datatorrent.com> wrote:

> Hi All,
>
>   I created a sample application for the THREAD_LOCAL issue. The
> application is here <https://github.com/chaithu14/AppThreadLocal>.
>   The application has the following DAG:
>
>                 RandomEventGenerator -> OutputOperator
>
> Both operators are THREAD_LOCAL.
>
>   OutputOperator throws an exception at every committed window, so the
> AppMaster is supposed to kill the container at every committed window. This
> is the expected behavior.
>   But this is not happening with the current Apex.
>
>   One more observation: if the upstream operator throws an exception at
> every committed window, then the AppMaster kills the container
> continuously. This does not happen with the downstream operator.
>
>  I created a JIRA for this issue: APEXCORE-357
>
> Regards,
> Chaitanya
>
> On Thu, Feb 25, 2016 at 12:36 PM, Chaitanya Chebolu <
> chaitanya@datatorrent.com> wrote:
>
> > Hi,
> >
> >   I am facing issues with THREAD_LOCAL. I have two operators that are
> > thread local, and the downstream one throws exceptions, but the AppMaster
> > is not catching those exceptions. I was unable to figure out why the
> > application is not working.
> >   If the two operators are deployed in different containers, then the
> > container is killed continuously by the AppMaster. This is the expected
> > behavior.
> >
> >    For example, let the DAG be op1 -> op2, where op1 and op2 are both
> > thread local. When the downstream operator op2 throws an exception, the
> > AppMaster does not catch it. I will create a JIRA for this issue. Could
> > someone please help with this?
> >
> > Regards,
> > Chaitanya
> >
>

Re: Not catching exceptions from down stream operator

Posted by Chaitanya Chebolu <ch...@datatorrent.com>.
Thanks, Isha, for the details.
The application runs fine with the later version of Apex.

On Fri, Mar 4, 2016 at 11:19 AM, Isha Arkatkar <is...@datatorrent.com> wrote:

> Hi Chaitanya,
>
>     The bug you mentioned is actually fixed in the latest version. The fix
> for JIRA APEXCORE-130
> <https://issues.apache.org/jira/browse/APEXCORE-130> handles
> this issue as well.
>     Please try again with the latest changes from master.
>
> This is the commit ID with the fix: 139a9cac6397948bb63a53ea80188f2ffd6e5da2
>
> Thanks!
> Isha

Re: Not catching exceptions from down stream operator

Posted by Isha Arkatkar <is...@datatorrent.com>.
Hi Chaitanya,

    The bug you mentioned is actually fixed in the latest version. The fix
for JIRA APEXCORE-130
<https://issues.apache.org/jira/browse/APEXCORE-130> handles
this issue as well.
    Please try again with the latest changes from master.

This is the commit ID with the fix: 139a9cac6397948bb63a53ea80188f2ffd6e5da2

Thanks!
Isha


On Thu, Mar 3, 2016 at 5:26 AM, Chaitanya Chebolu <chaitanya@datatorrent.com>
wrote:

> Thanks, Isha, for analyzing the issue.
>
> I am adding your analysis to the JIRA.
>
> I observed one more issue with THREAD_LOCAL.
>
> Let the DAG be as follows:
>     A -> B -> C
>
> where A, B, and C are operators, and B and C are THREAD_LOCAL.
>
>
> If the downstream operator (i.e., operator C) throws an exception from the
> main thread, the Application Master catches the exception and kills the
> container. A new container is allocated for operators B and C. B is
> redeployed into the newly allocated container and its status is ACTIVE,
> but C is not redeployed.
>
> After the redeployment of operator B, the DAG is as follows:
>      A -> B
>
> I looked into the Stram logs and observed the following message:
> "INFO com.datatorrent.stram.StreamingContainerManager: Affected operators
> [PTOperator[id=2,name=B]]".
>
> I think this is the issue: operator C is not in the list of affected
> operators.
>
> I created an application for this issue. The sample application is here
> <https://github.com/chaithu14/AppThreadLocal/tree/theadBranch>.
>
> @Isha: Have you observed the same behavior?
>
> I am creating a JIRA for this issue.
>
> Regards,
> Chaitanya

Re: Not catching exceptions from down stream operator

Posted by Chaitanya Chebolu <ch...@datatorrent.com>.
Thanks, Isha, for analyzing the issue.

I am adding your analysis to the JIRA.

I observed one more issue with THREAD_LOCAL.

Let the DAG be as follows:
    A -> B -> C

where A, B, and C are operators, and B and C are THREAD_LOCAL.


If the downstream operator (i.e., operator C) throws an exception from the
main thread, the Application Master catches the exception and kills the
container. A new container is allocated for operators B and C. B is
redeployed into the newly allocated container and its status is ACTIVE,
but C is not redeployed.

After the redeployment of operator B, the DAG is as follows:
     A -> B

I looked into the Stram logs and observed the following message:
"INFO com.datatorrent.stram.StreamingContainerManager: Affected operators
[PTOperator[id=2,name=B]]".

I think this is the issue: operator C is not in the list of affected
operators.

I created an application for this issue. The sample application is here
<https://github.com/chaithu14/AppThreadLocal/tree/theadBranch>.

@Isha: Have you observed the same behavior?

I am creating a JIRA for this issue.

Regards,
Chaitanya

On Wed, Mar 2, 2016 at 9:34 AM, Sandeep Deshmukh <sa...@datatorrent.com>
wrote:

> Great finding, Isha.
>
> In general, it is always advisable to do things in the main thread. We had
> some timing issues in dtIngest because we were emitting tuples in the
> Reconciler thread. Once we moved all emit statements to the main thread,
> we observed no further issues.
>
> Issue: when tuples were emitted in the Reconciler thread, some of them were
> emitted after endWindow but before checkpointing was done. Such tuples are
> not guaranteed to reach the downstream operator in the same window. As a
> result, the checkpoints of the two operators are not in sync, which could
> cause a few tuples to be replayed incorrectly from the Reconciler-based
> operator.
>
> Regards,
> Sandeep

Re: Not catching exceptions from down stream operator

Posted by Sandeep Deshmukh <sa...@datatorrent.com>.
Great finding, Isha.

In general, it is always advisable to do things in the main thread. We had
some timing issues in dtIngest because we were emitting tuples in the
Reconciler thread. Once we moved all emit statements to the main thread, we
observed no further issues.

Issue: when tuples were emitted in the Reconciler thread, some of them were
emitted after endWindow but before checkpointing was done. Such tuples are
not guaranteed to reach the downstream operator in the same window. As a
result, the checkpoints of the two operators are not in sync, which could
cause a few tuples to be replayed incorrectly from the Reconciler-based
operator.
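
A rough sketch of the "emit only from the main thread" approach, in plain
Java without the Apex API. It is not the dtIngest code: MainThreadEmitter,
onResult, endWindow, and the emitted list (standing in for an output port)
are hypothetical names, and it assumes endWindow runs on the main operator
thread before the window closes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical operator: the helper thread only queues results,
// and the main operator thread is the only one that emits.
class MainThreadEmitter {
  // Thread-safe hand-off queue between the helper thread and the main thread.
  private final ConcurrentLinkedQueue<String> ready = new ConcurrentLinkedQueue<>();
  // Stand-in for an output port; records what was emitted, in order.
  final List<String> emitted = new ArrayList<>();

  // Called from the helper thread when a tuple is ready to go out.
  void onResult(String tuple) {
    ready.add(tuple);
  }

  // Assumed to be called on the main operator thread before the window
  // closes; drains everything queued so far and emits it.
  void endWindow() {
    String tuple;
    while ((tuple = ready.poll()) != null) {
      emitted.add(tuple); // a real operator would call its output port here
    }
  }
}
```

Because the helper thread never touches the output directly, nothing can be
emitted between the end of a window and the checkpoint. Tuples that arrive
late simply wait in the queue until the next endWindow() call.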

Regards,
Sandeep

On Wed, Mar 2, 2016 at 8:57 AM, Isha Arkatkar <is...@datatorrent.com> wrote:

> Hi,
>
>   I checked the application at https://github.com/chaithu14/AppThreadLocal
>
>   In this example, the exception from the downstream operator is thrown in
> a separate thread inside the AbstractReconciler operator, and it is
> rethrown on the main operator thread only in handleIdleTime. That function
> is not guaranteed to be invoked in every window. With THREAD_LOCAL
> locality, I checked that handleIdleTime was never invoked, so the
> exception was never rethrown.
>
>   Exceptions thrown from any thread other than the main operator thread
> are not caught by the Application Master. We should probably add a note to
> the troubleshooting guide recommending a rethrow in the main thread.
>
>   I verified that if the downstream operator throws the exception in the
> main thread, it is caught by the Application Master as expected, even in
> the THREAD_LOCAL case.
>
> Thanks,
> Isha