Posted to users@apex.apache.org by Sunil Parmar <sp...@threatmetrix.com> on 2017/03/01 17:44:55 UTC

Re: Blocked operator PTOperator

I think we figured out the issue. It was Cassandra: in that environment one of the nodes was making writes super slow. We fixed the cluster and now it's much better.
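
One way to confirm a diagnosis like this is to time the body of each callback and count the calls that run long. The following is only a minimal sketch in plain Java: `CallbackTimer` and its methods are hypothetical names, not part of Apex, and the budget value is an arbitrary assumption.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not an Apex class): times a callback body against a budget. */
final class CallbackTimer {
    private final long budgetNanos;
    private long slowCalls;
    private long worstNanos;

    CallbackTimer(long budgetMillis) {
        this.budgetNanos = TimeUnit.MILLISECONDS.toNanos(budgetMillis);
    }

    /** Runs the body, records its elapsed time, and returns true if it exceeded the budget. */
    boolean run(Runnable body) {
        long start = System.nanoTime();
        body.run();
        long elapsed = System.nanoTime() - start;
        worstNanos = Math.max(worstNanos, elapsed);
        boolean slow = elapsed > budgetNanos;
        if (slow) {
            slowCalls++;
        }
        return slow;
    }

    /** Number of calls that exceeded the budget so far. */
    long slowCalls() {
        return slowCalls;
    }

    /** Longest observed call, in milliseconds. */
    long worstMillis() {
        return TimeUnit.NANOSECONDS.toMillis(worstNanos);
    }
}
```

Wrapping the external write in `timer.run(...)` and logging `worstMillis()` periodically would show whether the store, rather than the operator logic, is where the time goes.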

On 2017-02-28 13:09 (-0800), Sandesh Hegde <sa...@datatorrent.com> wrote:
> Can you please attach the stacktrace of the operator?
>
> You can increase the TIMEOUT_WINDOW_COUNT attribute; the AppMaster uses it
> to decide when to kill a blocked operator.
>
> To take a stack trace, follow the instructions in this blog post:
> https://www.datatorrent.com/blog/getting-stack-traces-apache-apex-applications/
>
> On Tue, Feb 28, 2017 at 12:59 PM Sunil Parmar <sp...@threatmetrix.com> wrote:
>
> > Ashwin,
> > I don't see any such warning. I'll PM you the entire log file.
> >
> > On 2017-02-28 12:16 (-0800), Ashwin Chandra Putta <ashwinchandrap@gmail.com> wrote:
> > > Sunil,
> > > This might be related to checkpointing. See:
> > > https://github.com/apache/apex-core/blob/master/engine/src/main/java/com/datatorrent/stram/StreamingContainerManager.java#L2211-L2217
> > >
> > > Also check this piece of code:
> > > https://github.com/apache/apex-core/blob/master/engine/src/main/java/com/datatorrent/stram/StreamingContainerManager.java#L2031-L2044
> > >
> > > Can you paste the output of the warning from the code above, the one that
> > > starts with 'Marking operator '?
> > >
> > > Regards,
> > > Ashwin.
> > >
> > > On Tue, Feb 28, 2017 at 12:03 PM, Sunil Parmar <sp...@threatmetrix.com> wrote:
> > >
> > > > That doesn't seem to be the case. We do see the window ID moving in the
> > > > UI as well.
> > > >
> > > > On 2017-02-28 11:19 (-0800), Munagala Ramanath <ra...@datatorrent.com> wrote:
> > > > > It likely means that the operator is taking too long to return from one
> > > > > of the callbacks, such as beginWindow(), endWindow(), or emitTuples().
> > > > > Do you have any potentially blocking calls to external systems in any of
> > > > > those callbacks?
> > > > >
> > > > > Ram
> > > > >
> > > > > On Tue, Feb 28, 2017 at 11:09 AM, Sunil Parmar <sparmar@threatmetrix.com> wrote:
> > > > >
> > > > > > 2017-02-27 19:43:21,926 INFO com.datatorrent.stram.StreamingContainerManager: Blocked operator PTOperator[id=3,name=eventUpdatesFormatter] container PTContainer[id=1(container_1487310232732_0027_02_000111),state=ACTIVE] time 61905ms
> > > > > > 2017-02-27 19:43:22,928 INFO com.datatorrent.stram.StreamingAppMasterService: Completed containerId=container_1487310232732_0027_02_000111, state=COMPLETE, exitStatus=-105, diagnostics=Container killed by the ApplicationMaster.
> > > > > > Container killed on request. Exit code is 143
> > > > > > Container exited with a non-zero exit code 143
> > > > > >
> > > > > >
> > > > > > Can anyone help us understand this error? We see that one of the
> > > > > > operators keeps restarting the container; the error above is from the
> > > > > > AppMaster log.
> > > > > >
> > > > > > Thanks,
> > > > > > Sunil
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > _______________________________________________________
> > > > >
> > > > > Munagala V. Ramanath
> > > > >
> > > > > Software Engineer
> > > > >
> > > > > E: ram@datatorrent.com | M: (408) 331-5034 | Twitter: @UnknownRam
> > > > >
> > > > > www.datatorrent.com  |  apex.apache.org
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > >
> > > Regards,
> > > Ashwin.
> > >
> >
>
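
The "time 61905ms" figure in the quoted AppMaster log can be related to TIMEOUT_WINDOW_COUNT with a little arithmetic. The sketch below assumes a default of 120 windows and a 500 ms streaming window; neither value is stated in this thread, so check them against your Apex version. `BlockedOperatorTimeout` is a hypothetical helper, not Apex code.

```java
/** Hypothetical helper (not an Apex class): estimates the stall limit for an operator. */
final class BlockedOperatorTimeout {
    // Assumed defaults; they do not come from this thread, so verify them for your setup.
    static final int ASSUMED_TIMEOUT_WINDOW_COUNT = 120;
    static final int ASSUMED_WINDOW_SIZE_MILLIS = 500;

    /** Stall limit in milliseconds: number of windows times the length of one window. */
    static long timeoutMillis(int timeoutWindowCount, int windowSizeMillis) {
        return (long) timeoutWindowCount * windowSizeMillis;
    }

    public static void main(String[] args) {
        long limit = timeoutMillis(ASSUMED_TIMEOUT_WINDOW_COUNT, ASSUMED_WINDOW_SIZE_MILLIS);
        long observed = 61_905L; // "time 61905ms" from the AppMaster log in this thread
        System.out.println("limit=" + limit + "ms, observed=" + observed
                + "ms, over limit=" + (observed > limit));
        // Doubling the window count doubles the time an operator may stall before it is killed.
        System.out.println("limit with 240 windows=" + timeoutMillis(240, 500) + "ms");
    }
}
```

Under those assumptions the limit is 60000 ms, which is consistent with the container being killed shortly after the operator had been blocked for 61905 ms. Raising the attribute only buys time; it does not make the slow call faster.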

Re: Blocked operator PTOperator

Posted by Munagala Ramanath <ra...@datatorrent.com>.
Just curious: was that write happening in one of the operator callbacks?

Ram
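
If a write to an external store does run inside a callback, one option is to bound how long the callback waits for it. The sketch below uses only the JDK (`ExecutorService` and `Future.get` with a timeout); `BoundedWrite` is a hypothetical name, and what to do after a timeout (retry, buffer, or fail the operator) is an application decision that this thread does not cover.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not an Apex class): runs a write but waits only a bounded time for it. */
final class BoundedWrite {
    private final ExecutorService pool = Executors.newSingleThreadExecutor();

    /** Returns true if the write finished within the timeout, false if it was abandoned. */
    boolean writeWithin(Runnable write, long timeoutMillis) {
        Future<?> pending = pool.submit(write);
        try {
            pending.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            pending.cancel(true); // interrupt the stuck write so the worker thread is freed
            return false;
        } catch (InterruptedException e) {
            pending.cancel(true);
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("write failed", e.getCause());
        }
    }

    /** Stops the worker thread; call this when the owner is torn down. */
    void shutdown() {
        pool.shutdownNow();
    }
}
```

A timeout here only keeps the callback responsive. The tuples whose write was abandoned still have to be handled somewhere, so this is a stopgap next to fixing the slow node, which is what resolved the problem in this thread.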




-- 

_______________________________________________________

Munagala V. Ramanath

Software Engineer

E: ram@datatorrent.com | M: (408) 331-5034 | Twitter: @UnknownRam

www.datatorrent.com  |  apex.apache.org