Posted to dev@apex.apache.org by "York, Brennon" <Br...@capitalone.com> on 2015/09/01 01:25:21 UTC

Throttling of `emitTuples`?

Hey all, is there a property out there that throttles the `emitTuples` call for input operators? I’ve been hunting down various properties and can’t seem to find it for the life of me. I’m sure I’m missing something simple…
________________________________________________________

The information contained in this e-mail is confidential and/or proprietary to Capital One and/or its affiliates and may only be used solely in performance of work or services for Capital One. The information transmitted herewith is intended only for use by the individual or entity to which it is addressed. If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer.

Re: Throttling of `emitTuples`?

Posted by Chetan Narsude <ch...@datatorrent.com>.
@Brennon - The input operator driver (part of the platform) will
automatically call emitTuples many times as long as the time taken by all
such calls cumulatively is below 1 second.

As an operator writer, you just implement emitTuples to emit as fast as
possible within the given limits. In your case, the logic would be:

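// Reset the per-window budget at the start of each window.
public void beginWindow(long windowId)
{
  windowLimit = limit;
}

// Emit one tuple per call while budget remains; once it is used up,
// return without emitting so the driver can back off.
public void emitTuples()
{
  if (windowLimit > 0) {
    windowLimit--;
    emitTuple();
  }
}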

In this case, when windowLimit reaches zero, the driver code will see that
no tuples have been emitted and, to avoid a busy loop, will automatically
insert Thread.sleep(OperatorContext.SPIN_MILLIS) after such a call.
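
To make the interaction concrete, here is a minimal, self-contained sketch of the per-window budget together with a stand-in for the driver loop. It does not use the Apex API: ThrottledSource, DriverSketch, and the boolean return value of emitTuples are all assumptions made for illustration (the real emitTuples returns nothing, and the real driver is time-based rather than call-count-based).

```java
// Hypothetical stand-in for a throttled input operator; it does not use the Apex API.
class ThrottledSource {
    private final int limit;   // maximum tuples to emit per window
    private int windowLimit;   // budget remaining in the current window
    private int emitted;       // total tuples emitted so far

    ThrottledSource(int limit) {
        this.limit = limit;
    }

    // Reset the budget at the start of every window.
    void beginWindow() {
        windowLimit = limit;
    }

    // Emit one tuple if budget remains; returns true if a tuple was emitted.
    // (Returning a flag is an assumption of this sketch.)
    boolean emitTuples() {
        if (windowLimit > 0) {
            windowLimit--;
            emitted++;
            return true;
        }
        return false;
    }

    int getEmitted() {
        return emitted;
    }
}

// Hypothetical driver loop that only mimics the behaviour described above.
class DriverSketch {
    // Runs `windows` windows with `callsPerWindow` emitTuples calls each and
    // returns how many calls emitted nothing, i.e. where a driver would sleep.
    static int run(ThrottledSource source, int windows, int callsPerWindow) {
        int idleCalls = 0;
        for (int w = 0; w < windows; w++) {
            source.beginWindow();
            for (int c = 0; c < callsPerWindow; c++) {
                if (!source.emitTuples()) {
                    idleCalls++; // the platform would back off here instead of spinning
                }
            }
        }
        return idleCalls;
    }

    public static void main(String[] args) {
        ThrottledSource source = new ThrottledSource(3);
        int idle = run(source, 2, 10);
        // Prints: emitted=6 idleCalls=14
        System.out.println("emitted=" + source.getEmitted() + " idleCalls=" + idle);
    }
}
```

With a limit of 3 and two windows, only 6 tuples get out no matter how often emitTuples is called; the remaining calls are the ones after which the driver would sleep.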

--
Chetan


Re: Throttling of `emitTuples`?

Posted by "York, Brennon" <Br...@capitalone.com>.
Definitely the former for the use case. Not that the downstream operators
can’t (eventually) keep up, but we’re building up a demo application where
(in prod) we’ll only receive a set number of events per second. Right now,
since we aren’t in prod, we’re hooking up an operator to read a file from
HDFS, but it goes *far* beyond what is necessary as far as tuples per
second is concerned. We figured (since we’re using the
AbstractInputOperator) we could throttle the batch size to the amount we
want per second and then set the `emitTuples` call to only be called once
per second, thus giving us a down-throttled (and guaranteed) events per
second number to continue testing with. Does that make sense?

@Chetan thanks for the info! It’s sounding like the reality is that it’s up
to us as operator writers to check when `emitTuples` is called and, for
instance, if it’s been more than a second, call it again. Is that correct?



Re: Throttling of `emitTuples`?

Posted by Vlad Rozov <v....@datatorrent.com>.
Do you want to slow down the input operator so that downstream operators
can keep up with it, or does the input operator put too much pressure on
the external data source, hence the need for throttling? I may be wrong,
but I think there is no such property, and I am curious to see what the
use case for such a property would be.

Thank you,

Vlad



Re: Throttling of `emitTuples`?

Posted by Munagala Ramanath <ra...@datatorrent.com>.
You can, of course, write your own "ThrottleOperator" with suitable
properties to govern throttling and interpose it between the over-eager
input operator(s) and the rest of your DAG. This operator could, for
example, simply drop some percentage of tuples if that is acceptable, or
save them in external sinks, etc.
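
As a rough illustration of the idea, the sketch below forwards one tuple out of every N and drops the rest. It is plain Java with no Apex dependency: ThrottleStage, Downstream, and the keepEvery property are hypothetical names, and a real operator would use input and output ports rather than a callback.

```java
// Hypothetical stand-in for a downstream port; not part of the Apex API.
interface Downstream {
    void accept(String tuple);
}

// Hypothetical throttling stage that forwards one tuple in every `keepEvery`
// and drops the rest.
class ThrottleStage {
    private final int keepEvery;        // assumed property: 4 means keep 25%
    private final Downstream downstream;
    private long seen;                  // tuples received so far
    private long forwarded;             // tuples passed downstream so far

    ThrottleStage(int keepEvery, Downstream downstream) {
        if (keepEvery < 1) {
            throw new IllegalArgumentException("keepEvery must be at least 1");
        }
        this.keepEvery = keepEvery;
        this.downstream = downstream;
    }

    // Forward the first tuple of each group of `keepEvery`; drop the others.
    void process(String tuple) {
        if (seen % keepEvery == 0) {
            downstream.accept(tuple);
            forwarded++;
        }
        seen++;
    }

    long getForwarded() {
        return forwarded;
    }

    long getDropped() {
        return seen - forwarded;
    }
}

class ThrottleStageDemo {
    public static void main(String[] args) {
        ThrottleStage stage =
            new ThrottleStage(4, tuple -> System.out.println("forwarded " + tuple));
        for (int i = 0; i < 8; i++) {
            stage.process("event-" + i);
        }
        // Forwards event-0 and event-4, then prints: forwarded=2 dropped=6
        System.out.println("forwarded=" + stage.getForwarded()
            + " dropped=" + stage.getDropped());
    }
}
```

Dropping is only one policy; the same stage could instead hand the surplus tuples to an external sink, as suggested above.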



Re: Throttling of `emitTuples`?

Posted by Chetan Narsude <ch...@datatorrent.com>.
Hi Brennon,

  emitTuples is called as many times as possible within the given
application window. The best efficiency is achieved when a single invocation
of emitTuples takes streaming_window_width time. Often emitTuples cannot
predict how much time it is going to take, so the platform gives it a hint
that it can output more events if there is still time left in the given
streaming window.

  If the operator has nothing to emit, it can simply return from the
emitTuples call. When emitTuples returns without emitting anything, the
platform automatically throttles the calls to emitTuples (to slow down) and
optionally invokes IdleTimeHandler.

  So it is left up to the operator to throttle its own behavior. Generally,
most operators output events in batches of a few thousand at a time, or try
to empty their queue if it is being built asynchronously.
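
A minimal sketch of the batching pattern, assuming a queue that some other thread fills asynchronously. It is plain Java rather than Apex code: QueueDrainingSource, offer, and the int return value of emitTuples are assumptions for illustration only.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical input source that drains an asynchronously filled queue in batches.
class QueueDrainingSource {
    private final Queue<String> pending = new ConcurrentLinkedQueue<>();
    private final int batchSize;   // maximum tuples emitted per emitTuples call
    private long emitted;          // total tuples emitted so far

    QueueDrainingSource(int batchSize) {
        this.batchSize = batchSize;
    }

    // Called by a (hypothetical) background reader thread.
    void offer(String tuple) {
        pending.add(tuple);
    }

    // Emits up to batchSize queued tuples and returns how many were emitted.
    // Returning 0 corresponds to "nothing to emit" in the text above.
    int emitTuples() {
        int count = 0;
        String tuple;
        while (count < batchSize && (tuple = pending.poll()) != null) {
            emit(tuple);
            count++;
        }
        return count;
    }

    // Placeholder for emitting on an output port.
    private void emit(String tuple) {
        emitted++;
    }

    long getEmitted() {
        return emitted;
    }
}

class QueueDrainingDemo {
    public static void main(String[] args) {
        QueueDrainingSource source = new QueueDrainingSource(2);
        for (int i = 0; i < 5; i++) {
            source.offer("event-" + i);
        }
        // Prints 2, 2, 1, 0: the last call finds the queue empty.
        for (int call = 0; call < 4; call++) {
            System.out.println(source.emitTuples());
        }
    }
}
```

Capping each call at batchSize keeps a single invocation short, while the empty-queue case returns immediately and leaves the back-off to the platform.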

--
Chetan
