You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@apex.apache.org by Priyanka Gugale <pr...@datatorrent.com> on 2016/04/05 07:47:32 UTC

Re: Bandwidth control for Input operators in Apex

Okay so I will open the pull request soon.

-Priyanka

On Wed, Mar 23, 2016 at 1:35 PM, Yogi Devendra <devendra.vyavahare@gmail.com
> wrote:

> This looks OK. Let us build it incrementally.
>
> ~ Yogi
>
> On 23 March 2016 at 13:24, Sandeep Deshmukh <sa...@datatorrent.com>
> wrote:
>
> > I would suggest that we go ahead with design as suggested by Priyanka
> where
> > we have bandwidth setup for each operator separately. We can later extend
> > this for bandwidth to be shared with different input operators or for the
> > DAG as a whole.
> >
> > Regards,
> > Sandeep
> >
> > On Wed, Mar 23, 2016 at 11:51 AM, Priyanka Gugale <
> > priyanka@datatorrent.com>
> > wrote:
> >
> > > Right now it's not for output operator, but one can very well use
> > bandwidth
> > > manager to keep track of bandwidth usage and limit your output speed.
> The
> > > bigger challenge there would be, you won't be able to process window
> data
> > > sent by upstream operator in same window. For that you need to do more
> > than
> > > just bandwidth control.
> > > So I would say, bandwidth control feature can be used as it is for
> output
> > > operator as well, only we need to do more than just bandwidth
> limitation
> > > for output operators.
> > >
> > > -Priyanka
> > >
> > > On Wed, Mar 23, 2016 at 11:47 AM, Priyanka Gugale <
> > > priyanka@datatorrent.com>
> > > wrote:
> > >
> > > > That's a good question Chaitanya, Right now the bandwidth control is
> at
> > > > Input Operator level and not application level. So if you have two
> > input
> > > > operator you need to set bandwidth on both separately by this design.
> > > > May be it would be good to have bandwidth control at Application
> level
> > > > than operator level. Let me think if I can modify the design to do
> > that.
> > > If
> > > > you have any ideas for same, please share them.
> > > >
> > > > -Priyanka
> > > >
> > > > On Wed, Mar 23, 2016 at 11:47 AM, Yogi Devendra <
> > > > devendra.vyavahare@gmail.com> wrote:
> > > >
> > > >> Priyanka,
> > > >>
> > > >> From the design description it is not clear how it will be used to
> > > control
> > > >> output bandwidth (point #2,3,4 mentioned by Sandeep)
> > > >>
> > > >> ~ Yogi
> > > >>
> > > >> On 23 March 2016 at 11:39, Chaitanya Chebolu <
> > chaitanya@datatorrent.com
> > > >
> > > >> wrote:
> > > >>
> > > >> > This is very useful feature.
> > > >> > I would like to know, how you are distributing the bandwidth for
> the
> > > >> below
> > > >> > situation:
> > > >> > - Two input operators say i1 and i2 are deployed on same node and
> > both
> > > >> the
> > > >> > operators have bandwidthManager as plugin.
> > > >> >
> > > >> > On Fri, Mar 18, 2016 at 5:43 PM, Priyanka Gugale <
> > > >> priyanka@datatorrent.com
> > > >> > >
> > > >> > wrote:
> > > >> >
> > > >> > > Hi,
> > > >> > >
> > > >> > > Thanks for inputs Sandeep, would take care of those points.
> > > >> > >
> > > >> > > Here is high level design we are considering, We would have
> > > following
> > > >> > > components:
> > > >> > > *1.* *BandwidthManager*
> > > >> > > This keeps track of current bandwidth usage of system and takes
> > > >> decision
> > > >> > if
> > > >> > > requested data bandwidth can be used right away or not. To do
> this
> > > it
> > > >> > > used Leaky
> > > >> > > bucket <https://en.wikipedia.org/wiki/Leaky_bucket> algorithm
> > where
> > > >> it
> > > >> > > emits data as long as it has not overused bandwidth (i.e.
> > bandwidth
> > > >> > > consumption is >=0) and then wait to accumulate bandwidth for a
> > > while
> > > >> > (till
> > > >> > > bandwidth goes from -ve value to +ve).
> > > >> > >
> > > >> > > *2. BandwidthLimitingInputOperator*
> > > >> > > Any Input operator which want to implement bandwidth restriction
> > > >> should
> > > >> > > implement BandwidthLimitingInputOperator. The operator have
> > abstract
> > > >> > method
> > > >> > >  to initialize instance of BandwidthManager and a method to emit
> > > tuple
> > > >> > with
> > > >> > > bandwidth restriction to emit tuples as per available bandwidth.
> > > >> > >
> > > >> > > *3. BandwidthPartitioner*
> > > >> > > Bandwidth partitioner is introduced for static partitioning. If
> > > static
> > > >> > > partitioning is used by default StatelessPartitioner class is
> > > >> > initialized.
> > > >> > > With bandwidth restriction we want to equally divide bandwidth
> > > amongst
> > > >> > > available partitions. BandwidthPartitioner should take care of
> it.
> > > It
> > > >> > > extends StatelessPartitioner, it just sets right bandwidth on
> all
> > > >> > > partitions after StatelessPartitioner creates/deletes
> partitiolns.
> > > In
> > > >> > case
> > > >> > > of dynamic partitioning the operator implementing
> > definePartitions,
> > > >> > should
> > > >> > > take care of bandwidth distribution.
> > > >> > >
> > > >> > > This design takes care of basic bandwidth restriction, also
> takes
> > > >> care of
> > > >> > > partitions by equally distributing available bandwidth among all
> > > >> > > partitions. Also this is open enough to do further modifications
> > to
> > > >> take
> > > >> > > care of complex situations.
> > > >> > >
> > > >> > > Let me know your opinion on what else we can do to design it
> > better.
> > > >> > >
> > > >> > > -Priyanka
> > > >> > >
> > > >> > > On Thu, Mar 3, 2016 at 10:11 AM, Sandeep Deshmukh <
> > > >> > sandeep@datatorrent.com
> > > >> > > >
> > > >> > > wrote:
> > > >> > >
> > > >> > > > The main purpose is not to handle back pressure but to limit
> > > >> bandwidth
> > > >> > > > usage by applications. This is useful in ingestion use cases.
> > > >> Typically
> > > >> > > > user needs to ingest say up to  1GB per sec and not more. The
> > > tuple
> > > >> > size
> > > >> > > > may vary based on messages based tuples (few KBs) or block
> > tuples
> > > >> for
> > > >> > > files
> > > >> > > > (few MBs). Bandwidth manager will take max bandwidth that can
> be
> > > >> > utilized
> > > >> > > > by the application and will take care of sharing that across
> > > >> partitions
> > > >> > > > etc.
> > > >> > > >
> > > >> > > > Priyanka: You could also consider following in your design
> > > >> > > >
> > > >> > > >    1. Limiting input rate (across partitions)
> > > >> > > >    2. Limiting output rate (across partitions)
> > > >> > > >    3. Specifying total bandwidth that the Application can
> > utilize
> > > >> > > including
> > > >> > > >    input and output? Not sure if this is required. Need
> comments
> > > >> from
> > > >> > > > others
> > > >> > > >    here.
> > > >> > > >    4. Include default implementation that will handle 1 and 2,
> > and
> > > >> > anyone
> > > >> > > >    interested in having their own Bandwidth manager should be
> > able
> > > >> to
> > > >> > > > extend
> > > >> > > >    the default one.
> > > >> > > >    5. Can you also look at including/extending tuples per sec
> as
> > > >> > pointed
> > > >> > > >    out by Tim/Chinmay.
> > > >> > > >
> > > >> > > > Regards,
> > > >> > > > Sandeep
> > > >> > > >
> > > >> > > > On Thu, Mar 3, 2016 at 12:23 AM, Timothy Farkas <
> > > >> tim@datatorrent.com>
> > > >> > > > wrote:
> > > >> > > >
> > > >> > > > > Not sure if this is helpful, but there is already a utility
> in
> > > >> Malhar
> > > >> > > for
> > > >> > > > > converting tuples per second to tuples per window. This
> allows
> > > the
> > > >> > user
> > > >> > > > to
> > > >> > > > > define a property in tuples per second, then the operator
> can
> > > >> convert
> > > >> > > > that
> > > >> > > > > to tuples per window so it emits the correct number of
> tuples
> > > per
> > > >> > > window.
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> https://github.com/apache/incubator-apex-malhar/blob/master/library/src/main/java/com/datatorrent/lib/util/time/WindowUtils.java
> > > >> > > > >
> > > >> > > > > On Wed, Mar 2, 2016 at 10:41 AM, Chinmay Kolhatkar <
> > > >> > > > > chinmay@datatorrent.com>
> > > >> > > > > wrote:
> > > >> > > > >
> > > >> > > > > > Hi Priyanka,
> > > >> > > > > >
> > > >> > > > > > Indeed this is a useful feature.
> > > >> > > > > >
> > > >> > > > > > I believe number bytes consumed per sec can as well
> > translate
> > > to
> > > >> > > number
> > > >> > > > > of
> > > >> > > > > > tuples consumed per sec.
> > > >> > > > > >
> > > >> > > > > > If above is correct, won't back pressure that is handled
> by
> > > >> > > > bufferserver
> > > >> > > > > > help in your use case?
> > > >> > > > > >
> > > >> > > > > > Thanks,
> > > >> > > > > > Chinmay.
> > > >> > > > > > On 2 Mar 2016 4:49 p.m., "Priyanka Gugale" <
> > > >> > priyanka@datatorrent.com
> > > >> > > >
> > > >> > > > > > wrote:
> > > >> > > > > >
> > > >> > > > > > > Many times we need to put bandwidth restrictions or put
> > some
> > > >> > limit
> > > >> > > on
> > > >> > > > > > input
> > > >> > > > > > > operator for number of bytes to be consumed per second.
> > As I
> > > >> > > > understand
> > > >> > > > > > in
> > > >> > > > > > > Apex there is no direct support for this feature.
> > > >> > > > > > >
> > > >> > > > > > > I am planning to write a bandwidth manager which will
> help
> > > in
> > > >> > > > limiting
> > > >> > > > > > > bandwidth at Input operator. Let me know if there are
> any
> > > >> better
> > > >> > > > > > > alternative ways. I will soon publish design for
> Bandwidth
> > > >> > Manager
> > > >> > > I
> > > >> > > > am
> > > >> > > > > > > planning to write.
> > > >> > > > > > >
> > > >> > > > > > > -Priyanka
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>

Re: Bandwidth control for Input operators in Apex

Posted by Priyanka Gugale <pr...@datatorrent.com>.
Hi,

Would anyone like to review this pull request:
https://github.com/apache/incubator-apex-malhar/pull/279
This is based on the design discussed in previous mails. I am open to
discuss it again if anyone is interested.

-Priyanka

On Mon, Apr 4, 2016 at 10:47 PM, Priyanka Gugale <pr...@datatorrent.com>
wrote:

> Okay so I will open the pull request soon.
>
> -Priyanka
>
> On Wed, Mar 23, 2016 at 1:35 PM, Yogi Devendra <
> devendra.vyavahare@gmail.com> wrote:
>
>> This looks OK. Let us build it incrementally.
>>
>> ~ Yogi
>>
>> On 23 March 2016 at 13:24, Sandeep Deshmukh <sa...@datatorrent.com>
>> wrote:
>>
>> > I would suggest that we go ahead with design as suggested by Priyanka
>> where
>> > we have bandwidth setup for each operator separately. We can later
>> extend
>> > this for bandwidth to be shared with different input operators or for
>> the
>> > DAG as a whole.
>> >
>> > Regards,
>> > Sandeep
>> >
>> > On Wed, Mar 23, 2016 at 11:51 AM, Priyanka Gugale <
>> > priyanka@datatorrent.com>
>> > wrote:
>> >
>> > > Right now it's not for output operator, but one can very well use
>> > bandwidth
>> > > manager to keep track of bandwidth usage and limit your output speed.
>> The
>> > > bigger challenge there would be, you won't be able to process window
>> data
>> > > sent by upstream operator in same window. For that you need to do more
>> > than
>> > > just bandwidth control.
>> > > So I would say, bandwidth control feature can be used as it is for
>> output
>> > > operator as well, only we need to do more than just bandwidth
>> limitation
>> > > for output operators.
>> > >
>> > > -Priyanka
>> > >
>> > > On Wed, Mar 23, 2016 at 11:47 AM, Priyanka Gugale <
>> > > priyanka@datatorrent.com>
>> > > wrote:
>> > >
>> > > > That's a good question Chaitanya, Right now the bandwidth control
>> is at
>> > > > Input Operator level and not application level. So if you have two
>> > input
>> > > > operator you need to set bandwidth on both separately by this
>> design.
>> > > > May be it would be good to have bandwidth control at Application
>> level
>> > > > than operator level. Let me think if I can modify the design to do
>> > that.
>> > > If
>> > > > you have any ideas for same, please share them.
>> > > >
>> > > > -Priyanka
>> > > >
>> > > > On Wed, Mar 23, 2016 at 11:47 AM, Yogi Devendra <
>> > > > devendra.vyavahare@gmail.com> wrote:
>> > > >
>> > > >> Priyanka,
>> > > >>
>> > > >> From the design description it is not clear how it will be used to
>> > > control
>> > > >> output bandwidth (point #2,3,4 mentioned by Sandeep)
>> > > >>
>> > > >> ~ Yogi
>> > > >>
>> > > >> On 23 March 2016 at 11:39, Chaitanya Chebolu <
>> > chaitanya@datatorrent.com
>> > > >
>> > > >> wrote:
>> > > >>
>> > > >> > This is very useful feature.
>> > > >> > I would like to know, how you are distributing the bandwidth for
>> the
>> > > >> below
>> > > >> > situation:
>> > > >> > - Two input operators say i1 and i2 are deployed on same node and
>> > both
>> > > >> the
>> > > >> > operators have bandwidthManager as plugin.
>> > > >> >
>> > > >> > On Fri, Mar 18, 2016 at 5:43 PM, Priyanka Gugale <
>> > > >> priyanka@datatorrent.com
>> > > >> > >
>> > > >> > wrote:
>> > > >> >
>> > > >> > > Hi,
>> > > >> > >
>> > > >> > > Thanks for inputs Sandeep, would take care of those points.
>> > > >> > >
>> > > >> > > Here is high level design we are considering, We would have
>> > > following
>> > > >> > > components:
>> > > >> > > *1.* *BandwidthManager*
>> > > >> > > This keeps track of current bandwidth usage of system and takes
>> > > >> decision
>> > > >> > if
>> > > >> > > requested data bandwidth can be used right away or not. To do
>> this
>> > > it
>> > > >> > > used Leaky
>> > > >> > > bucket <https://en.wikipedia.org/wiki/Leaky_bucket> algorithm
>> > where
>> > > >> it
>> > > >> > > emits data as long as it has not overused bandwidth (i.e.
>> > bandwidth
>> > > >> > > consumption is >=0) and then wait to accumulate bandwidth for a
>> > > while
>> > > >> > (till
>> > > >> > > bandwidth goes from -ve value to +ve).
>> > > >> > >
>> > > >> > > *2. BandwidthLimitingInputOperator*
>> > > >> > > Any Input operator which want to implement bandwidth
>> restriction
>> > > >> should
>> > > >> > > implement BandwidthLimitingInputOperator. The operator have
>> > abstract
>> > > >> > method
>> > > >> > >  to initialize instance of BandwidthManager and a method to
>> emit
>> > > tuple
>> > > >> > with
>> > > >> > > bandwidth restriction to emit tuples as per available
>> bandwidth.
>> > > >> > >
>> > > >> > > *3. BandwidthPartitioner*
>> > > >> > > Bandwidth partitioner is introduced for static partitioning. If
>> > > static
>> > > >> > > partitioning is used by default StatelessPartitioner class is
>> > > >> > initialized.
>> > > >> > > With bandwidth restriction we want to equally divide bandwidth
>> > > amongst
>> > > >> > > available partitions. BandwidthPartitioner should take care of
>> it.
>> > > It
>> > > >> > > extends StatelessPartitioner, it just sets right bandwidth on
>> all
>> > > >> > > partitions after StatelessPartitioner creates/deletes
>> partitiolns.
>> > > In
>> > > >> > case
>> > > >> > > of dynamic partitioning the operator implementing
>> > definePartitions,
>> > > >> > should
>> > > >> > > take care of bandwidth distribution.
>> > > >> > >
>> > > >> > > This design takes care of basic bandwidth restriction, also
>> takes
>> > > >> care of
>> > > >> > > partitions by equally distributing available bandwidth among
>> all
>> > > >> > > partitions. Also this is open enough to do further
>> modifications
>> > to
>> > > >> take
>> > > >> > > care of complex situations.
>> > > >> > >
>> > > >> > > Let me know your opinion on what else we can do to design it
>> > better.
>> > > >> > >
>> > > >> > > -Priyanka
>> > > >> > >
>> > > >> > > On Thu, Mar 3, 2016 at 10:11 AM, Sandeep Deshmukh <
>> > > >> > sandeep@datatorrent.com
>> > > >> > > >
>> > > >> > > wrote:
>> > > >> > >
>> > > >> > > > The main purpose is not to handle back pressure but to limit
>> > > >> bandwidth
>> > > >> > > > usage by applications. This is useful in ingestion use cases.
>> > > >> Typically
>> > > >> > > > user needs to ingest say up to  1GB per sec and not more. The
>> > > tuple
>> > > >> > size
>> > > >> > > > may vary based on messages based tuples (few KBs) or block
>> > tuples
>> > > >> for
>> > > >> > > files
>> > > >> > > > (few MBs). Bandwidth manager will take max bandwidth that
>> can be
>> > > >> > utilized
>> > > >> > > > by the application and will take care of sharing that across
>> > > >> partitions
>> > > >> > > > etc.
>> > > >> > > >
>> > > >> > > > Priyanka: You could also consider following in your design
>> > > >> > > >
>> > > >> > > >    1. Limiting input rate (across partitions)
>> > > >> > > >    2. Limiting output rate (across partitions)
>> > > >> > > >    3. Specifying total bandwidth that the Application can
>> > utilize
>> > > >> > > including
>> > > >> > > >    input and output? Not sure if this is required. Need
>> comments
>> > > >> from
>> > > >> > > > others
>> > > >> > > >    here.
>> > > >> > > >    4. Include default implementation that will handle 1 and
>> 2,
>> > and
>> > > >> > anyone
>> > > >> > > >    interested in having their own Bandwidth manager should be
>> > able
>> > > >> to
>> > > >> > > > extend
>> > > >> > > >    the default one.
>> > > >> > > >    5. Can you also look at including/extending tuples per
>> sec as
>> > > >> > pointed
>> > > >> > > >    out by Tim/Chinmay.
>> > > >> > > >
>> > > >> > > > Regards,
>> > > >> > > > Sandeep
>> > > >> > > >
>> > > >> > > > On Thu, Mar 3, 2016 at 12:23 AM, Timothy Farkas <
>> > > >> tim@datatorrent.com>
>> > > >> > > > wrote:
>> > > >> > > >
>> > > >> > > > > Not sure if this is helpful, but there is already a
>> utility in
>> > > >> Malhar
>> > > >> > > for
>> > > >> > > > > converting tuples per second to tuples per window. This
>> allows
>> > > the
>> > > >> > user
>> > > >> > > > to
>> > > >> > > > > define a property in tuples per second, then the operator
>> can
>> > > >> convert
>> > > >> > > > that
>> > > >> > > > > to tuples per window so it emits the correct number of
>> tuples
>> > > per
>> > > >> > > window.
>> > > >> > > > >
>> > > >> > > > >
>> > > >> > > > >
>> > > >> > > >
>> > > >> > >
>> > > >> >
>> > > >>
>> > >
>> >
>> https://github.com/apache/incubator-apex-malhar/blob/master/library/src/main/java/com/datatorrent/lib/util/time/WindowUtils.java
>> > > >> > > > >
>> > > >> > > > > On Wed, Mar 2, 2016 at 10:41 AM, Chinmay Kolhatkar <
>> > > >> > > > > chinmay@datatorrent.com>
>> > > >> > > > > wrote:
>> > > >> > > > >
>> > > >> > > > > > Hi Priyanka,
>> > > >> > > > > >
>> > > >> > > > > > Indeed this is a useful feature.
>> > > >> > > > > >
>> > > >> > > > > > I believe number bytes consumed per sec can as well
>> > translate
>> > > to
>> > > >> > > number
>> > > >> > > > > of
>> > > >> > > > > > tuples consumed per sec.
>> > > >> > > > > >
>> > > >> > > > > > If above is correct, won't back pressure that is handled
>> by
>> > > >> > > > bufferserver
>> > > >> > > > > > help in your use case?
>> > > >> > > > > >
>> > > >> > > > > > Thanks,
>> > > >> > > > > > Chinmay.
>> > > >> > > > > > On 2 Mar 2016 4:49 p.m., "Priyanka Gugale" <
>> > > >> > priyanka@datatorrent.com
>> > > >> > > >
>> > > >> > > > > > wrote:
>> > > >> > > > > >
>> > > >> > > > > > > Many times we need to put bandwidth restrictions or put
>> > some
>> > > >> > limit
>> > > >> > > on
>> > > >> > > > > > input
>> > > >> > > > > > > operator for number of bytes to be consumed per second.
>> > As I
>> > > >> > > > understand
>> > > >> > > > > > in
>> > > >> > > > > > > Apex there is no direct support for this feature.
>> > > >> > > > > > >
>> > > >> > > > > > > I am planning to write a bandwidth manager which will
>> help
>> > > in
>> > > >> > > > limiting
>> > > >> > > > > > > bandwidth at Input operator. Let me know if there are
>> any
>> > > >> better
>> > > >> > > > > > > alternative ways. I will soon publish design for
>> Bandwidth
>> > > >> > Manager
>> > > >> > > I
>> > > >> > > > am
>> > > >> > > > > > > planning to write.
>> > > >> > > > > > >
>> > > >> > > > > > > -Priyanka
>> > > >> > > > > > >
>> > > >> > > > > >
>> > > >> > > > >
>> > > >> > > >
>> > > >> > >
>> > > >> >
>> > > >>
>> > > >
>> > > >
>> > >
>> >
>>
>
>