Posted to user@flink.apache.org by "m@xi" <ma...@gmail.com> on 2018/02/15 14:13:32 UTC

Manipulating Processing elements of Network Buffers

Hello Flinker!

I know that one should set the number of Network Buffers (NB) appropriately
for a Flink deployment. Apart from that, I am wondering whether one can
change the specific sequence of data records in an NB in order to optimize
the performance of an application.

For instance, let's assume that an NB currently holds 3 elements {a,b,c} in
this specific order. The data is going to be shipped to one or more
TaskManagers for further processing. But maybe if these elements were
shipped in another order, e.g. {b,c,a}, then a specific task would run
faster.

Is there any way to manipulate the ordering in the NB, or the order in which
tuples arrive at the input of an operator?

Thanks in advance.

Best,
Max



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Manipulating Processing elements of Network Buffers

Posted by "m@xi" <ma...@gmail.com>.
Hi Till!

Thanks a lot for your useful reply. 

So now I get it. I should not manipulate or disturb the network buffer
contents, as this would trigger other problematic behaviours. On the other
hand, what about the price of first buffering the data in my operator, e.g.
sorting them based on some criterion, and only then processing them? What is
its impact on the efficiency/effectiveness of a streaming algorithm?

I mean, Flink is "pure" streaming, though not so pure because of the network
buffers. So if I add another layer of buffering inside each operator, this
will make my application slower, and it is no longer streaming; it becomes
batch.

Thanks in advance.

Best,
Max




Re: Manipulating Processing elements of Network Buffers

Posted by Till Rohrmann <tr...@apache.org>.
Hi Max,

the network buffer order is quite important at the moment, because the
network stream transports not only data but also control events such as
the checkpoint barriers. In order to guarantee that you don't lose data in
case of a failure, it is (at the moment) strictly necessary that checkpoint
barriers don't overtake data records, for example. Moreover, records might
span multiple memory buffers if they are large. Therefore, it might not be
all that useful to do this ordering at the network buffer level.

Instead, what you can always do is to sort elements in your user function.
The price you have to pay for this is that you have to buffer elements in
between and also checkpoint them.
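
To illustrate the idea, here is a minimal sketch in plain Java. It does not
use any Flink API: SortingBuffer, its capacity parameter and the
snapshot/restore methods are hypothetical names made up for this example. In
a real job the pending records would presumably have to live in Flink's
managed state so that they are included in checkpoints.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical helper (not part of Flink): collects records into a small
 * batch and releases them in sorted order once the batch is full.
 */
final class SortingBuffer<T> {
    private final int capacity;
    private final Comparator<? super T> order;
    private final List<T> pending = new ArrayList<>();

    SortingBuffer(int capacity, Comparator<? super T> order) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.order = order;
    }

    /** Adds one record; returns a sorted batch when full, else an empty list. */
    List<T> add(T record) {
        pending.add(record);
        return pending.size() >= capacity ? flush() : List.of();
    }

    /** Releases everything buffered so far, in sorted order. */
    List<T> flush() {
        List<T> batch = new ArrayList<>(pending);
        batch.sort(order);
        pending.clear();
        return batch;
    }

    /** Copy of the not-yet-emitted records: what a checkpoint would need to persist. */
    List<T> snapshot() {
        return new ArrayList<>(pending);
    }

    /** Puts previously saved records back, e.g. after a failure. */
    void restore(List<T> saved) {
        pending.clear();
        pending.addAll(saved);
    }

    public static void main(String[] args) {
        SortingBuffer<String> buffer =
                new SortingBuffer<>(3, Comparator.<String>naturalOrder());
        for (String record : List.of("c", "a", "b", "e", "d")) {
            List<String> batch = buffer.add(record);
            if (!batch.isEmpty()) {
                System.out.println("emit " + batch);
            }
        }
        System.out.println("still buffered " + buffer.snapshot());
    }
}
```

The capacity bounds both the extra latency and the amount of state that has
to be checkpointed: with a capacity of 1 the operator degenerates to plain
record-at-a-time processing, while a larger capacity gives more room for
reordering at the cost of holding records back longer.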

Cheers,
Till
