Posted to dev@flink.apache.org by Gyula Fóra <gy...@gmail.com> on 2014/08/13 15:31:53 UTC

Transfer latency (inputs vs output buffers)

Hey All,

We started measuring the latency we can provide with our streaming
architecture and stumbled upon some interesting results.

It seems that we can control the output buffers well: if we just
generate a sequence of numbers, the outputs get flushed in well under 0.5 ms.
This is fine and also what we expected.
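
A minimal sketch of how such a low flush interval can be requested, assuming
the current DataStream API and that env.setBufferTimeout(...) is the knob
controlling how often partially filled output buffers are flushed (the
streaming API at the time of this thread differed); this is illustrative,
not the actual test code:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BufferTimeoutSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Flush partially filled output buffers at most every 1 ms, so single
        // records of the generated sequence are not held back until a buffer fills.
        env.setBufferTimeout(1);

        // Generate a simple sequence of numbers, as in the experiment.
        env.fromSequence(0, 1_000_000).print();

        env.execute("Output buffer flush interval sketch");
    }
}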

The problem is that no matter how fast the outputs flush, there is a
huge latency at the receiving task (about 250 ms). We suspect that
the input buffer must somehow have a much bigger size than the output
buffer.

What's even more interesting is that if we don't use a Task vertex, only a source
and a sink, we don't experience the same issue and the whole latency is
about 0.6 ms. Adding a simple forwarding task between the source and the
sink makes this 250 ms, and the latency is generated somewhere between the
Source and the Map. (From the Map to the Sink it is fast again.)
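
A rough sketch of the forwarding-task experiment under the same assumptions;
the timestamp-based measurement and all class names here are illustrative,
not the benchmark behind the numbers above:

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class ForwardingLatencySketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setBufferTimeout(1);

        // Source emits the wall-clock time at which each record was created.
        DataStream<Long> source = env.addSource(new SourceFunction<Long>() {
            private volatile boolean running = true;

            @Override
            public void run(SourceContext<Long> ctx) throws Exception {
                while (running) {
                    ctx.collect(System.currentTimeMillis());
                    Thread.sleep(10);
                }
            }

            @Override
            public void cancel() {
                running = false;
            }
        });

        // The simple forwarding task in question: it only passes records on,
        // but it adds one more buffer/channel hop in front of the sink.
        DataStream<Long> forwarded = source.map(new MapFunction<Long, Long>() {
            @Override
            public Long map(Long emittedAt) {
                return emittedAt;
            }
        });

        // The sink reports the source-to-sink latency of each record.
        forwarded.addSink(new SinkFunction<Long>() {
            @Override
            public void invoke(Long emittedAt, Context context) {
                System.out.println("latency ms: " + (System.currentTimeMillis() - emittedAt));
            }
        });

        env.execute("Forwarding task latency sketch");
    }
}

Dropping the map() step gives the source-and-sink-only variant that showed
about 0.6 ms.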

Does anyone know why this could happen and how we can solve the issue?

Regards,
Gyula

Re: Transfer latency (inputs vs output buffers)

Posted by Márton Balassi <ba...@gmail.com>.
Update based on a Skype call with Stephan:

The measurements were run on a local machine and even there we experienced
this latency issue.

Stephan highlighted that the input buffer mechanism for the tasks and the sinks
is in fact the same, which makes the issue even more interesting.

I haven't found anything promising since then.



Re: Transfer latency (inputs vs output buffers)

Posted by Gyula Fóra <gy...@gmail.com>.
No, this is not an issue only after the first flush (at the first flush
there is an even greater latency, but that's because of what you said, we
knew that); it stays for the whole runtime.
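
One way to keep the two effects apart when measuring is to discard a warm-up
prefix before reporting; a hypothetical helper the latency-reporting sink
could call (the threshold is arbitrary, not from the thread):

import java.io.Serializable;

// Hypothetical helper: skip the first records, which pay the one-time cost of
// lazily bringing up the downstream tasks, and only report latencies once the
// pipeline is in steady state.
public class SteadyStateLatencyTracker implements Serializable {

    private static final int WARMUP_RECORDS = 1000; // illustrative threshold
    private long seen = 0;

    public void record(long latencyMillis) {
        if (++seen <= WARMUP_RECORDS) {
            return; // still warming up
        }
        System.out.println("steady-state latency ms: " + latencyMillis);
    }
}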



Re: Transfer latency (inputs vs output buffers)

Posted by Stephan Ewen <se...@apache.org>.
Hi!

Is this for the first envelopes, or consistently for all of them?

The first ones lazily bring up the successive tasks, which may explain the
latency.

Stephan


