Posted to user@flink.apache.org by Hao Sun <ha...@zendesk.com> on 2017/10/11 17:58:04 UTC

R/W traffic estimation between Flink and Zookeeper

Hi, is there a way to estimate read/write traffic between Flink and ZK?
I am looking for something like 1000 reads/sec or 1000 writes/sec, and the
size of the messages.

Thanks

Re: R/W traffic estimation between Flink and Zookeeper

Posted by Hao Sun <ha...@zendesk.com>.
Great, thanks for the info, Stefan.

On Thu, Nov 16, 2017, 01:59 Stefan Richter <s....@data-artisans.com>
wrote:

> Hi,
>
> I think ZooKeeper is only used as a metadata store in HA mode.
> Interactions with ZK are not part of Flink's per-record stream processing
> code paths. Depending on your job, things written to ZK can include e.g.
> the job graph, Kafka offsets, or the metadata about available checkpoints
> to recover from. Some of those interactions happen only once per job,
> others happen periodically. In the big picture, interactions with ZK
> happen rather rarely, but of course this also depends on configuration
> parameters like your checkpointing interval. For a typical job, I would
> estimate that ZK interactions occur less than once per second. As for
> typical message sizes, I would estimate something between a few bytes and
> a few kilobytes for most messages, and somewhere in the low two-digit
> megabytes as a typical max size.
>
> Best,
> Stefan
>
> On 15.11.2017 at 18:41, Hao Sun <ha...@zendesk.com> wrote:
>
> Thanks Piotr, does Flink read/write to ZooKeeper every time it processes a
> record?
> I thought only the JM uses ZK to keep some meta-level data, so I am not sure why `it
> depends on many things like state backend used, state size, complexity of
> your application, size of the records, number of machines, their hardware
> and the network.`
>
> On Thu, Oct 12, 2017 at 1:35 AM Piotr Nowojski <pi...@data-artisans.com>
> wrote:
>
>> Hi,
>>
>> Are you asking how to measure records/s, or whether it is possible to
>> achieve that rate? To measure it, you can check the numRecordsInPerSecond
>> metric.
>>
>> As for whether 1000 records/s is possible, it depends on many things like
>> state backend used, state size, complexity of your application, size of the
>> records, number of machines, their hardware and the network. In the very
>> simplest cases it is possible to achieve millions of records per second per
>> machine. It would be best to try it out in your particular use case on some
>> small scale.
>>
>> Piotrek
>>
>> > On 11 Oct 2017, at 19:58, Hao Sun <ha...@zendesk.com> wrote:
>> >
>> > Hi, is there a way to estimate read/write traffic between Flink and ZK?
>> > I am looking for something like 1000 reads/sec or 1000 writes/sec, and
>> > the size of the messages.
>> >
>> > Thanks
>>
>>
>

Re: R/W traffic estimation between Flink and Zookeeper

Posted by Stefan Richter <s....@data-artisans.com>.
Hi,

I think ZooKeeper is only used as a metadata store in HA mode. Interactions with ZK are not part of Flink's per-record stream processing code paths. Depending on your job, things written to ZK can include e.g. the job graph, Kafka offsets, or the metadata about available checkpoints to recover from. Some of those interactions happen only once per job, others happen periodically. In the big picture, interactions with ZK happen rather rarely, but of course this also depends on configuration parameters like your checkpointing interval. For a typical job, I would estimate that ZK interactions occur less than once per second. As for typical message sizes, I would estimate something between a few bytes and a few kilobytes for most messages, and somewhere in the low two-digit megabytes as a typical max size.
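
To make the estimate above concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from Flink: the function name and the parameters ops_per_checkpoint and avg_payload_bytes are hypothetical placeholders that you would have to measure or guess for your own job. It only models the periodic, checkpoint-driven interactions and ignores one-off writes such as the job graph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZkTrafficEstimate:
    """Rough steady-state ZK load caused by periodic checkpoints."""
    ops_per_second: float
    bytes_per_second: float


def estimate_zk_traffic(checkpoint_interval_s,
                        ops_per_checkpoint=3.0,
                        avg_payload_bytes=1024.0):
    """Estimate ZK operations and bytes per second.

    ops_per_checkpoint and avg_payload_bytes are assumptions, not Flink
    constants; replace them with numbers observed on your cluster.
    """
    if checkpoint_interval_s <= 0:
        raise ValueError("checkpoint_interval_s must be positive")
    if ops_per_checkpoint < 0 or avg_payload_bytes < 0:
        raise ValueError("ops_per_checkpoint and avg_payload_bytes must be >= 0")
    # Periodic interactions scale inversely with the checkpoint interval.
    ops_per_second = ops_per_checkpoint / checkpoint_interval_s
    return ZkTrafficEstimate(ops_per_second, ops_per_second * avg_payload_bytes)


if __name__ == "__main__":
    est = estimate_zk_traffic(checkpoint_interval_s=10.0)
    print(f"{est.ops_per_second:.2f} ops/s, {est.bytes_per_second:.0f} bytes/s")
```

With a 10 second checkpoint interval and these assumed defaults, the result stays well below one operation per second, in line with the rough estimate above.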

Best,
Stefan

> On 15.11.2017 at 18:41, Hao Sun <ha...@zendesk.com> wrote:
> 
> Thanks Piotr, does Flink read/write to ZooKeeper every time it processes a record?
> I thought only the JM uses ZK to keep some meta-level data, so I am not sure why `it depends on many things like state backend used, state size, complexity of your application, size of the records, number of machines, their hardware and the network.`
> 
> On Thu, Oct 12, 2017 at 1:35 AM Piotr Nowojski <piotr@data-artisans.com> wrote:
> Hi,
> 
> Are you asking how to measure records/s, or whether it is possible to achieve that rate? To measure it, you can check the numRecordsInPerSecond metric.
> 
> As for whether 1000 records/s is possible, it depends on many things like state backend used, state size, complexity of your application, size of the records, number of machines, their hardware and the network. In the very simplest cases it is possible to achieve millions of records per second per machine. It would be best to try it out in your particular use case on some small scale.
> 
> Piotrek
> 
> > On 11 Oct 2017, at 19:58, Hao Sun <hasun@zendesk.com> wrote:
> >
> > Hi, is there a way to estimate read/write traffic between Flink and ZK?
> > I am looking for something like 1000 reads/sec or 1000 writes/sec, and the size of the messages.
> >
> > Thanks
> 


Re: R/W traffic estimation between Flink and Zookeeper

Posted by Hao Sun <ha...@zendesk.com>.
Thanks Piotr, does Flink read/write to ZooKeeper every time it processes a
record?
I thought only the JM uses ZK to keep some meta-level data, so I am not sure why `it
depends on many things like state backend used, state size, complexity of
your application, size of the records, number of machines, their hardware
and the network.`

On Thu, Oct 12, 2017 at 1:35 AM Piotr Nowojski <pi...@data-artisans.com>
wrote:

> Hi,
>
> Are you asking how to measure records/s, or whether it is possible to
> achieve that rate? To measure it, you can check the numRecordsInPerSecond
> metric.
>
> As for whether 1000 records/s is possible, it depends on many things like state
> backend used, state size, complexity of your application, size of the
> records, number of machines, their hardware and the network. In the very
> simplest cases it is possible to achieve millions of records per second per
> machine. It would be best to try it out in your particular use case on some
> small scale.
>
> Piotrek
>
> > On 11 Oct 2017, at 19:58, Hao Sun <ha...@zendesk.com> wrote:
> >
> > Hi, is there a way to estimate read/write traffic between Flink and ZK?
> > I am looking for something like 1000 reads/sec or 1000 writes/sec, and
> > the size of the messages.
> >
> > Thanks
>
>

Re: R/W traffic estimation between Flink and Zookeeper

Posted by Piotr Nowojski <pi...@data-artisans.com>.
Hi,

Are you asking how to measure records/s, or whether it is possible to achieve that rate? To measure it, you can check the numRecordsInPerSecond metric.
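
As a small illustration of reading that metric: Flink's monitoring REST API returns metrics as a JSON list of objects with "id" and "value" fields. The Python sketch below sums numRecordsInPerSecond across subtasks from such a payload. The helper name, the sample payload and the assumption that per-subtask ids look like "0.numRecordsInPerSecond" are illustrative, so please check them against your Flink version.

```python
import json

METRIC = "numRecordsInPerSecond"


def total_records_in_per_second(metrics_json):
    """Sum the metric over all entries of a metrics JSON payload.

    Assumes a list of {"id": ..., "value": ...} objects, where the id is
    either the bare metric name or prefixed with a subtask index.
    """
    total = 0.0
    for entry in json.loads(metrics_json):
        metric_id = entry.get("id", "")
        if metric_id == METRIC or metric_id.endswith("." + METRIC):
            # Values arrive as strings, so convert before adding.
            total += float(entry["value"])
    return total


if __name__ == "__main__":
    # Hypothetical sample payload for a two-subtask operator.
    sample = json.dumps([
        {"id": "0.numRecordsInPerSecond", "value": "400.0"},
        {"id": "1.numRecordsInPerSecond", "value": "600.0"},
        {"id": "0.numRecordsOutPerSecond", "value": "5.0"},
    ])
    print(total_records_in_per_second(sample))
```

Entries for other metrics, such as the numRecordsOutPerSecond line in the sample, are ignored by the suffix check.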

As for whether 1000 records/s is possible, it depends on many things like state backend used, state size, complexity of your application, size of the records, number of machines, their hardware and the network. In the very simplest cases it is possible to achieve millions of records per second per machine. It would be best to try it out in your particular use case on some small scale.

Piotrek

> On 11 Oct 2017, at 19:58, Hao Sun <ha...@zendesk.com> wrote:
> 
> Hi, is there a way to estimate read/write traffic between Flink and ZK?
> I am looking for something like 1000 reads/sec or 1000 writes/sec, and the size of the messages.
> 
> Thanks