Posted to user@flink.apache.org by Qing Lim <q....@mwam.com> on 2022/10/21 16:19:39 UTC

Difference between DataStream.broadcast() vs DataStream.broadcast(MapStateDescriptor)

Hi all, I am trying to figure out how DataStream.broadcast() and DataStream.broadcast(MapStateDescriptor) differ.

My use case:
I have 2 streams:
Stream 1 contains updates, which collectively build up a state
Stream 2 is keyed, and every parallel instance needs to see EVERY update from Stream 1.

I am thinking I can probably achieve this by doing

stream1.broadcast().connect(stream2).process(myFun)

I am failing to understand when I would need to use the Broadcast State pattern. Is it a convenience method built on top of broadcast(), or is it something very different?

The best info I've found is this Stack Overflow question: https://stackoverflow.com/questions/50570605/why-broadcast-state-can-store-the-dynamic-rules-however-broadcast-operator-c
It seems to suggest that Broadcast State calls broadcast() and then maintains state in each parallel operator under the hood?

Kind regards.

Qing Lim | Marshall Wace LLP, George House, 131 Sloane Street, London | E-mail: q.lim@mwam.com | Tel: +44 207 925 4865


This e-mail and any attachments are confidential to the addressee(s) and may contain information that is legally privileged and/or confidential. If you are not the intended recipient of this e-mail you are hereby notified that any dissemination, distribution, or copying of its content is strictly prohibited. If you have received this message in error, please notify the sender by return e-mail and destroy the message and all copies in your possession.

To find out more details about how we may collect, use and share your personal information, please see https://www.mwam.com/privacy-policy. This includes details of how calls you make to us may be recorded in order for us to comply with our legal and regulatory obligations.

To the extent that the contents of this email constitutes a financial promotion, please note that it is issued only to and/or directed only at persons who are professional clients or eligible counterparties as defined in the FCA Rules. Any investment products or services described in this email are available only to professional clients and eligible counterparties. Persons who are not professional clients or eligible counterparties should not rely or act on the contents of this email.

Marshall Wace LLP is authorised and regulated by the Financial Conduct Authority. Marshall Wace LLP is a limited liability partnership registered in England and Wales with registered number OC302228 and registered office at George House, 131 Sloane Street, London, SW1X 9AT. If you are receiving this e-mail as a client, or an investor in an investment vehicle, managed or advised by Marshall Wace North America L.P., the sender of this e-mail is communicating with you in the sender's capacity as an associated or related person of Marshall Wace North America L.P. (“MWNA”), which is registered with the US Securities and Exchange Commission (“SEC”) as an investment adviser.  Registration with the SEC does not imply that MWNA or its employees possess a certain level of skill or training.

Re: Difference between DataStream.broadcast() vs DataStream.broadcast(MapStateDescriptor)

Posted by Gen Luo <lu...@gmail.com>.
DataStream.broadcast() only determines the distribution behavior: every
element of the stream is sent to all downstream tasks. Its downstream can
be a single-input processing operator, or a co-processing operator if the
stream is connected to another stream.
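
To make the distribution behavior concrete, here is a toy model in plain
Java. It is not Flink API; the class and method names are made up for
illustration. It only shows what "broadcast" means as a partitioning
strategy, next to hash partitioning for contrast.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, not part of Flink.
public class BroadcastDistributionModel {

    /** Broadcast: every downstream subtask receives a copy of every element. */
    static <T> List<List<T>> broadcast(List<T> elements, int parallelism) {
        List<List<T>> subtasks = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            subtasks.add(new ArrayList<>(elements));
        }
        return subtasks;
    }

    /** Contrast: hash partitioning sends each element to exactly one subtask. */
    static <T> List<List<T>> hashPartition(List<T> elements, int parallelism) {
        List<List<T>> subtasks = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            subtasks.add(new ArrayList<>());
        }
        for (T element : elements) {
            int target = Math.floorMod(element.hashCode(), parallelism);
            subtasks.get(target).add(element);
        }
        return subtasks;
    }

    public static void main(String[] args) {
        List<String> updates = List.of("u1", "u2", "u3");
        // prints [[u1, u2, u3], [u1, u2, u3], [u1, u2, u3]]
        System.out.println(broadcast(updates, 3));
        // each update appears in exactly one of the three inner lists
        System.out.println(hashPartition(updates, 3));
    }
}
```

Nothing in this model stores anything between elements. That matches the
point above: broadcast() by itself is only about where elements go.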

DataStream.broadcast(MapStateDescriptor) can only be used to connect to
another stream. It declares one or more MapStateDescriptors, which allow
the BroadcastProcessFunction that follows to keep state. That is the
BroadcastState mentioned in the SO answer. BroadcastState is a kind of
operator state, but it assumes that the state of all parallel instances is
exactly the same, so it can be copied to the new instances when the job
is restarted with the operator scaled up. This behavior differs from that
of normal operator state.
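
The rescaling argument can be sketched with another toy model in plain
Java. Again, this is not Flink API and all names are hypothetical. It
assumes a parallelism of at least 1 and models only the idea described
above: every instance applies every broadcast update, so the instances
hold identical maps, and a restore at higher parallelism can copy any one
of them to each new instance.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of broadcast state, not part of Flink.
public class BroadcastStateModel {

    // One map per parallel instance; all are identical by construction.
    final List<Map<String, String>> perSubtask = new ArrayList<>();

    BroadcastStateModel(int parallelism) {
        for (int i = 0; i < parallelism; i++) {
            perSubtask.add(new HashMap<>());
        }
    }

    /** A broadcast update reaches every instance, and each one stores it. */
    void onBroadcastUpdate(String key, String value) {
        for (Map<String, String> state : perSubtask) {
            state.put(key, value);
        }
    }

    /** What an instance sees when it handles an element of the other stream. */
    String lookup(int subtask, String key) {
        return perSubtask.get(subtask).get(key);
    }

    /** Restore with a new parallelism by copying one instance's state to all. */
    BroadcastStateModel rescale(int newParallelism) {
        BroadcastStateModel restored = new BroadcastStateModel(newParallelism);
        Map<String, String> snapshot = perSubtask.get(0);
        for (Map<String, String> state : restored.perSubtask) {
            state.putAll(snapshot);
        }
        return restored;
    }

    public static void main(String[] args) {
        BroadcastStateModel job = new BroadcastStateModel(2);
        job.onBroadcastUpdate("rule-1", "v1");
        job.onBroadcastUpdate("rule-1", "v2");
        BroadcastStateModel scaled = job.rescale(4);
        System.out.println(scaled.lookup(3, "rule-1")); // prints v2
    }
}
```

State that differs per instance could not be restored this way, because
copying a single instance's snapshot would lose the others' entries.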

In short, if you need BroadcastState, use
DataStream.broadcast(MapStateDescriptor); if you only want to broadcast the
elements, DataStream.broadcast() is enough.
