You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@beam.apache.org by Chamikara Jayalath via user <us...@beam.apache.org> on 2023/11/01 18:05:41 UTC

Re: [Question] - How can i listen to multiple Pub/Sub input topics using SideInput?

Currently only some Beam sources are able to consume a configuration (set
of topics here) that is dynamically generated and I don't think PubSubIO is
one of them. So probably you'll have to implement a custom DoFn that reads
from Cloud Pub/Sub to support this. Also, probably you'll have to constrain
reading with size/time since Cloud Pub/Sub is an unbounded input source.

Thanks,
Cham



On Tue, Oct 31, 2023 at 2:22 PM Pravin DSouza <pr...@gmail.com> wrote:

> Hi,
>
> I have a use case where I want to listen to multiple Pub/Sub input topics
> and route messages from all multiple input topics to a single destination
> output topic.
> The number of input topics can change any time and are stored in a table.
> For example when I deploy for the first time, I might be reading from 3
> topics, but after 2 days the table is updated and now I want to read from 4
> topics.I don't want to redeploy the job because of a change in input topics
> list.
> I know how to use SideInputs for refresh, but I am unable to use SideInputs
> to read the input topic details and pass to PubSubIO.
> Can you please suggest which of the following ways this can be achieved if
> new input topics are added:
> 1. Using SideInput
> 2. Any other approach
> 3. Redeploy the job (this is the last course I want to rely on)
>
> Please suggest.
>
> Thank you,
> Pravin
>

Re: [Question] - How can i listen to multiple Pub/Sub input topics using SideInput?

Posted by Pravin DSouza <pr...@gmail.com>.
Thanks Cham for this.
I actually came across this yesterday and am now looking into it. Not sure
if it will work but trying it out.

Regards,
Pravin

On Tue, Nov 7, 2023 at 12:36 PM Chamikara Jayalath via user <
user@beam.apache.org> wrote:

> Not sure what runner you are using but some runners support updating or
> redeploying your pipeline with the changes which might be easier than doing
> it manually using your workflow. For example, Dataflow streaming can deploy
> a replacement job for you with the updated config for PubSub:
> https://cloud.google.com/dataflow/docs/guides/updating-a-pipeline#Launching
>
>
> Thanks,
> Cham
>
> On Tue, Nov 7, 2023 at 9:25 AM Pravin DSouza <pr...@gmail.com> wrote:
>
>> Hi Wisniowski,
>>
>> Thank you for the inputs.
>> It is not possible for us to identify all the topics in advance as new
>> requirements in future might warrant new topics or may be able to use one
>> of existing topics.
>> As such, we have decided to proceed with the redeployment option for now
>> if the input topic list changes.
>>
>> Regards,
>> Pravin
>>
>> On Sun, Nov 5, 2023 at 3:10 AM Piotr Wiśniowski <
>> contact.wisniowskipiotr@gmail.com> wrote:
>>
>>> Hi,
>>> Another idea. Less cost effective, but easier to implement and maintain:
>>> 1. Read from all possible topics.
>>> 2. 'Map' to add constant topic name to every topic. (Or some other
>>> variations of this step)
>>> 3. 'Filter' with side input to just filter out all topics that should be
>>> silenced.
>>> 4. 'Flatten'
>>> But this requires that new topics are not created during run of the
>>> pipeline and You know them. If not then suggestion with custom 'ParDo' is
>>> only option, but I would also suggest rethinking Your infrastructure setup.
>>>
>>> Best
>>> Wiśniowski Piotr
>>>
>>> śr., 1 lis 2023, 19:06 użytkownik Chamikara Jayalath via user <
>>> user@beam.apache.org> napisał:
>>>
>>>> Currently only some Beam sources are able to consume a configuration
>>>> (set of topics here) that is dynamically generated and I don't think
>>>> PubSubIO is one of them. So probably you'll have to implement a custom DoFn
>>>> that reads from Cloud Pub/Sub to support this. Also, probably you'll have
>>>> to constrain reading with size/time since Cloud Pub/Sub is an unbounded
>>>> input source.
>>>>
>>>> Thanks,
>>>> Cham
>>>>
>>>>
>>>>
>>>> On Tue, Oct 31, 2023 at 2:22 PM Pravin DSouza <pr...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have a use case where I want to listen to multiple Pub/Sub input
>>>>> topics
>>>>> and route messages from all multiple input topics to a single
>>>>> destination
>>>>> output topic.
>>>>> The number of input topics can change any time and are stored in a
>>>>> table.
>>>>> For example when I deploy for the first time, I might be reading from 3
>>>>> topics, but after 2 days the table is updated and now I want to read
>>>>> from 4
>>>>> topics.I don't want to redeploy the job because of a change in input
>>>>> topics
>>>>> list.
>>>>> I know how to use SideInputs for refresh, but I am unable to use
>>>>> SideInputs
>>>>> to read the input topic details and pass to PubSubIO.
>>>>> Can you please suggest which of the following ways this can be
>>>>> achieved if
>>>>> new input topics are added:
>>>>> 1. Using SideInput
>>>>> 2. Any other approach
>>>>> 3. Redeploy the job (this is the last course I want to rely on)
>>>>>
>>>>> Please suggest.
>>>>>
>>>>> Thank you,
>>>>> Pravin
>>>>>
>>>>

Re: [Question] - How can i listen to multiple Pub/Sub input topics using SideInput?

Posted by Chamikara Jayalath via user <us...@beam.apache.org>.
Not sure what runner you are using but some runners support updating or
redeploying your pipeline with the changes which might be easier than doing
it manually using your workflow. For example, Dataflow streaming can deploy
a replacement job for you with the updated config for PubSub:
https://cloud.google.com/dataflow/docs/guides/updating-a-pipeline#Launching

Thanks,
Cham

On Tue, Nov 7, 2023 at 9:25 AM Pravin DSouza <pr...@gmail.com> wrote:

> Hi Wisniowski,
>
> Thank you for the inputs.
> It is not possible for us to identify all the topics in advance as new
> requirements in future might warrant new topics or may be able to use one
> of existing topics.
> As such, we have decided to proceed with the redeployment option for now
> if the input topic list changes.
>
> Regards,
> Pravin
>
> On Sun, Nov 5, 2023 at 3:10 AM Piotr Wiśniowski <
> contact.wisniowskipiotr@gmail.com> wrote:
>
>> Hi,
>> Another idea. Less cost effective, but easier to implement and maintain:
>> 1. Read from all possible topics.
>> 2. 'Map' to add constant topic name to every topic. (Or some other
>> variations of this step)
>> 3. 'Filter' with side input to just filter out all topics that should be
>> silenced.
>> 4. 'Flatten'
>> But this requires that new topics are not created during run of the
>> pipeline and You know them. If not then suggestion with custom 'ParDo' is
>> only option, but I would also suggest rethinking Your infrastructure setup.
>>
>> Best
>> Wiśniowski Piotr
>>
>> śr., 1 lis 2023, 19:06 użytkownik Chamikara Jayalath via user <
>> user@beam.apache.org> napisał:
>>
>>> Currently only some Beam sources are able to consume a configuration
>>> (set of topics here) that is dynamically generated and I don't think
>>> PubSubIO is one of them. So probably you'll have to implement a custom DoFn
>>> that reads from Cloud Pub/Sub to support this. Also, probably you'll have
>>> to constrain reading with size/time since Cloud Pub/Sub is an unbounded
>>> input source.
>>>
>>> Thanks,
>>> Cham
>>>
>>>
>>>
>>> On Tue, Oct 31, 2023 at 2:22 PM Pravin DSouza <pr...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a use case where I want to listen to multiple Pub/Sub input
>>>> topics
>>>> and route messages from all multiple input topics to a single
>>>> destination
>>>> output topic.
>>>> The number of input topics can change any time and are stored in a
>>>> table.
>>>> For example when I deploy for the first time, I might be reading from 3
>>>> topics, but after 2 days the table is updated and now I want to read
>>>> from 4
>>>> topics.I don't want to redeploy the job because of a change in input
>>>> topics
>>>> list.
>>>> I know how to use SideInputs for refresh, but I am unable to use
>>>> SideInputs
>>>> to read the input topic details and pass to PubSubIO.
>>>> Can you please suggest which of the following ways this can be achieved
>>>> if
>>>> new input topics are added:
>>>> 1. Using SideInput
>>>> 2. Any other approach
>>>> 3. Redeploy the job (this is the last course I want to rely on)
>>>>
>>>> Please suggest.
>>>>
>>>> Thank you,
>>>> Pravin
>>>>
>>>

Re: [Question] - How can i listen to multiple Pub/Sub input topics using SideInput?

Posted by Pravin DSouza <pr...@gmail.com>.
Hi Wisniowski,

Thank you for the inputs.
It is not possible for us to identify all the topics in advance as new
requirements in future might warrant new topics or may be able to use one
of existing topics.
As such, we have decided to proceed with the redeployment option for now if
the input topic list changes.

Regards,
Pravin

On Sun, Nov 5, 2023 at 3:10 AM Piotr Wiśniowski <
contact.wisniowskipiotr@gmail.com> wrote:

> Hi,
> Another idea. Less cost effective, but easier to implement and maintain:
> 1. Read from all possible topics.
> 2. 'Map' to add constant topic name to every topic. (Or some other
> variations of this step)
> 3. 'Filter' with side input to just filter out all topics that should be
> silenced.
> 4. 'Flatten'
> But this requires that new topics are not created during run of the
> pipeline and You know them. If not then suggestion with custom 'ParDo' is
> only option, but I would also suggest rethinking Your infrastructure setup.
>
> Best
> Wiśniowski Piotr
>
> śr., 1 lis 2023, 19:06 użytkownik Chamikara Jayalath via user <
> user@beam.apache.org> napisał:
>
>> Currently only some Beam sources are able to consume a configuration (set
>> of topics here) that is dynamically generated and I don't think PubSubIO is
>> one of them. So probably you'll have to implement a custom DoFn that reads
>> from Cloud Pub/Sub to support this. Also, probably you'll have to constrain
>> reading with size/time since Cloud Pub/Sub is an unbounded input source.
>>
>> Thanks,
>> Cham
>>
>>
>>
>> On Tue, Oct 31, 2023 at 2:22 PM Pravin DSouza <pr...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I have a use case where I want to listen to multiple Pub/Sub input topics
>>> and route messages from all multiple input topics to a single destination
>>> output topic.
>>> The number of input topics can change any time and are stored in a table.
>>> For example when I deploy for the first time, I might be reading from 3
>>> topics, but after 2 days the table is updated and now I want to read
>>> from 4
>>> topics.I don't want to redeploy the job because of a change in input
>>> topics
>>> list.
>>> I know how to use SideInputs for refresh, but I am unable to use
>>> SideInputs
>>> to read the input topic details and pass to PubSubIO.
>>> Can you please suggest which of the following ways this can be achieved
>>> if
>>> new input topics are added:
>>> 1. Using SideInput
>>> 2. Any other approach
>>> 3. Redeploy the job (this is the last course I want to rely on)
>>>
>>> Please suggest.
>>>
>>> Thank you,
>>> Pravin
>>>
>>

Re: [Question] - How can i listen to multiple Pub/Sub input topics using SideInput?

Posted by Piotr Wiśniowski <co...@gmail.com>.
Hi,
Another idea. Less cost effective, but easier to implement and maintain:
1. Read from all possible topics.
2. 'Map' to add constant topic name to every topic. (Or some other
variations of this step)
3. 'Filter' with side input to just filter out all topics that should be
silenced.
4. 'Flatten'
But this requires that new topics are not created during run of the
pipeline and You know them. If not then suggestion with custom 'ParDo' is
only option, but I would also suggest rethinking Your infrastructure setup.

Best
Wiśniowski Piotr

śr., 1 lis 2023, 19:06 użytkownik Chamikara Jayalath via user <
user@beam.apache.org> napisał:

> Currently only some Beam sources are able to consume a configuration (set
> of topics here) that is dynamically generated and I don't think PubSubIO is
> one of them. So probably you'll have to implement a custom DoFn that reads
> from Cloud Pub/Sub to support this. Also, probably you'll have to constrain
> reading with size/time since Cloud Pub/Sub is an unbounded input source.
>
> Thanks,
> Cham
>
>
>
> On Tue, Oct 31, 2023 at 2:22 PM Pravin DSouza <pr...@gmail.com> wrote:
>
>> Hi,
>>
>> I have a use case where I want to listen to multiple Pub/Sub input topics
>> and route messages from all multiple input topics to a single destination
>> output topic.
>> The number of input topics can change any time and are stored in a table.
>> For example when I deploy for the first time, I might be reading from 3
>> topics, but after 2 days the table is updated and now I want to read from
>> 4
>> topics.I don't want to redeploy the job because of a change in input
>> topics
>> list.
>> I know how to use SideInputs for refresh, but I am unable to use
>> SideInputs
>> to read the input topic details and pass to PubSubIO.
>> Can you please suggest which of the following ways this can be achieved if
>> new input topics are added:
>> 1. Using SideInput
>> 2. Any other approach
>> 3. Redeploy the job (this is the last course I want to rely on)
>>
>> Please suggest.
>>
>> Thank you,
>> Pravin
>>
>

Re: [Question] - How can i listen to multiple Pub/Sub input topics using SideInput?

Posted by Pravin DSouza <pr...@gmail.com>.
Thanks Cham.
We analyzed and decided our requirement is not worth the effort needed to
handle the unbounded Cloud PubSub input source.
Since our input list will not change frequently, we are considering
redeployment as better option for now.

Regards,
Pravin

On Wed, Nov 1, 2023 at 2:06 PM Chamikara Jayalath via user <
user@beam.apache.org> wrote:

> Currently only some Beam sources are able to consume a configuration (set
> of topics here) that is dynamically generated and I don't think PubSubIO is
> one of them. So probably you'll have to implement a custom DoFn that reads
> from Cloud Pub/Sub to support this. Also, probably you'll have to constrain
> reading with size/time since Cloud Pub/Sub is an unbounded input source.
>
> Thanks,
> Cham
>
>
>
> On Tue, Oct 31, 2023 at 2:22 PM Pravin DSouza <pr...@gmail.com> wrote:
>
>> Hi,
>>
>> I have a use case where I want to listen to multiple Pub/Sub input topics
>> and route messages from all multiple input topics to a single destination
>> output topic.
>> The number of input topics can change any time and are stored in a table.
>> For example when I deploy for the first time, I might be reading from 3
>> topics, but after 2 days the table is updated and now I want to read from
>> 4
>> topics.I don't want to redeploy the job because of a change in input
>> topics
>> list.
>> I know how to use SideInputs for refresh, but I am unable to use
>> SideInputs
>> to read the input topic details and pass to PubSubIO.
>> Can you please suggest which of the following ways this can be achieved if
>> new input topics are added:
>> 1. Using SideInput
>> 2. Any other approach
>> 3. Redeploy the job (this is the last course I want to rely on)
>>
>> Please suggest.
>>
>> Thank you,
>> Pravin
>>
>