You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@beam.apache.org by André Rocha Silva <a....@portaltelemedicina.com.br> on 2020/11/04 00:57:05 UTC

Limiting elements on streaming pipelines

Fellow users

I am not very used to making streaming pipelines, but I have a batch to
write to pub/sub.

My pipeline starts with a 'fake' element only to trigger the next step.
Then in a FlatMap I use a For that yields many elements inside a for. But
in the last step I've got only 100 elements coming in.
Should I work with windowing or something like that?
my_pipeline = (
p
| 'Creating pipeline' >> beam.Create(['1'])
| 'Get things' >> beam.FlatMap(GetThings)
| 'Post on Pub/Sub' >> beam.io.WriteToPubSub(topic=user_options.topic.get())
)

I am working on python. Apache beam 2.17, Python 3.7

Thank you for helping me!

-- 

   *ANDRÉ ROCHA SILVA*
  * DATA ENGINEER*
  (48) 3181-0611

  <https://www.linkedin.com/in/andre-rocha-silva/> /andre-rocha-silva/
<http://portaltelemedicina.com.br/>
<https://www.youtube.com/channel/UC0KH36-OXHFIKjlRY2GyAtQ>
<https://pt-br.facebook.com/PortalTelemedicina/>
<https://www.linkedin.com/company/9426084/>

Re: Limiting elements on streaming pipelines

Posted by Reza Ardeshir Rokni <ra...@gmail.com>.
Hi,

You may want to use more than one element in your Create to start the
FlatMap process as with a runner that does Fusion
<https://cloud.google.com/dataflow/docs/guides/deploying-a-pipeline#fusion-optimization>,
the code will end up only being able to parallelize to 1. So make use of a
Create with say O(10's) elements and have each one of those then do a
partition of the for loop work.

Cheers
Reza

On Wed, 4 Nov 2020 at 08:57, André Rocha Silva <
a.silva@portaltelemedicina.com.br> wrote:

> Fellow users
>
> I am not very used to making streaming pipelines, but I have a batch to
> write to pub/sub.
>
> My pipeline starts with a 'fake' element only to trigger the next step.
> Then in a FlatMap I use a For that yields many elements inside a for. But
> in the last step I've got only 100 elements coming in.
> Should I work with windowing or something like that?
> my_pipeline = (
> p
> | 'Creating pipeline' >> beam.Create(['1'])
> | 'Get things' >> beam.FlatMap(GetThings)
> | 'Post on Pub/Sub' >> beam.io.WriteToPubSub(topic
> =user_options.topic.get())
> )
>
> I am working on python. Apache beam 2.17, Python 3.7
>
> Thank you for helping me!
>
> --
>
>    *ANDRÉ ROCHA SILVA*
>   * DATA ENGINEER*
>   (48) 3181-0611
>
>   <https://www.linkedin.com/in/andre-rocha-silva/> /andre-rocha-silva/
> <http://portaltelemedicina.com.br/>
> <https://www.youtube.com/channel/UC0KH36-OXHFIKjlRY2GyAtQ>
> <https://pt-br.facebook.com/PortalTelemedicina/>
> <https://www.linkedin.com/company/9426084/>
>
>