Posted to user@beam.apache.org by "Chinni, Madhavi via user" <us...@beam.apache.org> on 2022/09/29 13:10:41 UTC

Request to suggest alternative approaches for side input use cases in apache beam

Hi,

We have a stream processing pipeline that processes customer UI interaction data.
As part of the pipeline, we read information from an AWS Redis cache and store it in a PCollectionView. The PCollectionView is then accessed as a side input in the subsequent CombineFnWithContext accumulators and transform functions in the pipeline.
Could you please suggest an alternative approach that avoids using a side input to access the Redis cache information in the downstream functions of the pipeline?

Thanks,
Madhavi

Re: Request to suggest alternative approaches for side input use cases in apache beam

Posted by Alexey Romanenko <ar...@gmail.com>.
Well, it depends on how you use the Redis cache and how often its contents change.

For example, if you need to query the cache for a group of input records, you can group them into batches and make only one remote call to the cache before processing each batch, as explained here [1].
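
To make the batching idea concrete, here is a minimal, library-free Java sketch. It is not Beam code and not the code from the linked pattern: in a real pipeline the grouping would be done by a Beam transform such as GroupIntoBatches and the lookup would sit inside a DoFn. The names BatchedCacheLookup, CacheClient, CountingCache, toBatches and enrich are all hypothetical, and CacheClient.getAll is only an assumed stand-in for whatever multi-key read your Redis client offers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Library-free sketch: one cache round trip per batch instead of one per record. */
class BatchedCacheLookup {

    /** Hypothetical stand-in for a Redis client that supports a multi-key read. */
    interface CacheClient {
        Map<String, String> getAll(Set<String> keys);
    }

    /** In-memory fake that counts round trips, so the saving is visible. */
    static class CountingCache implements CacheClient {
        private final Map<String, String> data;
        int calls = 0;

        CountingCache(Map<String, String> data) {
            this.data = data;
        }

        @Override
        public Map<String, String> getAll(Set<String> keys) {
            calls++; // each invocation models one remote call
            Map<String, String> found = new HashMap<>();
            for (String key : keys) {
                if (data.containsKey(key)) {
                    found.put(key, data.get(key));
                }
            }
            return found;
        }
    }

    /** Splits records into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> toBatches(List<T> records, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < records.size(); start += batchSize) {
            int end = Math.min(start + batchSize, records.size());
            batches.add(new ArrayList<>(records.subList(start, end)));
        }
        return batches;
    }

    /** Enriches each record (here just a customer id) with its cached value. */
    static List<String> enrich(List<String> customerIds, int batchSize, CacheClient cache) {
        List<String> out = new ArrayList<>();
        for (List<String> batch : toBatches(customerIds, batchSize)) {
            // One call covers every distinct key in the batch.
            Map<String, String> cached = cache.getAll(new LinkedHashSet<>(batch));
            for (String id : batch) {
                out.add(id + "=" + cached.getOrDefault(id, "unknown"));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> data = new HashMap<>();
        data.put("c1", "gold");
        data.put("c2", "silver");
        CountingCache cache = new CountingCache(data);
        List<String> enriched =
            enrich(Arrays.asList("c1", "c2", "c1", "c3", "c2"), 2, cache);
        System.out.println(enriched + " using " + cache.calls + " cache calls");
    }
}
```

With five records and a batch size of 2, the sketch makes three cache calls instead of five. Whether that trade-off is better than a side input depends on how large the cached data set is and how quickly it changes.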

In any case, more details about your use case, and about why the side input approach doesn't work well for you, would be helpful.

[1] https://beam.apache.org/documentation/patterns/grouping-elements-for-efficient-external-service-calls/

—
Alexey
