Posted to user@beam.apache.org by Andrew Jones <an...@andrew-jones.com> on 2017/11/30 17:35:31 UTC

Pubsub -> Beam -> many files

Hi,

I'm new to Beam. I have a use case where I want to read from a Pubsub
stream, transform the data in Beam, and write to many outputs.

As a simple example, say I'm reading words from Pubsub, I get the first
letter of the word, and then I write to a file for that letter.

I want to do this programmatically, so I don't want to have to know all
the outputs beforehand, but they can be created as we need them, based
on the data that comes in.

Has anyone done something similar with Beam, or have any examples?

At the moment I'm looking at tagged outputs, but the documentation
suggests that I need to know the outputs beforehand and create a
TupleTag for each.

Another option might simply be to use GroupByKey, but then I'm not sure
if I can pass the result to TextIO.

Thanks,
Andrew

Re: Pubsub -> Beam -> many files

Posted by Eugene Kirpichov <ki...@google.com>.
TextIO.write().to(DynamicDestinations), available in Beam 2.2, does exactly
this.
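
As a rough illustration of the routing idea behind dynamic destinations,
here is a minimal plain-Java sketch. It does not use the Beam API at all.
The class name, method names, and the "words-<letter>.txt" naming scheme
are assumptions made up for this example. In Beam, the per-element
function would presumably live in the getDestination method of a
DynamicDestinations subclass, with file naming handled by its filename
policy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the "word -> file per first letter" use case.
public class FirstLetterRouting {

    // Maps a word to its destination key: the lower-cased first letter,
    // or "other" when the word is null, empty, or does not start with a letter.
    static String destinationFor(String word) {
        if (word == null || word.isEmpty() || !Character.isLetter(word.charAt(0))) {
            return "other";
        }
        return word.substring(0, 1).toLowerCase(Locale.ROOT);
    }

    // Assumed naming scheme: one output file per destination key.
    static String fileNameFor(String destination) {
        return "words-" + destination + ".txt";
    }

    // Groups words by output file. Destinations are created on first use,
    // so the set of outputs does not need to be known beforehand.
    static Map<String, List<String>> route(List<String> words) {
        Map<String, List<String>> byFile = new TreeMap<>();
        for (String word : words) {
            String file = fileNameFor(destinationFor(word));
            byFile.computeIfAbsent(file, k -> new ArrayList<>()).add(word);
        }
        return byFile;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "Avocado", "beam", "42");
        route(words).forEach((file, contents) ->
                System.out.println(file + " <- " + contents));
    }
}
```

The point of the sketch is that the destination is computed from each
element, which is what removes the need for one TupleTag per output.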
