Posted to user@beam.apache.org by Hubert Chen <hu...@gmail.com> on 2020/04/14 19:21:58 UTC

Joining clicks and views example

Hello,

I'm a little confused by the "Joining clicks and views" example
<https://beam.apache.org/documentation/programming-guide/#state-timers-examples>
in the documentation, and I was wondering if I could get some help.

The example starts with the following snippet of code:

PCollection<KV<String, Event>> eventsPerLinkId =
    readEvents()
    .apply(WithKeys.of(Event::getLinkId).withKeyType(TypeDescriptors.strings()));
perUser.apply(ParDo.of(new DoFn<KV<String, Event>, JoinedEvent>() {
    ...
}));


The example does not show how the `perUser` PCollection is generated or
where the `eventsPerLinkId` PCollection is used.

How are the two streams, views and clicks, being joined into a single
PCollection? Would it be optimal to use the `Flatten` transformation? I'm
confused about the steps needed between reading from two streams and
processing a single stream.

Best,
Hubert

Re: Joining clicks and views example

Posted by Luke Cwik <lc...@google.com>.
The events can come either from a single event stream, such as Pubsub or
Kafka, or from multiple event streams that are flattened together.
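
As a rough illustration of the flatten-then-key idea, here is a plain Java
sketch that models the data flow with ordinary collections. It is not Beam
code and not the documentation's implementation. In a Beam pipeline the
merge step would typically be
`PCollectionList.of(views).and(clicks).apply(Flatten.pCollections())`,
followed by the `WithKeys.of(Event::getLinkId)` step from the snippet in the
question. The `Event` fields, the "view"/"click" type strings, and all
method names below are assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ClicksAndViewsSketch {

    // Hypothetical event type; the thread only shows that Event has getLinkId().
    static final class Event {
        final String linkId;
        final String type; // assumed to be "view" or "click"

        Event(String linkId, String type) {
            this.linkId = linkId;
            this.type = type;
        }

        String getLinkId() {
            return linkId;
        }
    }

    // Models Flatten: two inputs of the same element type become one collection.
    static List<Event> flatten(List<Event> views, List<Event> clicks) {
        List<Event> all = new ArrayList<>(views);
        all.addAll(clicks);
        return all;
    }

    // Models WithKeys plus per-key grouping: bucket events by link ID.
    static Map<String, List<Event>> keyByLinkId(List<Event> events) {
        Map<String, List<Event>> byKey = new LinkedHashMap<>();
        for (Event e : events) {
            byKey.computeIfAbsent(e.getLinkId(), k -> new ArrayList<>()).add(e);
        }
        return byKey;
    }

    // Models the per-key join: a link ID is joined once both a view
    // and a click have been seen for it.
    static List<String> joinedLinkIds(Map<String, List<Event>> byKey) {
        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, List<Event>> entry : byKey.entrySet()) {
            boolean sawView = false;
            boolean sawClick = false;
            for (Event e : entry.getValue()) {
                if ("view".equals(e.type)) {
                    sawView = true;
                }
                if ("click".equals(e.type)) {
                    sawClick = true;
                }
            }
            if (sawView && sawClick) {
                joined.add(entry.getKey());
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Event> views = Arrays.asList(new Event("a", "view"), new Event("b", "view"));
        List<Event> clicks = Arrays.asList(new Event("a", "click"));
        // Link "a" has both a view and a click; link "b" has only a view.
        System.out.println(joinedLinkIds(keyByLinkId(flatten(views, clicks)))); // prints [a]
    }
}
```

In the real pipeline the per-key matching would live inside the stateful
`DoFn`, which sees events one at a time rather than as a complete list, so
the grouping shown here only approximates what state and timers do.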
