Posted to user@beam.apache.org by Kayal P <ka...@gmail.com> on 2022/04/21 04:28:23 UTC

[Code Question] Pcollection to List using Java sdk

Hi All,

I am trying to convert a PCollection<String> to a List<String> using the Java SDK. It seems there is a combiners.ToList transform available in the Python SDK. Is there a similar option available in the Java SDK? If not, can someone guide me to the right way of doing this? The PCollection<String> is a very small collection of fewer than 10 items. Thanks in advance.


Regards,
Kayal 

Re: [Code Question] Pcollection to List using Java sdk

Posted by Brian Hulette <bh...@google.com>.
An alternative, if you're open to using the Python SDK, would be to use
interactive Beam. With it you can call ib.collect() to materialize a
PCollection in local memory. It also has a %%beam_sql magic to provide
support for SqlTransform. I understand that may not be a feasible change,
but we can provide pointers if you're open to it.

Brian

On Thu, Apr 21, 2022 at 8:06 AM Reuven Lax <re...@google.com> wrote:

> There is a Concatenate combiner in Java, but I think it's a bit overkill
> here.
>
> To elucidate what Alexey said -
>
> // Give every element the same (null) key so GroupByKey gathers them into one group.
> PCollection<Iterable<String>> strings = pc.apply(WithKeys.of((Void) null))
>     .apply(GroupByKey.create())
>     .apply(Values.create());
>
> This will give you back a single-element PCollection containing an
> Iterable. If you really need it to be a List, you can add the following:
>
>     strings.apply(MapElements
>           .into(TypeDescriptors.lists(TypeDescriptors.strings()))
>           .via((Iterable<String> iterable) -> ImmutableList.copyOf(iterable)));

Re: [Code Question] Pcollection to List using Java sdk

Posted by Reuven Lax <re...@google.com>.
There is a Concatenate combiner in Java, but I think it's a bit overkill
here.

To elucidate what Alexey said -

// Give every element the same (null) key so GroupByKey gathers them into one group.
PCollection<Iterable<String>> strings = pc.apply(WithKeys.of((Void) null))
    .apply(GroupByKey.create())
    .apply(Values.create());

This will give you back a single-element PCollection containing an
Iterable. If you really need it to be a List, you can add the following:

    strings.apply(MapElements
          .into(TypeDescriptors.lists(TypeDescriptors.strings()))
          .via((Iterable<String> iterable) -> ImmutableList.copyOf(iterable)));
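
Putting the two snippets above together, here is a minimal end-to-end sketch. It assumes the Beam Java SDK and a runner (for example the direct runner) are on the classpath. The class name SingleListSketch, the method toSingleList and the SendMailFn stand-in are hypothetical names made up for illustration, and a plain ArrayList is used instead of Guava's ImmutableList only to avoid the extra dependency.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.Values;
import org.apache.beam.sdk.transforms.WithKeys;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;

public class SingleListSketch {

  /** Gathers all elements of a small, bounded PCollection into one List element. */
  static PCollection<List<String>> toSingleList(PCollection<String> input) {
    return input
        // Same (null) key for every element, so there is exactly one group.
        .apply("SameKey", WithKeys.<Void, String>of((Void) null))
        .apply("Group", GroupByKey.<Void, String>create())
        // Drop the key and keep only the grouped values.
        .apply("DropKey", Values.<Iterable<String>>create())
        // Copy the Iterable into a List.
        .apply("ToList", MapElements
            .into(TypeDescriptors.lists(TypeDescriptors.strings()))
            .via((Iterable<String> values) -> {
              List<String> list = new ArrayList<>();
              values.forEach(list::add);
              return list;
            }));
  }

  /** Hypothetical stand-in for the mailing class; it only prints the rows. */
  static class SendMailFn extends DoFn<List<String>, Void> {
    @ProcessElement
    public void processElement(@Element List<String> rows) {
      System.out.println(String.join("\n", rows));
    }
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create();
    // Create.of stands in for the small SqlTransform result.
    PCollection<String> rows = p.apply(Create.of("a,1", "b,2", "c,3"));
    toSingleList(rows).apply("SendMail", ParDo.of(new SendMailFn()));
    p.run().waitUntilFinish();
  }
}
```

Two caveats worth keeping in mind: GroupByKey gives no ordering guarantee for the grouped values, and for an empty input it emits nothing, so the downstream DoFn is never called.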

On Thu, Apr 21, 2022 at 7:47 AM Alexey Romanenko <ar...@gmail.com>
wrote:

> In this case, if you already know that your result is quite small and fits
> into memory, then you need to materialise your results on one worker in the
> same JVM. You can do that by assigning the same key to all result elements
> and then applying a GroupByKey transform over this PCollection<KV<K,String>>.
> Alternatively, you can use the GroupIntoBatches transform (see the example in
> its Javadoc [1]) for better control over this.
>
> [1]
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupIntoBatches.java
>
> —
> Alexey

Re: [Code Question] Pcollection to List using Java sdk

Posted by Alexey Romanenko <ar...@gmail.com>.
In this case, if you already know that your result is quite small and fits into memory, then you need to materialise your results on one worker in the same JVM. You can do that by assigning the same key to all result elements and then applying a GroupByKey transform over this PCollection<KV<K,String>>. Alternatively, you can use the GroupIntoBatches transform (see the example in its Javadoc [1]) for better control over this.

[1] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupIntoBatches.java

—
Alexey
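
For the GroupIntoBatches alternative mentioned above, a rough sketch could look like the following. It assumes the Beam Java SDK is on the classpath. The class name BatchSketch, the method batchUnderOneKey, the constant key "all" and the batch size of 10 are illustrative choices for this sketch, not something taken from the Javadoc example.

```java
import org.apache.beam.sdk.transforms.GroupIntoBatches;
import org.apache.beam.sdk.transforms.WithKeys;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

public class BatchSketch {

  /** Keys every element identically, then batches values per key. */
  static PCollection<KV<String, Iterable<String>>> batchUnderOneKey(PCollection<String> input) {
    return input
        // "all" is an arbitrary constant key chosen for this sketch.
        .apply("SameKey", WithKeys.<String, String>of("all"))
        // Emit batches of at most 10 values per key.
        .apply("Batch", GroupIntoBatches.<String, String>ofSize(10));
  }
}
```

With fewer than 10 elements under a single key, this should produce one batch. Larger inputs would be split into several batches, which is the extra control over size that GroupByKey alone does not give you.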

> On 21 Apr 2022, at 14:31, Kayal P <ka...@gmail.com> wrote:
> 
> Hi Alexey,
> 
> I have a small PCollection<String> result from SqlTransform. I have to pass this result to a mailing class that sends a mail whose body contains the values from the PCollection<String> in a tabular format. The number of elements in the PCollection<String> will always be fewer than 10.
> 
> Regards,
> Kayal


Re: [Code Question] Pcollection to List using Java sdk

Posted by Kayal P <ka...@gmail.com>.
Hi Alexey,

I have a small PCollection<String> result from SqlTransform. I have to pass this result to a mailing class that sends a mail whose body contains the values from the PCollection<String> in a tabular format. The number of elements in the PCollection<String> will always be fewer than 10.

Regards,
Kayal
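
As an illustration of the mailing step described above, once the values are available as a List<String>, a plain-Java helper could lay them out as a table for the mail body. This is only a sketch: the class name MailBodyTable, the method toTable and the assumption that each row is a comma-separated string are all hypothetical, since the thread does not say how the rows are formatted.

```java
import java.util.ArrayList;
import java.util.List;

public class MailBodyTable {

  /** Renders rows as left-aligned columns separated by " | ". */
  static String toTable(List<String> rows) {
    // Assumption: each row is a comma-separated string.
    List<String[]> cells = new ArrayList<>();
    int columns = 0;
    for (String row : rows) {
      String[] parts = row.split(",", -1);
      cells.add(parts);
      columns = Math.max(columns, parts.length);
    }

    // First pass: find the widest cell in each column.
    int[] widths = new int[columns];
    for (String[] parts : cells) {
      for (int i = 0; i < parts.length; i++) {
        widths[i] = Math.max(widths[i], parts[i].trim().length());
      }
    }

    // Second pass: pad every cell to its column width.
    StringBuilder body = new StringBuilder();
    for (String[] parts : cells) {
      StringBuilder line = new StringBuilder();
      for (int i = 0; i < parts.length; i++) {
        if (i > 0) {
          line.append(" | ");
        }
        int width = Math.max(widths[i], 1);
        line.append(String.format("%-" + width + "s", parts[i].trim()));
      }
      // Drop padding at the end of the line.
      body.append(line.toString().replaceAll("\\s+$", "")).append('\n');
    }
    return body.toString();
  }
}
```

A helper like this could be called from inside the DoFn that receives the grouped list, so the mail is sent from the worker that holds the values.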

> 
> On Apr 21, 2022, at 5:13 AM, Alexey Romanenko <ar...@gmail.com> wrote:
> 
> Hi Kayal,
> 
> In general, a PCollection can be an unbounded (potentially infinite) collection of elements. So there is no single simple way to do what you are asking, and the solution will depend on the case where it’s needed.
> 
> Could you give an example of why and where in your pipeline you need this?
> 
> —
> Alexey

Re: [Code Question] Pcollection to List using Java sdk

Posted by Alexey Romanenko <ar...@gmail.com>.
Hi Kayal,

In general, a PCollection can be an unbounded (potentially infinite) collection of elements. So there is no single simple way to do what you are asking, and the solution will depend on the case where it’s needed.

Could you give an example of why and where in your pipeline you need this?

—
Alexey

> On 21 Apr 2022, at 06:28, Kayal P <ka...@gmail.com> wrote:
> 
> Hi All,
> 
> I am trying to convert a PCollection<String> to a List<String> using the Java SDK. It seems there is a combiners.ToList transform available in the Python SDK. Is there a similar option available in the Java SDK? If not, can someone guide me to the right way of doing this? The PCollection<String> is a very small collection of fewer than 10 items. Thanks in advance.
> 
> Regards,
> Kayal