Posted to dev@beam.apache.org by Daria Malkova <da...@akvelon.com> on 2021/06/08 07:28:37 UTC

beam new feature

Hi community!

I've noticed that Beam JDBC has no way to use partitioning to read a very large table (millions of rows) in parallel, for example when migrating legacy database data to BigQuery.
I have some ideas, which are described in more detail here:
https://docs.google.com/document/d/1wBzVhQEhTK23ALzTSZ_CVouEOXTm3w2-LjmO3ieUvFc/edit?usp=sharing
I would like to start working on the related task I've created: https://issues.apache.org/jira/browse/BEAM-12456
If anybody has any concerns or proposals, please feel free to leave comments in the Google doc.
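
To make the idea concrete, here is a minimal sketch of range partitioning. It is only an illustration and is not taken from the design doc or from Beam: it assumes the table has an integer key column whose lower and upper bounds are already known, and the names split_range, partition_queries, orders and id are hypothetical. Each generated query covers a disjoint slice of the key range, so the slices could be read in parallel.

```python
from typing import List, Tuple


def split_range(lower: int, upper: int, num_partitions: int) -> List[Tuple[int, int]]:
    """Split the inclusive key range [lower, upper] into contiguous
    half-open sub-ranges [start, end) that cover every key exactly once."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if upper < lower:
        raise ValueError("upper must not be less than lower")
    total = upper - lower + 1  # number of distinct key values
    # Never create more partitions than there are key values.
    parts = min(num_partitions, total)
    # Spread the remainder over the first `extra` partitions.
    base, extra = divmod(total, parts)
    ranges = []
    start = lower
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def partition_queries(table: str, column: str, lower: int, upper: int,
                      num_partitions: int) -> List[str]:
    """Build one SELECT per sub-range (hypothetical helper).

    Table and column names are interpolated directly, so they must come
    from trusted configuration, never from user input.
    """
    return [
        f"SELECT * FROM {table} WHERE {column} >= {start} AND {column} < {end}"
        for start, end in split_range(lower, upper, num_partitions)
    ]
```

For example, partition_queries("orders", "id", 1, 10, 3) yields three queries over the key slices [1, 5), [5, 8) and [8, 11). Even-width slices only balance the work when keys are roughly uniformly distributed; skewed keys would need a different splitting strategy.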

Thank you,
Daria

Re: beam new feature

Posted by Alexey Romanenko <ar...@gmail.com>.

Thanks, I left my comments as well.

—
Alexey

> On 8 Jun 2021, at 22:21, Luke Cwik <lc...@google.com> wrote:
> 
> Thanks, I left a few comments in the doc.
> 
> On Tue, Jun 8, 2021 at 12:26 PM Daria Malkova <daria.malkova@akvelon.com> wrote:
> 
> Hi community!
> 
> I've noticed that Beam JDBC has no way to use partitioning to read a very large table (millions of rows) in parallel, for example when migrating legacy database data to BigQuery.
> I have some ideas, which are described in more detail here:
> https://docs.google.com/document/d/1wBzVhQEhTK23ALzTSZ_CVouEOXTm3w2-LjmO3ieUvFc/edit?usp=sharing
> I would like to start working on the related task I've created: https://issues.apache.org/jira/browse/BEAM-12456
> If anybody has any concerns or proposals, please feel free to leave comments in the Google doc.
> 
> Thank you,
> Daria
> 
>

Re: beam new feature

Posted by Luke Cwik <lc...@google.com>.

Thanks, I left a few comments in the doc.

On Tue, Jun 8, 2021 at 12:26 PM Daria Malkova <da...@akvelon.com>
wrote:

> Hi community!
>
> I've noticed that Beam JDBC has no way to use partitioning to read a
> very large table (millions of rows) in parallel, for example when
> migrating legacy database data to BigQuery.
> I have some ideas, which are described in more detail here:
>
> https://docs.google.com/document/d/1wBzVhQEhTK23ALzTSZ_CVouEOXTm3w2-LjmO3ieUvFc/edit?usp=sharing
> I would like to start working on the related task I've created:
> https://issues.apache.org/jira/browse/BEAM-12456
> If anybody has any concerns or proposals, please feel free to leave
> comments in the Google doc.
>
> Thank you,
> Daria
>
>

Re: beam new feature

Posted by Daria Malkova <da...@akvelon.com>.

Hi community!


Based on this design doc (https://docs.google.com/document/d/1wBzVhQEhTK23ALzTSZ_CVouEOXTm3w2-LjmO3ieUvFc/edit?usp=sharing), I’ve created a PR: https://github.com/apache/beam/pull/15049 . I'm looking forward to your reviews, and please feel free to ask any questions.


Best,

Daria Malkova

On 8 June 2021, at 10:28, Daria Malkova <da...@akvelon.com> wrote:



Hi community!

I've noticed that Beam JDBC has no way to use partitioning to read a very large table (millions of rows) in parallel, for example when migrating legacy database data to BigQuery.
I have some ideas, which are described in more detail here:
https://docs.google.com/document/d/1wBzVhQEhTK23ALzTSZ_CVouEOXTm3w2-LjmO3ieUvFc/edit?usp=sharing
I would like to start working on the related task I've created: https://issues.apache.org/jira/browse/BEAM-12456
If anybody has any concerns or proposals, please feel free to leave comments in the Google doc.

Thank you,
Daria