Posted to dev@beam.apache.org by Raphael Sanamyan <ra...@akvelon.com> on 2021/06/09 19:34:44 UTC

Re: [EXTERNAL] Re:

Hello,

Here is a case where you need to have a statement and a preparedStatementSetter.

      // Write the rows to the first table and keep the result as a completion signal.
      PCollection<Row> dataCollection = pipeline.apply(Create.of(data));
      PCollection<Void> rowsWritten =
          dataCollection.apply(
              JdbcIO.<Row>write()
                  .withDataSourceConfiguration(DATA_SOURCE_CONFIGURATION)
                  .withBatchSize(10L)
                  .withTable(firstTableName)
                  .withResults());
      // Write the same rows to the second table only after the first write has finished.
      dataCollection
          .apply(Wait.on(rowsWritten))
          .apply(
              JdbcIO.<Row>write()
                  .withDataSourceConfiguration(DATA_SOURCE_CONFIGURATION)
                  .withBatchSize(10L)
                  .withTable(secondTableName));

      pipeline.run();

In this case, we write data to one table and then to the other, but only after the window of data has been fully written to the first table. It is not possible to do this with the existing JdbcIO.Write functionality.

Another option for this specific case could be extending the existing class instead of adding a Schema API-specific class. We can add additional conditions and move some functionality from Write to WriteVoid to infer the Beam schema. What do you think about these options?

Schema providers are not very well documented in Beam, and they are a bit confusing to us. We use Beam Row as a common abstraction in Beam pipelines, which really meets our requirements. Looking at the Beam docs and code, we saw SchemaProviders for some IOs. Those providers seem to be wrappers around IOs that help work with schemas and convert data to Beam Rows. Could you please clarify this a little? If we want to improve the Beam Schema API, what is the architecturally right way to do that?


Thank you,
Raphael.
________________________________
From: Brian Hulette <bh...@google.com>
Sent: June 9, 2021 19:12:41
To: dev
Cc: Reuven Lax; pabloem@google.com; Ilya Kozyrev
Subject: [EXTERNAL] Re:

> And also the ticket and "// TODO: BEAM-10396 use writeRows() when it's available" appeared later than this functionality was added to "JdbcIO.Write".

Note that this TODO has been moved around through a few refactors. It was initially added last summer [1].
You're right that JdbcIO.Write's statement generation functionality was added about a year before that [2]. It's possible that the author of [1] didn't realize [2] was done. Or maybe there's some reason why it doesn't work there?

+1 for Alexey's requests:
- Identify cases where statement generation in JdbcIO.Write is insufficient, if they exist (e.g. can we just use it where that TODO is [3]? If not what goes wrong?).
- Update documentation to avoid this confusion in the future.

Brian

[1] https://github.com/apache/beam/pull/12145
[2] https://github.com/apache/beam/pull/8962
[3] https://github.com/apache/beam/pull/14954#discussion_r648456230

On Wed, Jun 9, 2021 at 7:49 AM Alexey Romanenko <ar...@gmail.com> wrote:
Hello Raphael,

On 9 Jun 2021, at 09:31, Raphael Sanamyan <ra...@akvelon.com> wrote:

The "JdbcIO.Write" allows you to write rows without a statement or statement preparer, but not all functionality works without them.

Could you show a use case when the current functionality is not enough?


The method "withResults" requires a statement and a statement preparer. Also, the ticket<https://issues.apache.org/jira/browse/BEAM-10396> and the "// TODO: BEAM-10396 use writeRows() when it's available"<https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcSchemaIOProvider.java#L142> comment appeared later than this functionality was added to "JdbcIO.Write". And from the documentation alone, without reading the code, it's not clear that the schema is enough.

Agreed, but the documentation can be updated. On the other hand, it would be great to have some examples that show the need for WriteRows.

Thanks,
Alexey

Thank you,
Raphael.




________________________________
From: Pablo Estrada <pa...@google.com>
Sent: June 7, 2021 22:43:24
To: dev; Reuven Lax
Cc: Ilya Kozyrev
Subject: Re:

+Reuven Lax<ma...@google.com> do you know if this is already supported or not?
I have been able to use `JdbcIO.write()` without specifying a statement or a statement preparer. Is that not what's necessary? I've done this with a named class with schemas (i.e. not Row) - is this perhaps the difference?
Best
-P.

On Fri, Jun 4, 2021 at 3:44 PM Robert Bradshaw <ro...@google.com> wrote:
That would be great! I don't know much about this particular issue,
but tips for getting started in general can be found at
https://beam.apache.org/contribute/

On Thu, Jun 3, 2021 at 10:55 AM Raphael Sanamyan
<ra...@akvelon.com> wrote:
>
> Hi, community,
>
> I would like to start work on the task BEAM-10396, if nobody minds.
> Also, if anyone has any details or developments on this task, I would be glad if you could share them.
>
> Thank you,
> Raphael.
>
>


Re: [EXTERNAL] [EXTERNAL]

Posted by Raphael Sanamyan <ra...@akvelon.com>.
Hello!

I've made a PR[1] and created a task[2] in Jira. Could someone please review the PR?

Thanks,
Raphael

[1] https://github.com/akvelon/beam/pull/17
[2] https://issues.apache.org/jira/browse/BEAM-12511


Re: [EXTERNAL] [EXTERNAL]

Posted by Alexey Romanenko <ar...@gmail.com>.

> On 15 Jun 2021, at 22:59, Raphael Sanamyan <ra...@akvelon.com> wrote:
> 
> Hello,
> 
>> Is it somehow related to this work [1]? 
> 
> 
> No, this work adds the ability to return values from a SQL INSERT query. It contains no improvements to working with Row and schemas.
> 
>> Not sure that I got it. Could you elaborate a bit on this? 
> 
> 
> When we use "Write" with a table and without a statement, "Write.expand" is called, which automatically generates the statement and passes the input to "WriteVoid.expand". But when we use "Write.withResults", only "WriteVoid.expand" is called, which can't generate the statement automatically. If we add conditions there similar to those in "Write.expand" and move the statement generation into "WriteVoid.expand", we'll fix this case.

Yes, I think we can do it there in the same way as we do for "Write.expand()". 

> I analyzed the Write class again, and this seems to be the only case where there is no full support for automatic work with "Row". I think it makes sense to delete the TODO <https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcSchemaIOProvider.java#L142> and close the task <https://issues.apache.org/jira/browse/BEAM-10396>, so as not to confuse people, and to create a new task to solve this case. What do you think about that?

I agree on this. 

Back to https://github.com/apache/beam/pull/14856/
It should more or less replace WriteVoid, since it does the same job but also returns the results of the write, and I suggested deprecating WriteVoid. So we will need to add automatic statement generation there too.

—
Alexey

Re: [EXTERNAL] Re: [EXTERNAL]

Posted by Raphael Sanamyan <ra...@akvelon.com>.
Hello,

> Is it somehow related to this work [1]?

No, this work adds the ability to return values from a SQL INSERT query. It contains no improvements to working with Row and schemas.

> Not sure that I got it. Could you elaborate a bit on this?

When we use "Write" with a table and without a statement, "Write.expand" is called, which automatically generates the statement and passes the input to "WriteVoid.expand". But when we use "Write.withResults", only "WriteVoid.expand" is called, which can't generate the statement automatically. If we add conditions there similar to those in "Write.expand" and move the statement generation into "WriteVoid.expand", we'll fix this case.
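
The proposal can be pictured with a toy model in plain Java. WriteModel and WriteVoidModel are hypothetical stand-ins for Write and WriteVoid (the real classes are PTransforms with different signatures), and the generated SQL text is only an approximation. The sketch assumes, as described above, that withResults hands back the inner WriteVoid. With the inference placed in the inner step, both paths end up with the same statement.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model only: WriteModel and WriteVoidModel are hypothetical stand-ins
// that mirror the shape of the discussion, not the real JdbcIO classes.
class ExpandSketch {

    /** Stand-in for WriteVoid: the step that finally needs a SQL statement. */
    static class WriteVoidModel {
        private final String table;     // null when not configured
        private final String statement; // null when not configured

        WriteVoidModel(String table, String statement) {
            this.table = table;
            this.statement = statement;
        }

        // Proposed behaviour: infer the statement here, so every path that
        // reaches this method gets the same inference.
        String expand(List<String> schemaFields) {
            if (statement != null) {
                return statement; // an explicit statement always wins
            }
            if (table == null) {
                throw new IllegalStateException("either a statement or a table is required");
            }
            String placeholders =
                String.join(", ", Collections.nCopies(schemaFields.size(), "?"));
            return "INSERT INTO " + table + "(" + String.join(", ", schemaFields)
                + ") VALUES(" + placeholders + ")";
        }
    }

    /** Stand-in for Write: a thin wrapper that only delegates. */
    static class WriteModel {
        private final WriteVoidModel inner;

        WriteModel(String table, String statement) {
            this.inner = new WriteVoidModel(table, statement);
        }

        // Plain write path: delegates to the inner step.
        String expand(List<String> schemaFields) {
            return inner.expand(schemaFields);
        }

        // withResults path: hands out the inner step directly,
        // bypassing WriteModel.expand.
        WriteVoidModel withResults() {
            return inner;
        }
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("id", "name");
        WriteModel write = new WriteModel("second_table", null);
        // Both paths produce the same generated statement.
        System.out.println(write.expand(fields));
        System.out.println(write.withResults().expand(fields));
    }
}
```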

I analyzed the Write class again, and this seems to be the only case where there is no full support for automatic work with "Row". I think it makes sense to delete the TODO<https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcSchemaIOProvider.java#L142> and close the task<https://issues.apache.org/jira/browse/BEAM-10396>, so as not to confuse people, and to create a new task to solve this case. What do you think about that?


Thanks,
Raphael.

Re: [EXTERNAL]

Posted by Alexey Romanenko <ar...@gmail.com>.
> On 9 Jun 2021, at 21:34, Raphael Sanamyan <ra...@akvelon.com> wrote:
> 
> In this case, we write data to one table and then to the other, but only after the window of data has been fully written to the first table. It is not possible to do this with the existing JdbcIO.Write functionality.

Well, it’s kind of possible, but in this case we need to set a statement. I guess it can be fixed by generating the statement automatically from the input schema.
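
As a rough illustration of generating the statement from the input schema, the sketch below builds a parameterized INSERT from a table name and a list of field names. It is plain Java and does not use Beam; the class InsertStatementSketch and the method generateInsert are hypothetical names, and the exact SQL text that JdbcIO produces may differ.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class InsertStatementSketch {

    // Hypothetical helper (not a JdbcIO method): builds a parameterized INSERT
    // from a table name and the field names taken from the input schema.
    static String generateInsert(String table, List<String> fieldNames) {
        if (fieldNames.isEmpty()) {
            throw new IllegalArgumentException("schema has no fields");
        }
        // One column per schema field, in schema order.
        String columns = String.join(", ", fieldNames);
        // One "?" placeholder per field, to be bound later by a statement setter.
        String placeholders =
            String.join(", ", Collections.nCopies(fieldNames.size(), "?"));
        return "INSERT INTO " + table + "(" + columns + ") VALUES(" + placeholders + ")";
    }

    public static void main(String[] args) {
        // Prints: INSERT INTO first_table(id, name) VALUES(?, ?)
        System.out.println(generateInsert("first_table", Arrays.asList("id", "name")));
    }
}
```

The placeholders are positional, so whatever binds the values has to walk the fields in the same order as the column list.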

> Another option for this specific case could be extending the existing class instead of adding a Schema API-specific class. We can add additional conditions and move some functionality from Write to WriteVoid to infer the Beam schema. What do you think about these options?

Not sure that I got it. Could you elaborate a bit on this?  

Is it somehow related to this work [1]? 

> Schema providers are not very well documented in Beam, and they are a bit confusing to us. We use Beam Row as a common abstraction in Beam pipelines, which really meets our requirements. Looking at the Beam docs and code, we saw SchemaProviders for some IOs. Those providers seem to be wrappers around IOs that help work with schemas and convert data to Beam Rows. Could you please clarify this a little? If we want to improve the Beam Schema API, what is the architecturally right way to do that?

Well, it depends on what you want to improve: the Schema API in general or some specific IO schema-related things. We need to be careful with breaking changes. Anyway, it would be great to bring it to this mailing list as a design doc in some form and discuss it with other people before starting an implementation.

—
Alexey


[1] https://github.com/apache/beam/pull/14856


