Posted to dev@flink.apache.org by Dylan Forciea <dy...@oseberg.io> on 2020/10/07 13:52:33 UTC

Re: autoCommit for postgres jdbc streaming in Table/SQL API

I hadn’t heard a response on this, so I’m going to expand this to the dev email list.

If this is indeed an issue and not my misunderstanding, I have most of a patch already coded up. Please let me know, and I can create a JIRA issue and send out a PR.

Regards,
Dylan Forciea
Oseberg

From: Dylan Forciea <dy...@oseberg.io>
Date: Thursday, October 1, 2020 at 5:14 PM
To: "user@flink.apache.org" <us...@flink.apache.org>
Subject: autoCommit for postgres jdbc streaming in Table/SQL API

Hi! I’ve just recently started evaluating Flink for our ETL needs, and I ran across an issue with streaming postgres data via the Table/SQL API.

I see that the API has the scan.fetch-size option, but not scan.auto-commit, per https://ci.apache.org/projects/flink/flink-docs-master/dev/table/connectors/jdbc.html . I had attempted to load in a large table, but the connector read the entire table into memory before streaming started. I modified the Flink source code to add a scan.auto-commit option, and I was then able to start streaming immediately and cut my memory usage way down.
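
For reference, this matches the documented PostgreSQL JDBC driver behavior: the driver only streams a result set with a cursor (honoring the fetch size) when autoCommit is disabled; with autoCommit enabled it materializes the whole result set in memory. A minimal plain-JDBC sketch of the streaming configuration, with placeholder connection details:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PostgresStreamingReadSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder URL and credentials; substitute your own database.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "secret")) {
            conn.setAutoCommit(false); // required for cursor-based streaming
            try (Statement stmt = conn.createStatement()) {
                stmt.setFetchSize(1000); // fetch rows in batches instead of all at once
                try (ResultSet rs = stmt.executeQuery("SELECT * FROM big_table")) {
                    while (rs.next()) {
                        // process one row at a time; memory stays bounded
                    }
                }
            }
        }
    }
}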

I see that a similar issue was resolved for JDBCInputFormat in this thread: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-JDBC-Disable-auto-commit-mode-td27256.html , but I don’t see a way to utilize that fix in the Table/SQL API.
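
For reference, the fix in that thread (FLINK-12198) exposed auto-commit on the DataSet-era JDBCInputFormat builder. A sketch of how it can be used, assuming the builder methods from that change and placeholder driver/URL/query values:

import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.java.io.jdbc.JDBCInputFormat;
import org.apache.flink.api.java.typeutils.RowTypeInfo;

public class JdbcInputFormatAutoCommitSketch {
    public static JDBCInputFormat buildFormat() {
        // Result row type for the query below: (id BIGINT, name VARCHAR)
        RowTypeInfo rowType = new RowTypeInfo(
                BasicTypeInfo.LONG_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO);

        return JDBCInputFormat.buildJDBCInputFormat()
                .setDrivername("org.postgresql.Driver")
                .setDBUrl("jdbc:postgresql://localhost:5432/mydb") // placeholder
                .setQuery("SELECT id, name FROM big_table")
                .setRowTypeInfo(rowType)
                .setFetchSize(1000)    // stream in batches of 1000 rows
                .setAutoCommit(false)  // the knob added by FLINK-12198
                .finish();
    }
}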

Am I missing something on how to pull this off?

Regards,
Dylan Forciea
Oseberg

Re: autoCommit for postgres jdbc streaming in Table/SQL API

Posted by Jark Wu <im...@gmail.com>.
Thanks for the explanation. I will review the pull request. Let's move the
discussion to the PR.

Best,
Jark

On Fri, 9 Oct 2020 at 21:06, Dylan Forciea <dy...@oseberg.io> wrote:

> Jark,
>
>
>
> Thank you! I had actually mistyped the JIRA issue; autoCommit needs to be
> set to false for streaming to work. The default on the driver is true when
> the option isn’t specified. I’ve updated the issue accordingly.
>
>
>
> Setting this to false automatically on the read path would fix my issue.
> However, I’m only certain that this is proper for Postgres. I’m not sure if
> this should be done for other drivers, although my gut would say it should
> be fine if it’s only done for reading. My patch as it is will set the
> builder to not specify whether to set autoCommit if the option is not
> specified, which means it would then be left at the default of true. That
> would conflict with the 1.11 patch you suggested. Let me know if you think
> I should make the default true in the SQL API.
>
>
>
> https://github.com/apache/flink/pull/13570
>
>
>
> Regards,
>
> Dylan
>
>
>
> From: Jark Wu <im...@gmail.com>
> Date: Thursday, October 8, 2020 at 10:15 PM
> To: Dylan Forciea <dy...@oseberg.io>
> Cc: Till Rohrmann <tr...@apache.org>, dev <de...@flink.apache.org>, Shengkai Fang <fs...@gmail.com>, "user@flink.apache.org" <user@flink.apache.org>, Leonard Xu <xb...@gmail.com>
> Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API
>
>
>
> Hi Dylan,
>
>
>
> Sorry for the late reply. We've just come back from a long holiday.
>
>
>
> Thanks for reporting this problem. First, I think it is a bug that
> `autoCommit` is false by default (JdbcRowDataInputFormat.Builder).
>
> We can fix the default to true in the 1.11 series, and I think this can
> solve your problem in the short term.
>
> Besides, we should expose a connector option to set auto-commit; that can
> be a separate issue implemented in master.
>
> I'm glad to review the code.
>
>
>
> What do you think?
>
>
>
> Regarding the failed JMXReporterFactoryTest, I think this is a known
> issue; see FLINK-19539 [1].
>
>
>
> Best,
>
> Jark
>
>
>
> [1]: https://issues.apache.org/jira/browse/FLINK-19539
>
>
>
> On Fri, 9 Oct 2020 at 01:29, Dylan Forciea <dy...@oseberg.io> wrote:
>
> I’ve updated the unit tests and documentation, and I was running the Azure
> test pipeline as described in the instructions. However, an apparently
> unrelated JMX test failed. Is this a matter of me not setting things up
> correctly? I wanted to ensure everything looked good before I submitted
> the PR.
>
>
>
> [ERROR] Failures:
>
> [ERROR]   JMXReporterFactoryTest.testPortRangeArgument:46
>
> Expected: (a value equal to or greater than <9000> and a value less than
> or equal to <9010>)
>
>      but: a value less than or equal to <9010> <9040> was greater than
> <9010>
>
> [ERROR]   JMXReporterFactoryTest.testWithoutArgument:60
>
> [INFO]
>
> [ERROR] Tests run: 10, Failures: 2, Errors: 0, Skipped: 0
>
>
>
> Thanks,
>
> Dylan Forciea
>
>
>
> From: Till Rohrmann <tr...@apache.org>
> Date: Thursday, October 8, 2020 at 2:29 AM
> To: Dylan Forciea <dy...@oseberg.io>
> Cc: dev <de...@flink.apache.org>, Shengkai Fang <fs...@gmail.com>, "user@flink.apache.org" <us...@flink.apache.org>, "jark@apache.org" <jark@apache.org>, Leonard Xu <xb...@gmail.com>
> Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API
>
>
>
> This sounds good. Maybe there are others in the community who can help
> with the review before Jark and Leonard are back.
>
>
>
> Cheers,
>
> Till
>
>
>
> On Wed, Oct 7, 2020 at 7:33 PM Dylan Forciea <dy...@oseberg.io> wrote:
>
> Actually…. It looks like what I did covers both cases. I’ll see about
> getting some unit tests and documentation updated.
>
>
>
> Dylan
>
>
>
> From: Dylan Forciea <dy...@oseberg.io>
> Date: Wednesday, October 7, 2020 at 11:47 AM
> To: Till Rohrmann <tr...@apache.org>, dev <de...@flink.apache.org>
> Cc: Shengkai Fang <fs...@gmail.com>, "user@flink.apache.org" <user@flink.apache.org>, "jark@apache.org" <ja...@apache.org>, Leonard Xu <xbjtdcq@gmail.com>
> Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API
>
>
>
> Ok, I have created FLINK-19522 describing the issue. I have the code I
> made so far checked in at
> https://github.com/apache/flink/compare/master...dforciea:FLINK-19522 but
> this only fixes the SQL API. It sounds like there may be another change
> needed for the Table API… I’ll look into that and see if I can figure it
> out on my own while they’re out. I will also need to add some unit tests
> and update some documentation to get this ready for a PR.
>
>
>
> Thanks,
>
> Dylan
>
>
>
> From: Till Rohrmann <tr...@apache.org>
> Date: Wednesday, October 7, 2020 at 10:55 AM
> To: dev <de...@flink.apache.org>
> Cc: Shengkai Fang <fs...@gmail.com>, "user@flink.apache.org" <user@flink.apache.org>, "jark@apache.org" <ja...@apache.org>, Leonard Xu <xbjtdcq@gmail.com>
> Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API
>
>
>
> Hi Dylan,
>
>
>
> thanks for reaching out to the Flink community, and excuse our late
> response. I am not an expert on the Table API and its JDBC connector, but
> what you describe sounds like a missing feature. Also, the fact that
> FLINK-12198 enabled this feature for the JDBCInputFormat indicates that we
> might simply need to make it configurable from the JdbcTableSource. I am
> pulling in Jark and Leonard, who worked on the JdbcTableSource and might
> help you get this feature into Flink. Their response could take a week
> because they are currently on vacation, if I am not mistaken.
>
>
>
> What you could already do is to open an issue linking FLINK-12198 and
> describing the problem and your solution proposal.
>
>
>
> [1] https://issues.apache.org/jira/browse/FLINK-12198
>
>
>
> Cheers,
>
> Till
>
>
>
> On Wed, Oct 7, 2020 at 5:00 PM Dylan Forciea <dy...@oseberg.io> wrote:
>
> I appreciate it! Let me know if you want me to submit a PR against the
> issue after it is created. It wasn’t a huge amount of code, so it’s
> probably not a big deal if you wanted to redo it.
>
> Thanks,
> Dylan
>
> From: Shengkai Fang <fs...@gmail.com>
> Date: Wednesday, October 7, 2020 at 9:06 AM
> To: Dylan Forciea <dy...@oseberg.io>
> Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API
>
> Sorry for the late response. +1 to support it. I will open a JIRA about it
> later.
>
> Dylan Forciea <dy...@oseberg.io> wrote on Wednesday, October 7, 2020 at 9:53 PM:
>
> I hadn’t heard a response on this, so I’m going to expand this to the dev
> email list.
>
>
>
> If this is indeed an issue and not my misunderstanding, I have most of a
> patch already coded up. Please let me know, and I can create a JIRA issue
> and send out a PR.
>
>
>
> Regards,
>
> Dylan Forciea
>
> Oseberg
>
> From: Dylan Forciea <dy...@oseberg.io>
> Date: Thursday, October 1, 2020 at 5:14 PM
> To: "user@flink.apache.org" <user@flink.apache.org>
> Subject: autoCommit for postgres jdbc streaming in Table/SQL API
>
> Hi! I’ve just recently started evaluating Flink for our ETL needs, and I
> ran across an issue with streaming postgres data via the Table/SQL API.
>
> I see that the API has the scan.fetch-size option, but not scan.auto-commit, per
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/connectors/jdbc.html .
> I had attempted to load in a large table, but the connector read the entire
> table into memory before streaming started. I modified the Flink source code
> to add a scan.auto-commit option, and I was then able to start streaming
> immediately and cut my memory usage way down.
>
> I see that a similar issue was resolved for JDBCInputFormat in this thread:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-JDBC-Disable-auto-commit-mode-td27256.html
> , but I don’t see a way to utilize that fix in the Table/SQL API.
>
> Am I missing something on how to pull this off?
>
> Regards,
> Dylan Forciea
> Oseberg

Re: autoCommit for postgres jdbc streaming in Table/SQL API

Posted by Dylan Forciea <dy...@oseberg.io>.
Jark,

Thank you! I had actually mistyped the JIRA issue; autoCommit needs to be set to false for streaming to work. The default on the driver is true when the option isn’t specified. I’ve updated the issue accordingly.

Setting this to false automatically on the read path would fix my issue. However, I’m only certain that this is proper for Postgres; I’m not sure whether it should be done for other drivers, although my gut says it should be fine if it’s only done for reading. As my patch stands, the builder leaves autoCommit unspecified when the option is not set, which means the driver default of true applies. That would conflict with the 1.11 patch you suggested. Let me know if you think I should make the default true in the SQL API.
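
As a rough illustration of that behavior (a sketch, not the actual patch code), the builder can track autoCommit as a nullable Boolean so the driver default stays untouched unless the option is configured:

import java.sql.Connection;
import java.sql.SQLException;

class ScanOptionsSketch {
    // null means "option not specified": keep the driver default (true).
    private Boolean autoCommit;

    ScanOptionsSketch setAutoCommit(boolean autoCommit) {
        this.autoCommit = autoCommit;
        return this;
    }

    void configure(Connection connection) throws SQLException {
        if (autoCommit != null) {
            // Only touch the connection when the option was set explicitly.
            connection.setAutoCommit(autoCommit);
        }
    }
}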

https://github.com/apache/flink/pull/13570

Regards,
Dylan

From: Jark Wu <im...@gmail.com>
Date: Thursday, October 8, 2020 at 10:15 PM
To: Dylan Forciea <dy...@oseberg.io>
Cc: Till Rohrmann <tr...@apache.org>, dev <de...@flink.apache.org>, Shengkai Fang <fs...@gmail.com>, "user@flink.apache.org" <us...@flink.apache.org>, Leonard Xu <xb...@gmail.com>
Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API

Hi Dylan,

Sorry for the late reply. We've just come back from a long holiday.

Thanks for reporting this problem. First, I think it is a bug that `autoCommit` is false by default (JdbcRowDataInputFormat.Builder).
We can fix the default to true in the 1.11 series, and I think this can solve your problem in the short term.
Besides, we should expose a connector option to set auto-commit; that can be a separate issue implemented in master.
I'm glad to review the code.

What do you think?

Regarding the failed JMXReporterFactoryTest, I think this is a known issue; see FLINK-19539 [1].

Best,
Jark

[1]: https://issues.apache.org/jira/browse/FLINK-19539

On Fri, 9 Oct 2020 at 01:29, Dylan Forciea <dy...@oseberg.io> wrote:
I’ve updated the unit tests and documentation, and I was running the Azure test pipeline as described in the instructions. However, an apparently unrelated JMX test failed. Is this a matter of me not setting things up correctly? I wanted to ensure everything looked good before I submitted the PR.

[ERROR] Failures:
[ERROR]   JMXReporterFactoryTest.testPortRangeArgument:46
Expected: (a value equal to or greater than <9000> and a value less than or equal to <9010>)
     but: a value less than or equal to <9010> <9040> was greater than <9010>
[ERROR]   JMXReporterFactoryTest.testWithoutArgument:60
[INFO]
[ERROR] Tests run: 10, Failures: 2, Errors: 0, Skipped: 0

Thanks,
Dylan Forciea

From: Till Rohrmann <tr...@apache.org>
Date: Thursday, October 8, 2020 at 2:29 AM
To: Dylan Forciea <dy...@oseberg.io>
Cc: dev <de...@flink.apache.org>, Shengkai Fang <fs...@gmail.com>, "user@flink.apache.org" <us...@flink.apache.org>, "jark@apache.org" <ja...@apache.org>, Leonard Xu <xb...@gmail.com>
Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API

This sounds good. Maybe there are others in the community who can help with the review before Jark and Leonard are back.

Cheers,
Till

On Wed, Oct 7, 2020 at 7:33 PM Dylan Forciea <dy...@oseberg.io> wrote:
Actually…. It looks like what I did covers both cases. I’ll see about getting some unit tests and documentation updated.

Dylan

From: Dylan Forciea <dy...@oseberg.io>
Date: Wednesday, October 7, 2020 at 11:47 AM
To: Till Rohrmann <tr...@apache.org>, dev <de...@flink.apache.org>
Cc: Shengkai Fang <fs...@gmail.com>, "user@flink.apache.org" <us...@flink.apache.org>, "jark@apache.org" <ja...@apache.org>, Leonard Xu <xb...@gmail.com>
Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API

Ok, I have created FLINK-19522 describing the issue. I have the code I made so far checked in at https://github.com/apache/flink/compare/master...dforciea:FLINK-19522 but this only fixes the SQL API. It sounds like there may be another change needed for the Table API… I’ll look into that and see if I can figure it out on my own while they’re out. I will also need to add some unit tests and update some documentation to get this ready for a PR.

Thanks,
Dylan

From: Till Rohrmann <tr...@apache.org>
Date: Wednesday, October 7, 2020 at 10:55 AM
To: dev <de...@flink.apache.org>
Cc: Shengkai Fang <fs...@gmail.com>, "user@flink.apache.org" <us...@flink.apache.org>, "jark@apache.org" <ja...@apache.org>, Leonard Xu <xb...@gmail.com>
Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API

Hi Dylan,

thanks for reaching out to the Flink community, and excuse our late response. I am not an expert on the Table API and its JDBC connector, but what you describe sounds like a missing feature. Also, the fact that FLINK-12198 enabled this feature for the JDBCInputFormat indicates that we might simply need to make it configurable from the JdbcTableSource. I am pulling in Jark and Leonard, who worked on the JdbcTableSource and might help you get this feature into Flink. Their response could take a week because they are currently on vacation, if I am not mistaken.

What you could already do is to open an issue linking FLINK-12198 and describing the problem and your solution proposal.

[1] https://issues.apache.org/jira/browse/FLINK-12198

Cheers,
Till

On Wed, Oct 7, 2020 at 5:00 PM Dylan Forciea <dy...@oseberg.io> wrote:
I appreciate it! Let me know if you want me to submit a PR against the issue after it is created. It wasn’t a huge amount of code, so it’s probably not a big deal if you wanted to redo it.

Thanks,
Dylan

From: Shengkai Fang <fs...@gmail.com>
Date: Wednesday, October 7, 2020 at 9:06 AM
To: Dylan Forciea <dy...@oseberg.io>
Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API

Sorry for the late response. +1 to support it. I will open a JIRA about it later.

Dylan Forciea <dy...@oseberg.io> wrote on Wednesday, October 7, 2020 at 9:53 PM:

I hadn’t heard a response on this, so I’m going to expand this to the dev email list.



If this is indeed an issue and not my misunderstanding, I have most of a patch already coded up. Please let me know, and I can create a JIRA issue and send out a PR.



Regards,

Dylan Forciea

Oseberg

From: Dylan Forciea <dy...@oseberg.io>
Date: Thursday, October 1, 2020 at 5:14 PM
To: "user@flink.apache.org" <user@flink.apache.org>
Subject: autoCommit for postgres jdbc streaming in Table/SQL API

Hi! I’ve just recently started evaluating Flink for our ETL needs, and I ran across an issue with streaming postgres data via the Table/SQL API.



I see that the API has the scan.fetch-size option, but not scan.auto-commit, per https://ci.apache.org/projects/flink/flink-docs-master/dev/table/connectors/jdbc.html . I had attempted to load in a large table, but the connector read the entire table into memory before streaming started. I modified the Flink source code to add a scan.auto-commit option, and I was then able to start streaming immediately and cut my memory usage way down.



I see that a similar issue was resolved for JDBCInputFormat in this thread: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-JDBC-Disable-auto-commit-mode-td27256.html , but I don’t see a way to utilize that fix in the Table/SQL API.



Am I missing something on how to pull this off?



Regards,

Dylan Forciea

Oseberg




Re: autoCommit for postgres jdbc streaming in Table/SQL API

Posted by Jark Wu <im...@gmail.com>.
Hi Dylan,

Sorry for the late reply. We've just come back from a long holiday.

Thanks for reporting this problem. First, I think it is a bug that
`autoCommit` is false by default (JdbcRowDataInputFormat.Builder).
We can fix the default to true in the 1.11 series, and I think this can
solve your problem in the short term.
Besides, we should expose a connector option to set auto-commit; that can
be a separate issue implemented in master.
I'm glad to review the code.
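
For concreteness, once the option is exposed, the DDL could look like the following sketch (using the scan.auto-commit name discussed in this thread; the schema, table, and URL below are placeholders):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class JdbcScanAutoCommitDdlSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // Hypothetical DDL once the connector exposes the option.
        tEnv.executeSql(
                "CREATE TABLE big_table (" +
                "  id BIGINT," +
                "  name STRING" +
                ") WITH (" +
                "  'connector' = 'jdbc'," +
                "  'url' = 'jdbc:postgresql://localhost:5432/mydb'," +
                "  'table-name' = 'big_table'," +
                "  'scan.fetch-size' = '1000'," +
                "  'scan.auto-commit' = 'false'" + // the proposed option
                ")");
    }
}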

What do you think?

Regarding the failed JMXReporterFactoryTest, I think this is a known
issue; see FLINK-19539 [1].

Best,
Jark

[1]: https://issues.apache.org/jira/browse/FLINK-19539

On Fri, 9 Oct 2020 at 01:29, Dylan Forciea <dy...@oseberg.io> wrote:

> I’ve updated the unit tests and documentation, and I was running the Azure
> test pipeline as described in the instructions. However, an apparently
> unrelated JMX test failed. Is this a matter of me not setting things up
> correctly? I wanted to ensure everything looked good before I submitted
> the PR.
>
>
>
> [ERROR] Failures:
>
> [ERROR]   JMXReporterFactoryTest.testPortRangeArgument:46
>
> Expected: (a value equal to or greater than <9000> and a value less than
> or equal to <9010>)
>
>      but: a value less than or equal to <9010> <9040> was greater than
> <9010>
>
> [ERROR]   JMXReporterFactoryTest.testWithoutArgument:60
>
> [INFO]
>
> [ERROR] Tests run: 10, Failures: 2, Errors: 0, Skipped: 0
>
>
>
> Thanks,
>
> Dylan Forciea
>
>
>
> From: Till Rohrmann <tr...@apache.org>
> Date: Thursday, October 8, 2020 at 2:29 AM
> To: Dylan Forciea <dy...@oseberg.io>
> Cc: dev <de...@flink.apache.org>, Shengkai Fang <fs...@gmail.com>, "user@flink.apache.org" <us...@flink.apache.org>, "jark@apache.org" <jark@apache.org>, Leonard Xu <xb...@gmail.com>
> Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API
>
>
>
> This sounds good. Maybe there are others in the community who can help
> with the review before Jark and Leonard are back.
>
>
>
> Cheers,
>
> Till
>
>
>
> On Wed, Oct 7, 2020 at 7:33 PM Dylan Forciea <dy...@oseberg.io> wrote:
>
> Actually…. It looks like what I did covers both cases. I’ll see about
> getting some unit tests and documentation updated.
>
>
>
> Dylan
>
>
>
> From: Dylan Forciea <dy...@oseberg.io>
> Date: Wednesday, October 7, 2020 at 11:47 AM
> To: Till Rohrmann <tr...@apache.org>, dev <de...@flink.apache.org>
> Cc: Shengkai Fang <fs...@gmail.com>, "user@flink.apache.org" <user@flink.apache.org>, "jark@apache.org" <ja...@apache.org>, Leonard Xu <xbjtdcq@gmail.com>
> Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API
>
>
>
> Ok, I have created FLINK-19522 describing the issue. I have the code I
> made so far checked in at
> https://github.com/apache/flink/compare/master...dforciea:FLINK-19522 but
> this only fixes the SQL API. It sounds like there may be another change
> needed for the Table API… I’ll look into that and see if I can figure it
> out on my own while they’re out. I will also need to add some unit tests
> and update some documentation to get this ready for a PR.
>
>
>
> Thanks,
>
> Dylan
>
>
>
> From: Till Rohrmann <tr...@apache.org>
> Date: Wednesday, October 7, 2020 at 10:55 AM
> To: dev <de...@flink.apache.org>
> Cc: Shengkai Fang <fs...@gmail.com>, "user@flink.apache.org" <user@flink.apache.org>, "jark@apache.org" <ja...@apache.org>, Leonard Xu <xbjtdcq@gmail.com>
> Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API
>
>
>
> Hi Dylan,
>
>
>
> thanks for reaching out to the Flink community, and excuse our late
> response. I am not an expert on the Table API and its JDBC connector, but
> what you describe sounds like a missing feature. Also, the fact that
> FLINK-12198 enabled this feature for the JDBCInputFormat indicates that we
> might simply need to make it configurable from the JdbcTableSource. I am
> pulling in Jark and Leonard, who worked on the JdbcTableSource and might
> help you get this feature into Flink. Their response could take a week
> because they are currently on vacation, if I am not mistaken.
>
>
>
> What you could already do is to open an issue linking FLINK-12198 and
> describing the problem and your solution proposal.
>
>
>
> [1] https://issues.apache.org/jira/browse/FLINK-12198
>
>
>
> Cheers,
>
> Till
>
>
>
> On Wed, Oct 7, 2020 at 5:00 PM Dylan Forciea <dy...@oseberg.io> wrote:
>
> I appreciate it! Let me know if you want me to submit a PR against the
> issue after it is created. It wasn’t a huge amount of code, so it’s
> probably not a big deal if you wanted to redo it.
>
> Thanks,
> Dylan
>
> From: Shengkai Fang <fs...@gmail.com>
> Date: Wednesday, October 7, 2020 at 9:06 AM
> To: Dylan Forciea <dy...@oseberg.io>
> Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API
>
> Sorry for the late response. +1 to support it. I will open a JIRA about it
> later.
>
> Dylan Forciea <dy...@oseberg.io> wrote on Wednesday, October 7, 2020 at 9:53 PM:
>
> I hadn’t heard a response on this, so I’m going to expand this to the dev
> email list.
>
>
>
> If this is indeed an issue and not my misunderstanding, I have most of a
> patch already coded up. Please let me know, and I can create a JIRA issue
> and send out a PR.
>
>
>
> Regards,
>
> Dylan Forciea
>
> Oseberg
>
> From: Dylan Forciea <dy...@oseberg.io>
> Date: Thursday, October 1, 2020 at 5:14 PM
> To: "user@flink.apache.org" <user@flink.apache.org>
> Subject: autoCommit for postgres jdbc streaming in Table/SQL API
>
> Hi! I’ve just recently started evaluating Flink for our ETL needs, and I
> ran across an issue with streaming postgres data via the Table/SQL API.
>
> I see that the API has the scan.fetch-size option, but not scan.auto-commit, per
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/connectors/jdbc.html .
> I had attempted to load in a large table, but the connector read the entire
> table into memory before streaming started. I modified the Flink source code
> to add a scan.auto-commit option, and I was then able to start streaming
> immediately and cut my memory usage way down.
>
> I see that a similar issue was resolved for JDBCInputFormat in this thread:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-JDBC-Disable-auto-commit-mode-td27256.html
> , but I don’t see a way to utilize that fix in the Table/SQL API.
>
> Am I missing something on how to pull this off?
>
> Regards,
> Dylan Forciea
> Oseberg

Re: autoCommit for postgres jdbc streaming in Table/SQL API

Posted by Dylan Forciea <dy...@oseberg.io>.
I’ve updated the unit tests and documentation, and I was running the Azure test pipeline as described in the instructions. However, an apparently unrelated JMX test failed. Is this a matter of me not setting things up correctly? I wanted to ensure everything looked good before I submitted the PR.

[ERROR] Failures:
[ERROR]   JMXReporterFactoryTest.testPortRangeArgument:46
Expected: (a value equal to or greater than <9000> and a value less than or equal to <9010>)
     but: a value less than or equal to <9010> <9040> was greater than <9010>
[ERROR]   JMXReporterFactoryTest.testWithoutArgument:60
[INFO]
[ERROR] Tests run: 10, Failures: 2, Errors: 0, Skipped: 0

Thanks,
Dylan Forciea

From: Till Rohrmann <tr...@apache.org>
Date: Thursday, October 8, 2020 at 2:29 AM
To: Dylan Forciea <dy...@oseberg.io>
Cc: dev <de...@flink.apache.org>, Shengkai Fang <fs...@gmail.com>, "user@flink.apache.org" <us...@flink.apache.org>, "jark@apache.org" <ja...@apache.org>, Leonard Xu <xb...@gmail.com>
Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API

This sounds good. Maybe there are others in the community who can help with the review before Jark and Leonard are back.

Cheers,
Till

On Wed, Oct 7, 2020 at 7:33 PM Dylan Forciea <dy...@oseberg.io> wrote:
Actually…. It looks like what I did covers both cases. I’ll see about getting some unit tests and documentation updated.

Dylan

From: Dylan Forciea <dy...@oseberg.io>
Date: Wednesday, October 7, 2020 at 11:47 AM
To: Till Rohrmann <tr...@apache.org>, dev <de...@flink.apache.org>
Cc: Shengkai Fang <fs...@gmail.com>, "user@flink.apache.org" <us...@flink.apache.org>, "jark@apache.org" <ja...@apache.org>, Leonard Xu <xb...@gmail.com>
Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API

Ok, I have created FLINK-19522 describing the issue. I have the code I made so far checked in at https://github.com/apache/flink/compare/master...dforciea:FLINK-19522 but this only fixes the SQL API. It sounds like there may be another change needed for the Table API… I’ll look into that and see if I can figure it out on my own while they’re out. I will also need to add some unit tests and update some documentation to get this ready for a PR.

Thanks,
Dylan

From: Till Rohrmann <tr...@apache.org>
Date: Wednesday, October 7, 2020 at 10:55 AM
To: dev <de...@flink.apache.org>
Cc: Shengkai Fang <fs...@gmail.com>, "user@flink.apache.org" <us...@flink.apache.org>, "jark@apache.org" <ja...@apache.org>, Leonard Xu <xb...@gmail.com>
Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API

Hi Dylan,

thanks for reaching out to the Flink community, and excuse our late response. I am not an expert on the Table API and its JDBC connector, but what you describe sounds like a missing feature. Also, the fact that FLINK-12198 enabled this feature for the JDBCInputFormat indicates that we might simply need to make it configurable from the JdbcTableSource. I am pulling in Jark and Leonard, who worked on the JdbcTableSource and might help you get this feature into Flink. Their response could take a week because they are currently on vacation, if I am not mistaken.

What you could already do is to open an issue linking FLINK-12198 and describing the problem and your solution proposal.

[1] https://issues.apache.org/jira/browse/FLINK-12198

Cheers,
Till

On Wed, Oct 7, 2020 at 5:00 PM Dylan Forciea <dy...@oseberg.io> wrote:
I appreciate it! Let me know if you want me to submit a PR against the issue after it is created. It wasn’t a huge amount of code, so it’s probably not a big deal if you wanted to redo it.

Thanks,
Dylan

From: Shengkai Fang <fs...@gmail.com>
Date: Wednesday, October 7, 2020 at 9:06 AM
To: Dylan Forciea <dy...@oseberg.io>
Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API

Sorry for the late response. +1 to support it. I will open a JIRA about it later.

Dylan Forciea <dy...@oseberg.io> wrote on Wednesday, October 7, 2020 at 9:53 PM:

I hadn’t heard a response on this, so I’m going to expand this to the dev email list.



If this is indeed an issue and not my misunderstanding, I have most of a patch already coded up. Please let me know, and I can create a JIRA issue and send out a PR.



Regards,

Dylan Forciea

Oseberg

From: Dylan Forciea <dy...@oseberg.io>
Date: Thursday, October 1, 2020 at 5:14 PM
To: "user@flink.apache.org" <user@flink.apache.org>
Subject: autoCommit for postgres jdbc streaming in Table/SQL API

Hi! I’ve just recently started evaluating Flink for our ETL needs, and I ran across an issue with streaming postgres data via the Table/SQL API.



I see that the API has the scan.fetch-size option, but not scan.auto-commit, per https://ci.apache.org/projects/flink/flink-docs-master/dev/table/connectors/jdbc.html . I had attempted to load in a large table, but the connector read the entire table into memory before streaming started. I modified the Flink source code to add a scan.auto-commit option, and I was then able to start streaming immediately and cut my memory usage way down.



I see that a similar issue was resolved for JDBCInputFormat in this thread: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-JDBC-Disable-auto-commit-mode-td27256.html , but I don’t see a way to utilize that fix in the Table/SQL API.



Am I missing something on how to pull this off?



Regards,

Dylan Forciea

Oseberg





Re: autoCommit for postgres jdbc streaming in Table/SQL API

Posted by Dylan Forciea <dy...@oseberg.io>.
I’ve updated the unit tests and documentation, and I was running the azure test pipeline as described in the instructions. However, it appears that what seems to be an unrelated test for the JMX code failed. Is this a matter of me not setting things up correctly? I wanted to ensure everything looked good before I submitted the PR.

[ERROR] Failures:
[ERROR]   JMXReporterFactoryTest.testPortRangeArgument:46
Expected: (a value equal to or greater than <9000> and a value less than or equal to <9010>)
     but: a value less than or equal to <9010> <9040> was greater than <9010>
[ERROR]   JMXReporterFactoryTest.testWithoutArgument:60
[INFO]
[ERROR] Tests run: 10, Failures: 2, Errors: 0, Skipped: 0

Thanks,
Dylan Forciea

From: Till Rohrmann <tr...@apache.org>
Date: Thursday, October 8, 2020 at 2:29 AM
To: Dylan Forciea <dy...@oseberg.io>
Cc: dev <de...@flink.apache.org>, Shengkai Fang <fs...@gmail.com>, "user@flink.apache.org" <us...@flink.apache.org>, "jark@apache.org" <ja...@apache.org>, Leonard Xu <xb...@gmail.com>
Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API

This sounds good. Maybe there are others in the community who can help with the review before the Jark and Leonard are back.

Cheers,
Till

On Wed, Oct 7, 2020 at 7:33 PM Dylan Forciea <dy...@oseberg.io>> wrote:
Actually…. It looks like what I did covers both cases. I’ll see about getting some unit tests and documentation updated.

Dylan

From: Dylan Forciea <dy...@oseberg.io>>
Date: Wednesday, October 7, 2020 at 11:47 AM
To: Till Rohrmann <tr...@apache.org>>, dev <de...@flink.apache.org>>
Cc: Shengkai Fang <fs...@gmail.com>>, "user@flink.apache.org<ma...@flink.apache.org>" <us...@flink.apache.org>>, "jark@apache.org<ma...@apache.org>" <ja...@apache.org>>, Leonard Xu <xb...@gmail.com>>
Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API

Ok, I have created FLINK-19522 describing the issue. I have the code I made so far checked in at https://github.com/apache/flink/compare/master...dforciea:FLINK-19522 but this only fixes the SQL API. It sounds like there may be another change needed for the Table API… I’ll look into that and see if I can figure it out on my own while they’re out. I will also need to add some unit tests and update some documentation to get this ready for a PR.

Thanks,
Dylan

From: Till Rohrmann <tr...@apache.org>>
Date: Wednesday, October 7, 2020 at 10:55 AM
To: dev <de...@flink.apache.org>>
Cc: Shengkai Fang <fs...@gmail.com>>, "user@flink.apache.org<ma...@flink.apache.org>" <us...@flink.apache.org>>, "jark@apache.org<ma...@apache.org>" <ja...@apache.org>>, Leonard Xu <xb...@gmail.com>>
Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API

Hi Dylan,

thanks for reaching out to the Flink community and excuse our late response. I am not an expert for the Table API and its JDBC connector but what you describe sounds like a missing feature. Also given that FLINK-12198 enabled this feature for the JDBCInputFormat indicates that we might simply need to make it configurable from the JdbcTableSource. I am pulling in Jark and Leonard who worked on the JdbcTableSource and might help you to get this feature into Flink. Their response could take a week because they are currently on vacation if I am not mistaken.

What you could already do is to open an issue linking FLINK-12198 and describing the problem and your solution proposal.

[1] https://issues.apache.org/jira/browse/FLINK-12198

Cheers,
Till

On Wed, Oct 7, 2020 at 5:00 PM Dylan Forciea <dy...@oseberg.io>> wrote:
I appreciate it! Let me know if you want me to submit a PR against the issue after it is created. It wasn’t a huge amount of code, so it’s probably not a big deal if you wanted to redo it.

Thanks,
Dylan

From: Shengkai Fang <fs...@gmail.com>>
Date: Wednesday, October 7, 2020 at 9:06 AM
To: Dylan Forciea <dy...@oseberg.io>>
Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API

Sorry for the late response. +1 to support it. I will open a JIRA about it later.

Dylan Forciea <dy...@oseberg.io>>> wrote on Wednesday, October 7, 2020 at 9:53 PM:

I hadn’t heard a response on this, so I’m going to expand this to the dev email list.



If this is indeed an issue and not my misunderstanding, I have most of a patch already coded up. Please let me know, and I can create a JIRA issue and send out a PR.



Regards,

Dylan Forciea

Oseberg




From: Dylan Forciea <dy...@oseberg.io>>>


Date: Thursday, October 1, 2020 at 5:14 PM


To: "user@flink.apache.org<ma...@flink.apache.org>>" <us...@flink.apache.org>>>


Subject: autoCommit for postgres jdbc streaming in Table/SQL API






Hi! I’ve just recently started evaluating Flink for our ETL needs, and I ran across an issue with streaming postgres data via the Table/SQL API.



I see that the API has the scan.fetch-size option, but not scan.auto-commit per



https://ci.apache.org/projects/flink/flink-docs-master/dev/table/connectors/jdbc.html . I had attempted to load a large table in, but it completely slurped it into memory before starting the streaming. I modified the flink source code to add a scan.auto-commit

option, and I was then able to immediately start streaming and cut my memory usage way down.
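
(As far as I can tell, the root cause is the Postgres JDBC driver itself: it only switches to cursor-based fetching when autoCommit is disabled and a fetch size is set. In plain JDBC terms, leaving Flink aside, the behavior I’m after looks like this — url, user, and password are placeholders:)

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

try (Connection conn = DriverManager.getConnection(url, user, password)) {
    conn.setAutoCommit(false);       // required for the driver to use a cursor
    try (Statement stmt = conn.createStatement()) {
        stmt.setFetchSize(1000);     // rows per round trip; ignored if autoCommit is on
        try (ResultSet rs = stmt.executeQuery("SELECT * FROM big_table")) {
            while (rs.next()) {
                // rows arrive incrementally instead of the whole table at once
            }
        }
    }
}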



I see in this thread that there was a similar issue resolved for JDBCInputFormat in this thread:



http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-JDBC-Disable-auto-commit-mode-td27256.html , but I don’t see a way to utilize that in the Table/SQL API.



Am I missing something on how to pull this off?



Regards,

Dylan Forciea

Oseberg





Re: autoCommit for postgres jdbc streaming in Table/SQL API

Posted by Till Rohrmann <tr...@apache.org>.
This sounds good. Maybe there are others in the community who can help with
the review before Jark and Leonard are back.

Cheers,
Till

On Wed, Oct 7, 2020 at 7:33 PM Dylan Forciea <dy...@oseberg.io> wrote:

> Actually…. It looks like what I did covers both cases. I’ll see about
> getting some unit tests and documentation updated.
>
>
>
> Dylan
>
>
>
> *From: *Dylan Forciea <dy...@oseberg.io>
> *Date: *Wednesday, October 7, 2020 at 11:47 AM
> *To: *Till Rohrmann <tr...@apache.org>, dev <de...@flink.apache.org>
> *Cc: *Shengkai Fang <fs...@gmail.com>, "user@flink.apache.org" <
> user@flink.apache.org>, "jark@apache.org" <ja...@apache.org>, Leonard Xu <
> xbjtdcq@gmail.com>
> *Subject: *Re: autoCommit for postgres jdbc streaming in Table/SQL API
>
>
>
> Ok, I have created FLINK-19522 describing the issue. I have the code I
> made so far checked in at
> https://github.com/apache/flink/compare/master...dforciea:FLINK-19522 but
> this only fixes the SQL API. It sounds like there may be another change
> needed for the Table API… I’ll look into that and see if I can figure it
> out on my own while they’re out. I will also need to add some unit tests
> and update some documentation to get this ready for a PR.
>
>
>
> Thanks,
>
> Dylan
>
>
>
> *From: *Till Rohrmann <tr...@apache.org>
> *Date: *Wednesday, October 7, 2020 at 10:55 AM
> *To: *dev <de...@flink.apache.org>
> *Cc: *Shengkai Fang <fs...@gmail.com>, "user@flink.apache.org" <
> user@flink.apache.org>, "jark@apache.org" <ja...@apache.org>, Leonard Xu <
> xbjtdcq@gmail.com>
> *Subject: *Re: autoCommit for postgres jdbc streaming in Table/SQL API
>
>
>
> Hi Dylan,
>
>
>
> thanks for reaching out to the Flink community, and please excuse our late
> response. I am not an expert on the Table API and its JDBC connector, but
> what you describe sounds like a missing feature. The fact that FLINK-12198
> [1] enabled this feature for the JDBCInputFormat also indicates that we
> might simply need to make it configurable from the JdbcTableSource. I am
> pulling in Jark and Leonard, who worked on the JdbcTableSource and might
> help you get this feature into Flink. Their response could take a week
> because they are currently on vacation, if I am not mistaken.
>
>
>
> What you could already do is to open an issue linking FLINK-12198 and
> describing the problem and your solution proposal.
>
>
>
> [1] https://issues.apache.org/jira/browse/FLINK-12198
>
>
>
> Cheers,
>
> Till
>
>
>
> On Wed, Oct 7, 2020 at 5:00 PM Dylan Forciea <dy...@oseberg.io> wrote:
>
> I appreciate it! Let me know if you want me to submit a PR against the
> issue after it is created. It wasn’t a huge amount of code, so it’s
> probably not a big deal if you wanted to redo it.
>
> Thanks,
> Dylan
>
> From: Shengkai Fang <fs...@gmail.com>
> Date: Wednesday, October 7, 2020 at 9:06 AM
> To: Dylan Forciea <dy...@oseberg.io>
> Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API
>
> Sorry for the late response. +1 to support it. I will open a JIRA about it
> later.
>
> Dylan Forciea <dy...@oseberg.io>> wrote on Wednesday, October 7, 2020 at
> 9:53 PM:
>
> I hadn’t heard a response on this, so I’m going to expand this to the dev
> email list.
>
>
>
> If this is indeed an issue and not my misunderstanding, I have most of a
> patch already coded up. Please let me know, and I can create a JIRA issue
> and send out a PR.
>
>
>
> Regards,
>
> Dylan Forciea
>
> Oseberg
>
>
>
>
> From: Dylan Forciea <dy...@oseberg.io>>
>
>
> Date: Thursday, October 1, 2020 at 5:14 PM
>
>
> To: "user@flink.apache.org<ma...@flink.apache.org>" <
> user@flink.apache.org<ma...@flink.apache.org>>
>
>
> Subject: autoCommit for postgres jdbc streaming in Table/SQL API
>
>
>
>
>
>
> Hi! I’ve just recently started evaluating Flink for our ETL needs, and I
> ran across an issue with streaming postgres data via the Table/SQL API.
>
>
>
> I see that the API has the scan.fetch-size option, but not
> scan.auto-commit per
>
>
>
>
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/connectors/jdbc.html
> . I had attempted to load a large table in, but it completely slurped it
> into memory before starting the streaming. I modified the flink source code
> to add a scan.auto-commit
>
> option, and I was then able to immediately start streaming and cut my
> memory usage way down.
>
>
>
> I see in this thread that there was a similar issue resolved for
> JDBCInputFormat in this thread:
>
>
>
>
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-JDBC-Disable-auto-commit-mode-td27256.html
> , but I don’t see a way to utilize that in the Table/SQL API.
>
>
>
> Am I missing something on how to pull this off?
>
>
>
> Regards,
>
> Dylan Forciea
>
> Oseberg
>
>
>
>
>
>

Re: autoCommit for postgres jdbc streaming in Table/SQL API

Posted by Dylan Forciea <dy...@oseberg.io>.
Actually…. It looks like what I did covers both cases. I’ll see about getting some unit tests and documentation updated.

Dylan

From: Dylan Forciea <dy...@oseberg.io>
Date: Wednesday, October 7, 2020 at 11:47 AM
To: Till Rohrmann <tr...@apache.org>, dev <de...@flink.apache.org>
Cc: Shengkai Fang <fs...@gmail.com>, "user@flink.apache.org" <us...@flink.apache.org>, "jark@apache.org" <ja...@apache.org>, Leonard Xu <xb...@gmail.com>
Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API

Ok, I have created FLINK-19522 describing the issue. I have the code I made so far checked in at https://github.com/apache/flink/compare/master...dforciea:FLINK-19522 but this only fixes the SQL API. It sounds like there may be another change needed for the Table API… I’ll look into that and see if I can figure it out on my own while they’re out. I will also need to add some unit tests and update some documentation to get this ready for a PR.

Thanks,
Dylan

From: Till Rohrmann <tr...@apache.org>
Date: Wednesday, October 7, 2020 at 10:55 AM
To: dev <de...@flink.apache.org>
Cc: Shengkai Fang <fs...@gmail.com>, "user@flink.apache.org" <us...@flink.apache.org>, "jark@apache.org" <ja...@apache.org>, Leonard Xu <xb...@gmail.com>
Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API

Hi Dylan,

thanks for reaching out to the Flink community, and please excuse our late response. I am not an expert on the Table API and its JDBC connector, but what you describe sounds like a missing feature. The fact that FLINK-12198 [1] enabled this feature for the JDBCInputFormat also indicates that we might simply need to make it configurable from the JdbcTableSource. I am pulling in Jark and Leonard, who worked on the JdbcTableSource and might help you get this feature into Flink. Their response could take a week because they are currently on vacation, if I am not mistaken.

What you could already do is to open an issue linking FLINK-12198 and describing the problem and your solution proposal.

[1] https://issues.apache.org/jira/browse/FLINK-12198

Cheers,
Till

On Wed, Oct 7, 2020 at 5:00 PM Dylan Forciea <dy...@oseberg.io>> wrote:
I appreciate it! Let me know if you want me to submit a PR against the issue after it is created. It wasn’t a huge amount of code, so it’s probably not a big deal if you wanted to redo it.

Thanks,
Dylan

From: Shengkai Fang <fs...@gmail.com>>
Date: Wednesday, October 7, 2020 at 9:06 AM
To: Dylan Forciea <dy...@oseberg.io>>
Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API

Sorry for the late response. +1 to support it. I will open a JIRA about it later.

Dylan Forciea <dy...@oseberg.io>>> wrote on Wednesday, October 7, 2020 at 9:53 PM:

I hadn’t heard a response on this, so I’m going to expand this to the dev email list.



If this is indeed an issue and not my misunderstanding, I have most of a patch already coded up. Please let me know, and I can create a JIRA issue and send out a PR.



Regards,

Dylan Forciea

Oseberg




From: Dylan Forciea <dy...@oseberg.io>>>


Date: Thursday, October 1, 2020 at 5:14 PM


To: "user@flink.apache.org<ma...@flink.apache.org>>" <us...@flink.apache.org>>>


Subject: autoCommit for postgres jdbc streaming in Table/SQL API






Hi! I’ve just recently started evaluating Flink for our ETL needs, and I ran across an issue with streaming postgres data via the Table/SQL API.



I see that the API has the scan.fetch-size option, but not scan.auto-commit per



https://ci.apache.org/projects/flink/flink-docs-master/dev/table/connectors/jdbc.html . I had attempted to load a large table in, but it completely slurped it into memory before starting the streaming. I modified the flink source code to add a scan.auto-commit

option, and I was then able to immediately start streaming and cut my memory usage way down.



I see in this thread that there was a similar issue resolved for JDBCInputFormat in this thread:



http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-JDBC-Disable-auto-commit-mode-td27256.html , but I don’t see a way to utilize that in the Table/SQL API.



Am I missing something on how to pull this off?



Regards,

Dylan Forciea

Oseberg






Re: autoCommit for postgres jdbc streaming in Table/SQL API

Posted by Dylan Forciea <dy...@oseberg.io>.
Ok, I have created FLINK-19522 describing the issue. I have the code I made so far checked in at https://github.com/apache/flink/compare/master...dforciea:FLINK-19522 but this only fixes the SQL API. It sounds like there may be another change needed for the Table API… I’ll look into that and see if I can figure it out on my own while they’re out. I will also need to add some unit tests and update some documentation to get this ready for a PR.

Thanks,
Dylan

From: Till Rohrmann <tr...@apache.org>
Date: Wednesday, October 7, 2020 at 10:55 AM
To: dev <de...@flink.apache.org>
Cc: Shengkai Fang <fs...@gmail.com>, "user@flink.apache.org" <us...@flink.apache.org>, "jark@apache.org" <ja...@apache.org>, Leonard Xu <xb...@gmail.com>
Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API

Hi Dylan,

thanks for reaching out to the Flink community, and please excuse our late response. I am not an expert on the Table API and its JDBC connector, but what you describe sounds like a missing feature. The fact that FLINK-12198 [1] enabled this feature for the JDBCInputFormat also indicates that we might simply need to make it configurable from the JdbcTableSource. I am pulling in Jark and Leonard, who worked on the JdbcTableSource and might help you get this feature into Flink. Their response could take a week because they are currently on vacation, if I am not mistaken.

What you could already do is to open an issue linking FLINK-12198 and describing the problem and your solution proposal.

[1] https://issues.apache.org/jira/browse/FLINK-12198

Cheers,
Till

On Wed, Oct 7, 2020 at 5:00 PM Dylan Forciea <dy...@oseberg.io>> wrote:
I appreciate it! Let me know if you want me to submit a PR against the issue after it is created. It wasn’t a huge amount of code, so it’s probably not a big deal if you wanted to redo it.

Thanks,
Dylan

From: Shengkai Fang <fs...@gmail.com>>
Date: Wednesday, October 7, 2020 at 9:06 AM
To: Dylan Forciea <dy...@oseberg.io>>
Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API

Sorry for the late response. +1 to support it. I will open a JIRA about it later.

Dylan Forciea <dy...@oseberg.io>>> wrote on Wednesday, October 7, 2020 at 9:53 PM:

I hadn’t heard a response on this, so I’m going to expand this to the dev email list.



If this is indeed an issue and not my misunderstanding, I have most of a patch already coded up. Please let me know, and I can create a JIRA issue and send out a PR.



Regards,

Dylan Forciea

Oseberg




From: Dylan Forciea <dy...@oseberg.io>>>


Date: Thursday, October 1, 2020 at 5:14 PM


To: "user@flink.apache.org<ma...@flink.apache.org>>" <us...@flink.apache.org>>>


Subject: autoCommit for postgres jdbc streaming in Table/SQL API






Hi! I’ve just recently started evaluating Flink for our ETL needs, and I ran across an issue with streaming postgres data via the Table/SQL API.



I see that the API has the scan.fetch-size option, but not scan.auto-commit per



https://ci.apache.org/projects/flink/flink-docs-master/dev/table/connectors/jdbc.html . I had attempted to load a large table in, but it completely slurped it into memory before starting the streaming. I modified the flink source code to add a scan.auto-commit

option, and I was then able to immediately start streaming and cut my memory usage way down.



I see in this thread that there was a similar issue resolved for JDBCInputFormat in this thread:



http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-JDBC-Disable-auto-commit-mode-td27256.html , but I don’t see a way to utilize that in the Table/SQL API.



Am I missing something on how to pull this off?



Regards,

Dylan Forciea

Oseberg





Re: autoCommit for postgres jdbc streaming in Table/SQL API

Posted by Till Rohrmann <tr...@apache.org>.
Hi Dylan,

thanks for reaching out to the Flink community, and please excuse our late
response. I am not an expert on the Table API and its JDBC connector, but
what you describe sounds like a missing feature. The fact that FLINK-12198
[1] enabled this feature for the JDBCInputFormat also indicates that we
might simply need to make it configurable from the JdbcTableSource. I am
pulling in Jark and Leonard, who worked on the JdbcTableSource and might
help you get this feature into Flink. Their response could take a week
because they are currently on vacation, if I am not mistaken.

What you could already do is to open an issue linking FLINK-12198 and
describing the problem and your solution proposal.

[1] https://issues.apache.org/jira/browse/FLINK-12198

Cheers,
Till

On Wed, Oct 7, 2020 at 5:00 PM Dylan Forciea <dy...@oseberg.io> wrote:

> I appreciate it! Let me know if you want me to submit a PR against the
> issue after it is created. It wasn’t a huge amount of code, so it’s
> probably not a big deal if you wanted to redo it.
>
> Thanks,
> Dylan
>
> From: Shengkai Fang <fs...@gmail.com>
> Date: Wednesday, October 7, 2020 at 9:06 AM
> To: Dylan Forciea <dy...@oseberg.io>
> Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API
>
> Sorry for the late response. +1 to support it. I will open a JIRA about it
> later.
>
> Dylan Forciea <dy...@oseberg.io>> wrote on Wednesday, October 7, 2020 at
> 9:53 PM:
>
> I hadn’t heard a response on this, so I’m going to expand this to the dev
> email list.
>
>
>
> If this is indeed an issue and not my misunderstanding, I have most of a
> patch already coded up. Please let me know, and I can create a JIRA issue
> and send out a PR.
>
>
>
> Regards,
>
> Dylan Forciea
>
> Oseberg
>
>
>
>
> From: Dylan Forciea <dy...@oseberg.io>>
>
>
> Date: Thursday, October 1, 2020 at 5:14 PM
>
>
> To: "user@flink.apache.org<ma...@flink.apache.org>" <
> user@flink.apache.org<ma...@flink.apache.org>>
>
>
> Subject: autoCommit for postgres jdbc streaming in Table/SQL API
>
>
>
>
>
>
> Hi! I’ve just recently started evaluating Flink for our ETL needs, and I
> ran across an issue with streaming postgres data via the Table/SQL API.
>
>
>
> I see that the API has the scan.fetch-size option, but not
> scan.auto-commit per
>
>
>
>
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/connectors/jdbc.html
> . I had attempted to load a large table in, but it completely slurped it
> into memory before starting the streaming. I modified the flink source code
> to add a scan.auto-commit
>
> option, and I was then able to immediately start streaming and cut my
> memory usage way down.
>
>
>
> I see in this thread that there was a similar issue resolved for
> JDBCInputFormat in this thread:
>
>
>
>
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-JDBC-Disable-auto-commit-mode-td27256.html
> , but I don’t see a way to utilize that in the Table/SQL API.
>
>
>
> Am I missing something on how to pull this off?
>
>
>
> Regards,
>
> Dylan Forciea
>
> Oseberg
>
>
>
>
>
>

Re: autoCommit for postgres jdbc streaming in Table/SQL API

Posted by Dylan Forciea <dy...@oseberg.io>.
I appreciate it! Let me know if you want me to submit a PR against the issue after it is created. It wasn’t a huge amount of code, so it’s probably not a big deal if you wanted to redo it.

Thanks,
Dylan

From: Shengkai Fang <fs...@gmail.com>
Date: Wednesday, October 7, 2020 at 9:06 AM
To: Dylan Forciea <dy...@oseberg.io>
Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API

Sorry for the late response. +1 to support it. I will open a JIRA about it later.

Dylan Forciea <dy...@oseberg.io>> wrote on Wednesday, October 7, 2020 at 9:53 PM:

I hadn’t heard a response on this, so I’m going to expand this to the dev email list.



If this is indeed an issue and not my misunderstanding, I have most of a patch already coded up. Please let me know, and I can create a JIRA issue and send out a PR.



Regards,

Dylan Forciea

Oseberg




From: Dylan Forciea <dy...@oseberg.io>>


Date: Thursday, October 1, 2020 at 5:14 PM


To: "user@flink.apache.org<ma...@flink.apache.org>" <us...@flink.apache.org>>


Subject: autoCommit for postgres jdbc streaming in Table/SQL API






Hi! I’ve just recently started evaluating Flink for our ETL needs, and I ran across an issue with streaming postgres data via the Table/SQL API.



I see that the API has the scan.fetch-size option, but not scan.auto-commit per



https://ci.apache.org/projects/flink/flink-docs-master/dev/table/connectors/jdbc.html . I had attempted to load a large table in, but it completely slurped it into memory before starting the streaming. I modified the flink source code to add a scan.auto-commit

option, and I was then able to immediately start streaming and cut my memory usage way down.



I see in this thread that there was a similar issue resolved for JDBCInputFormat in this thread:



http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-JDBC-Disable-auto-commit-mode-td27256.html , but I don’t see a way to utilize that in the Table/SQL API.



Am I missing something on how to pull this off?



Regards,

Dylan Forciea

Oseberg





