Posted to notifications@kyuubi.apache.org by "felixzh2020 (via GitHub)" <gi...@apache.org> on 2023/04/13 02:45:14 UTC
[GitHub] [kyuubi] felixzh2020 opened a new pull request, #4701: [KYUUBI apache#4083] FlinkSQL returns duplicate results when executin…
felixzh2020 opened a new pull request, #4701:
URL: https://github.com/apache/kyuubi/pull/4701
…g query statement
### _Why are the changes needed?_
[[BUG] [kyuubi-flink-sql-engine] FlinkSQL returns duplicate results when executing query statement](https://github.com/apache/kyuubi/issues/4083#top)
#4083
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
1. Set `kyuubi.session.engine.flink.max.rows` to an appropriate value (e.g. 500).
2. Whether the Kafka table is read from the earliest or the latest offset, `select * from yourtable` returns a number of rows equal to `kyuubi.session.engine.flink.max.rows`.
3. See https://kyuubi.readthedocs.io/en/v1.7.0/deployment/settings.html#session
I suggest changing the default value of `kyuubi.session.engine.flink.max.rows` to 500
and documenting that a value that is too large may trigger OOM.
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before making a pull request
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org
For additional commands, e-mail: notifications-help@kyuubi.apache.org
[GitHub] [kyuubi] pan3793 commented on a diff in pull request #4701: [KYUUBI #4083] Flink returns duplicate results when executing query statement
Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on code in PR #4701:
URL: https://github.com/apache/kyuubi/pull/4701#discussion_r1166259381
##########
externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/operation/FlinkOperationSuite.scala:
##########
@@ -1066,6 +1066,32 @@ abstract class FlinkOperationSuite extends HiveJDBCTestHelper with WithFlinkTest
}
}
+ test("ensure data is exactly-once added to the resultSet") {
+ withSessionConf()(Map(ENGINE_FLINK_MAX_ROWS.key -> "20"))(Map.empty) {
+ withJdbcStatement() { statement =>
+ statement.execute(
+ """
+ |create table tbl_src (
+ | a bigint
+ | ) with (
+ | 'connector' = 'datagen',
+ | 'rows-per-second'='1',
+ | 'fields.a.kind'='sequence',
+ | 'fields.a.start'='1',
+ | 'fields.a.end'='5'
+ | )
Review Comment:
Please try your best to keep the code well formatted.
```suggestion
"""
|CREATE TABLE tbl_src (
| a BIGINT
|) WITH (
| 'connector' = 'datagen',
| 'rows-per-second' = '1',
| 'fields.a.kind' = 'sequence',
| 'fields.a.start' = '1',
| 'fields.a.end' = '5'
|)
```
[GitHub] [kyuubi] pan3793 commented on pull request #4701: [KYUUBI #4083] Flink returns duplicate results when executing query statement
Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on PR #4701:
URL: https://github.com/apache/kyuubi/pull/4701#issuecomment-1506276410
@felixzh2020 thanks for your contribution, do you have a chance to add unit tests to verify your fix?
[GitHub] [kyuubi] felixzh2020 commented on a diff in pull request #4701: [KYUUBI #4083] Flink returns duplicate results when executing query statement
Posted by "felixzh2020 (via GitHub)" <gi...@apache.org>.
felixzh2020 commented on code in PR #4701:
URL: https://github.com/apache/kyuubi/pull/4701#discussion_r1167334167
##########
externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/operation/FlinkOperationSuite.scala:
##########
@@ -1066,6 +1066,32 @@ abstract class FlinkOperationSuite extends HiveJDBCTestHelper with WithFlinkTest
}
}
+ test("ensure data is exactly-once added to the resultSet") {
+ withSessionConf()(Map(ENGINE_FLINK_MAX_ROWS.key -> "20"))(Map.empty) {
+ withJdbcStatement() { statement =>
+ statement.execute(
+ """
+ |create table tbl_src (
+ | a bigint
+ | ) with (
+ | 'connector' = 'datagen',
+ | 'rows-per-second'='1',
+ | 'fields.a.kind'='sequence',
+ | 'fields.a.start'='1',
+ | 'fields.a.end'='5'
+ | )
Review Comment:
resubmitted
[GitHub] [kyuubi] felixzh2020 commented on pull request #4701: [KYUUBI #4083] Flink returns duplicate results when executing query statement
Posted by "felixzh2020 (via GitHub)" <gi...@apache.org>.
felixzh2020 commented on PR #4701:
URL: https://github.com/apache/kyuubi/pull/4701#issuecomment-1510605540
@pan3793 I ran `./build/dist --tgz --spark-provided --flink-provided --hive-provided`.
The scalastyle check now passes.
![image](https://user-images.githubusercontent.com/16359007/232364348-08a3735b-4690-4500-ade1-dc7a719d735d.png)
[GitHub] [kyuubi] felixzh2020 closed pull request #4701: [KYUUBI #4083] Flink returns duplicate results when executing query statement
Posted by "felixzh2020 (via GitHub)" <gi...@apache.org>.
felixzh2020 closed pull request #4701: [KYUUBI #4083] Flink returns duplicate results when executing query statement
URL: https://github.com/apache/kyuubi/pull/4701
[GitHub] [kyuubi] pan3793 commented on a diff in pull request #4701: [KYUUBI #4083] Flink returns duplicate results when executing query statement
Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on code in PR #4701:
URL: https://github.com/apache/kyuubi/pull/4701#discussion_r1166262891
##########
externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/ExecuteStatement.scala:
##########
@@ -115,16 +115,16 @@ class ExecuteStatement(
while (loop) {
Thread.sleep(50) // slow the processing down
- val pageSize = Math.min(500, resultMaxRows)
- val result = executor.snapshotResult(sessionId, resultId, pageSize)
+ val result = executor.snapshotResult(sessionId, resultId, resultMaxRows)
Review Comment:
If I understand correctly, this change fixes a correctness issue, but breaks the original design - retrieve the result in small pages to avoid OOM, am I right?
also cc @link3280 and @bowenliang123
[GitHub] [kyuubi] pan3793 commented on a diff in pull request #4701: [KYUUBI #4083] Flink returns duplicate results when executing query statement
Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on code in PR #4701:
URL: https://github.com/apache/kyuubi/pull/4701#discussion_r1167791243
##########
externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/operation/FlinkOperationSuite.scala:
##########
@@ -35,6 +36,7 @@ import org.apache.kyuubi.jdbc.hive.KyuubiStatement
import org.apache.kyuubi.operation.HiveJDBCTestHelper
import org.apache.kyuubi.operation.meta.ResultSetSchemaConstant._
+
Review Comment:
```suggestion
```
[GitHub] [kyuubi] pan3793 commented on a diff in pull request #4701: [KYUUBI #4083] Flink returns duplicate results when executing query statement
Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on code in PR #4701:
URL: https://github.com/apache/kyuubi/pull/4701#discussion_r1166259730
##########
externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/operation/FlinkOperationSuite.scala:
##########
@@ -1066,6 +1066,32 @@ abstract class FlinkOperationSuite extends HiveJDBCTestHelper with WithFlinkTest
}
}
+ test("ensure data is exactly-once added to the resultSet") {
+ withSessionConf()(Map(ENGINE_FLINK_MAX_ROWS.key -> "20"))(Map.empty) {
+ withJdbcStatement() { statement =>
+ statement.execute(
+ """
+ |create table tbl_src (
+ | a bigint
+ | ) with (
+ | 'connector' = 'datagen',
+ | 'rows-per-second'='1',
+ | 'fields.a.kind'='sequence',
+ | 'fields.a.start'='1',
+ | 'fields.a.end'='5'
+ | )
+ |""".stripMargin)
+ val resultSet = statement.executeQuery(s"select a from tbl_src")
+ var rows = List[Long]()
Review Comment:
use mutable List instead
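
A minimal sketch of what this suggestion could look like, assuming `scala.collection.mutable.ListBuffer` as the mutable list (the review does not name a specific type). The `collect` helper and its `Iterator[Long]` input are hypothetical stand-ins so that the snippet is self-contained; in the test under review the loop would read from the JDBC result set instead.

```scala
import scala.collection.mutable.ListBuffer

object CollectRowsSketch {
  // Instead of re-assigning `var rows = List[Long]()` for every row,
  // append to a mutable buffer held in a `val`.
  // `values` is a hypothetical stand-in for the JDBC result set; with a real
  // java.sql.ResultSet the loop would be
  // `while (resultSet.next()) rows += resultSet.getLong(1)`.
  def collect(values: Iterator[Long]): List[Long] = {
    val rows = ListBuffer.empty[Long]
    while (values.hasNext) {
      rows += values.next()
    }
    rows.toList
  }
}
```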
[GitHub] [kyuubi] felixzh2020 commented on a diff in pull request #4701: [KYUUBI #4083] Flink returns duplicate results when executing query statement
Posted by "felixzh2020 (via GitHub)" <gi...@apache.org>.
felixzh2020 commented on code in PR #4701:
URL: https://github.com/apache/kyuubi/pull/4701#discussion_r1166360491
##########
externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/ExecuteStatement.scala:
##########
@@ -115,16 +115,16 @@ class ExecuteStatement(
while (loop) {
Thread.sleep(50) // slow the processing down
- val pageSize = Math.min(500, resultMaxRows)
- val result = executor.snapshotResult(sessionId, resultId, pageSize)
+ val result = executor.snapshotResult(sessionId, resultId, resultMaxRows)
Review Comment:
In my opinion, if the goal is to retrieve the result in small pages to avoid OOM, Kyuubi should validate the `kyuubi.session.engine.flink.max.rows` value and explain it in detail on the official website.
Currently `val pageSize = Math.min(500, resultMaxRows)` relies on a hard-coded value (500), which is not ideal.
[GitHub] [kyuubi] felixzh2020 commented on pull request #4701: [KYUUBI #4083] Flink returns duplicate results when executing query statement
Posted by "felixzh2020 (via GitHub)" <gi...@apache.org>.
felixzh2020 commented on PR #4701:
URL: https://github.com/apache/kyuubi/pull/4701#issuecomment-1518903419
As far as I can tell, the master branch has changed its implementation approach, so this PR may no longer be needed. Closed.
[GitHub] [kyuubi] codecov-commenter commented on pull request #4701: [KYUUBI #4083] Flink returns duplicate results when executing query statement
Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #4701:
URL: https://github.com/apache/kyuubi/pull/4701#issuecomment-1506297303
## [Codecov](https://codecov.io/gh/apache/kyuubi/pull/4701?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#4701](https://codecov.io/gh/apache/kyuubi/pull/4701?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (bd5a999) into [master](https://codecov.io/gh/apache/kyuubi/commit/f0615a9aab0a5b755c16ee0b966a5ea59b98bd10?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (f0615a9) will **decrease** coverage by `0.06%`.
> The diff coverage is `n/a`.
```diff
@@             Coverage Diff              @@
##             master    #4701      +/-   ##
============================================
- Coverage     57.99%   57.94%   -0.06%
  Complexity       13       13
============================================
  Files           580      580
  Lines         32218    32220       +2
  Branches       4304     4304
============================================
- Hits          18684    18669      -15
- Misses        11749    11755       +6
- Partials       1785     1796      +11
```
[see 13 files with indirect coverage changes](https://codecov.io/gh/apache/kyuubi/pull/4701/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
[GitHub] [kyuubi] felixzh2020 commented on pull request #4701: [KYUUBI #4083] Flink returns duplicate results when executing query statement
Posted by "felixzh2020 (via GitHub)" <gi...@apache.org>.
felixzh2020 commented on PR #4701:
URL: https://github.com/apache/kyuubi/pull/4701#issuecomment-1507782816
@pan3793 My unit test passes. Could you rerun the check?
![image](https://user-images.githubusercontent.com/16359007/231914651-4b097a3a-2ead-4ead-a6ba-0fcedc6cee09.png)
[GitHub] [kyuubi] felixzh2020 commented on pull request #4701: [KYUUBI #4083] Flink returns duplicate results when executing query statement
Posted by "felixzh2020 (via GitHub)" <gi...@apache.org>.
felixzh2020 commented on PR #4701:
URL: https://github.com/apache/kyuubi/pull/4701#issuecomment-1506761315
@pan3793 I have added a unit test.
[GitHub] [kyuubi] pan3793 commented on a diff in pull request #4701: [KYUUBI #4083] Flink returns duplicate results when executing query statement
Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on code in PR #4701:
URL: https://github.com/apache/kyuubi/pull/4701#discussion_r1166259604
##########
externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/operation/FlinkOperationSuite.scala:
##########
@@ -1066,6 +1066,32 @@ abstract class FlinkOperationSuite extends HiveJDBCTestHelper with WithFlinkTest
}
}
+ test("ensure data is exactly-once added to the resultSet") {
+ withSessionConf()(Map(ENGINE_FLINK_MAX_ROWS.key -> "20"))(Map.empty) {
+ withJdbcStatement() { statement =>
+ statement.execute(
+ """
+ |create table tbl_src (
+ | a bigint
+ | ) with (
+ | 'connector' = 'datagen',
+ | 'rows-per-second'='1',
+ | 'fields.a.kind'='sequence',
+ | 'fields.a.start'='1',
+ | 'fields.a.end'='5'
+ | )
+ |""".stripMargin)
+ val resultSet = statement.executeQuery(s"select a from tbl_src")
Review Comment:
the leading `s` is not required here
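
For context, Scala's `s` prefix enables string interpolation and only matters when the literal contains `$` placeholders. A minimal sketch of the difference, where `table` is a hypothetical variable introduced purely for illustration:

```scala
object InterpolatorSketch {
  // Hypothetical variable, introduced only to show when `s` is needed.
  val table: String = "tbl_src"

  // No placeholders: a plain string literal is enough.
  val plain: String = "select a from tbl_src"

  // The `s` prefix is needed only when a value is spliced in with `$`.
  val interpolated: String = s"select a from $table"
}
```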
[GitHub] [kyuubi] link3280 commented on a diff in pull request #4701: [KYUUBI #4083] Flink returns duplicate results when executing query statement
Posted by "link3280 (via GitHub)" <gi...@apache.org>.
link3280 commented on code in PR #4701:
URL: https://github.com/apache/kyuubi/pull/4701#discussion_r1166726776
##########
externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/ExecuteStatement.scala:
##########
@@ -115,16 +115,16 @@ class ExecuteStatement(
while (loop) {
Thread.sleep(50) // slow the processing down
- val pageSize = Math.min(500, resultMaxRows)
- val result = executor.snapshotResult(sessionId, resultId, pageSize)
+ val result = executor.snapshotResult(sessionId, resultId, resultMaxRows)
Review Comment:
> If I understand correctly, this change fixes a correctness issue, but breaks the original design - retrieve the result in small pages to avoid OOM, am I right?
>
> also cc @link3280 and @bowenliang123
Not really, the row limit is introduced to avoid OOM. The page size is designed to keep fetching rows until the limit is reached or all records are read, because we may not get enough rows with one fetch (consider Kafka as the source).
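
The interplay between the row limit and the page size described here can be sketched as follows. This is a self-contained illustration under stated assumptions, not Kyuubi's actual implementation: `Source`, `fetchPage`, `exhausted`, and `collectRows` are hypothetical names, and the sketch assumes each fetch consumes the rows it returns, which may differ from the snapshot semantics of the real executor.

```scala
import scala.collection.mutable.ListBuffer

object PagedFetchSketch {
  // Hypothetical stand-in for a streaming source. Each call to `fetchPage`
  // consumes and returns up to `pageSize` rows, possibly fewer.
  final class Source(rows: Iterator[Long]) {
    def fetchPage(pageSize: Int): List[Long] = {
      val page = ListBuffer.empty[Long]
      while (page.size < pageSize && rows.hasNext) {
        page += rows.next()
      }
      page.toList
    }

    def exhausted: Boolean = !rows.hasNext
  }

  // Keep fetching pages until `maxRows` rows are collected (the row limit
  // that guards against OOM) or the source has no more records.
  def collectRows(source: Source, maxRows: Int, pageSize: Int): List[Long] = {
    val collected = ListBuffer.empty[Long]
    while (collected.size < maxRows && !source.exhausted) {
      val remaining = maxRows - collected.size
      // Never ask for more than is still allowed by the row limit.
      collected ++= source.fetchPage(math.min(pageSize, remaining))
    }
    collected.toList
  }
}
```

In this sketch the page size only controls how many rows are requested per fetch; the total is bounded by `maxRows`, and each row is appended once because a fetch never returns a row it has already handed out.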