You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@kyuubi.apache.org by "felixzh2020 (via GitHub)" <gi...@apache.org> on 2023/04/13 02:45:14 UTC

[GitHub] [kyuubi] felixzh2020 opened a new pull request, #4701: [KYUUBI apache#4083] FlinkSQL returns duplicate results when executin…

felixzh2020 opened a new pull request, #4701:
URL: https://github.com/apache/kyuubi/pull/4701

   …g query statement
   
   <!--
   Thanks for sending a pull request!
   
   Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://kyuubi.readthedocs.io/en/latest/community/CONTRIBUTING.html
     2. If the PR is related to an issue in https://github.com/apache/kyuubi/issues, add '[KYUUBI #XXXX]' in your PR title, e.g., '[KYUUBI #XXXX] Your PR title ...'.
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][KYUUBI #XXXX] Your PR title ...'.
   -->
   
   ### _Why are the changes needed?_
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you add a feature, you can talk about the use case of it.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   [[BUG] [kyuubi-flink-sql-engine] FlinkSQL returns duplicate results when executing query statement](https://github.com/apache/kyuubi/issues/4083#top)
   #4083
   
   
   ### _How was this patch tested?_
   - [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
   1. set kyuubi.session.engine.flink.max.rows to appropriate value (eg: 500).
   2. whatever earliest kafkaTable or latest kafkaTable, when select * from yourtable,  you can get line number equal to kyuubi.session.engine.flink.max.rows.
   3. https://kyuubi.readthedocs.io/en/v1.7.0/deployment/settings.html#session
      Suggest kyuubi.session.engine.flink.max.rows default value change to 500.
      And explanation: too large may trigger OOM
   - [ ] Add screenshots for manual tests if appropriate
   
   - [ ] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before make a pull request
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org
For additional commands, e-mail: notifications-help@kyuubi.apache.org


[GitHub] [kyuubi] pan3793 commented on a diff in pull request #4701: [KYUUBI #4083] Flink returns duplicate results when executing query statement

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on code in PR #4701:
URL: https://github.com/apache/kyuubi/pull/4701#discussion_r1166259381


##########
externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/operation/FlinkOperationSuite.scala:
##########
@@ -1066,6 +1066,32 @@ abstract class FlinkOperationSuite extends HiveJDBCTestHelper with WithFlinkTest
     }
   }
 
+  test("ensure data is exactly-once added to the resultSet") {
+    withSessionConf()(Map(ENGINE_FLINK_MAX_ROWS.key -> "20"))(Map.empty) {
+      withJdbcStatement() { statement =>
+        statement.execute(
+          """
+            |create table tbl_src (
+            |           a bigint
+            |           ) with (
+            |           'connector' = 'datagen',
+            |          'rows-per-second'='1',
+            |          'fields.a.kind'='sequence',
+            |          'fields.a.start'='1',
+            |          'fields.a.end'='5'
+            |          )

Review Comment:
   Please try your best to make the code in a good format
   
   ```suggestion
             """
               |CREATE TABLE tbl_src (
               |  a BIGINT
               |) WITH (
               |  'connector' = 'datagen',
               |  'rows-per-second' = '1',
               |  'fields.a.kind' = 'sequence',
               |  'fields.a.start' = '1',
               |  'fields.a.end' = '5'
               |)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org
For additional commands, e-mail: notifications-help@kyuubi.apache.org


[GitHub] [kyuubi] pan3793 commented on pull request #4701: [KYUUBI #4083] Flink returns duplicate results when executing query statement

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on PR #4701:
URL: https://github.com/apache/kyuubi/pull/4701#issuecomment-1506276410

   @felixzh2020 thanks for your contribution, do you have a chance to add unit tests to verify your fix?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org
For additional commands, e-mail: notifications-help@kyuubi.apache.org


[GitHub] [kyuubi] felixzh2020 commented on a diff in pull request #4701: [KYUUBI #4083] Flink returns duplicate results when executing query statement

Posted by "felixzh2020 (via GitHub)" <gi...@apache.org>.
felixzh2020 commented on code in PR #4701:
URL: https://github.com/apache/kyuubi/pull/4701#discussion_r1167334167


##########
externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/operation/FlinkOperationSuite.scala:
##########
@@ -1066,6 +1066,32 @@ abstract class FlinkOperationSuite extends HiveJDBCTestHelper with WithFlinkTest
     }
   }
 
+  test("ensure data is exactly-once added to the resultSet") {
+    withSessionConf()(Map(ENGINE_FLINK_MAX_ROWS.key -> "20"))(Map.empty) {
+      withJdbcStatement() { statement =>
+        statement.execute(
+          """
+            |create table tbl_src (
+            |           a bigint
+            |           ) with (
+            |           'connector' = 'datagen',
+            |          'rows-per-second'='1',
+            |          'fields.a.kind'='sequence',
+            |          'fields.a.start'='1',
+            |          'fields.a.end'='5'
+            |          )

Review Comment:
   resubmitted



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org
For additional commands, e-mail: notifications-help@kyuubi.apache.org


[GitHub] [kyuubi] felixzh2020 commented on pull request #4701: [KYUUBI #4083] Flink returns duplicate results when executing query statement

Posted by "felixzh2020 (via GitHub)" <gi...@apache.org>.
felixzh2020 commented on PR #4701:
URL: https://github.com/apache/kyuubi/pull/4701#issuecomment-1510605540

   @pan3793 Run  ./build/dist --tgz --spark-provided --flink-provided --hive-provided
   Now scalastyle check is ok.
   ![image](https://user-images.githubusercontent.com/16359007/232364348-08a3735b-4690-4500-ade1-dc7a719d735d.png)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org
For additional commands, e-mail: notifications-help@kyuubi.apache.org


[GitHub] [kyuubi] felixzh2020 closed pull request #4701: [KYUUBI #4083] Flink returns duplicate results when executing query statement

Posted by "felixzh2020 (via GitHub)" <gi...@apache.org>.
felixzh2020 closed pull request #4701: [KYUUBI #4083] Flink returns duplicate results when executing query statement
URL: https://github.com/apache/kyuubi/pull/4701


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org
For additional commands, e-mail: notifications-help@kyuubi.apache.org


[GitHub] [kyuubi] pan3793 commented on a diff in pull request #4701: [KYUUBI #4083] Flink returns duplicate results when executing query statement

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on code in PR #4701:
URL: https://github.com/apache/kyuubi/pull/4701#discussion_r1166262891


##########
externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/ExecuteStatement.scala:
##########
@@ -115,16 +115,16 @@ class ExecuteStatement(
       while (loop) {
         Thread.sleep(50) // slow the processing down
 
-        val pageSize = Math.min(500, resultMaxRows)
-        val result = executor.snapshotResult(sessionId, resultId, pageSize)
+        val result = executor.snapshotResult(sessionId, resultId, resultMaxRows)

Review Comment:
   If I understand correctly, this change fixes a correctness issue, but breaks the original design - retrieve the result in small pages to avoid OOM, am I right?
   
   also cc @link3280 and @bowenliang123 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org
For additional commands, e-mail: notifications-help@kyuubi.apache.org


[GitHub] [kyuubi] pan3793 commented on a diff in pull request #4701: [KYUUBI #4083] Flink returns duplicate results when executing query statement

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on code in PR #4701:
URL: https://github.com/apache/kyuubi/pull/4701#discussion_r1167791243


##########
externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/operation/FlinkOperationSuite.scala:
##########
@@ -35,6 +36,7 @@ import org.apache.kyuubi.jdbc.hive.KyuubiStatement
 import org.apache.kyuubi.operation.HiveJDBCTestHelper
 import org.apache.kyuubi.operation.meta.ResultSetSchemaConstant._
 
+

Review Comment:
   ```suggestion
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org
For additional commands, e-mail: notifications-help@kyuubi.apache.org


[GitHub] [kyuubi] pan3793 commented on a diff in pull request #4701: [KYUUBI #4083] Flink returns duplicate results when executing query statement

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on code in PR #4701:
URL: https://github.com/apache/kyuubi/pull/4701#discussion_r1166259381


##########
externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/operation/FlinkOperationSuite.scala:
##########
@@ -1066,6 +1066,32 @@ abstract class FlinkOperationSuite extends HiveJDBCTestHelper with WithFlinkTest
     }
   }
 
+  test("ensure data is exactly-once added to the resultSet") {
+    withSessionConf()(Map(ENGINE_FLINK_MAX_ROWS.key -> "20"))(Map.empty) {
+      withJdbcStatement() { statement =>
+        statement.execute(
+          """
+            |create table tbl_src (
+            |           a bigint
+            |           ) with (
+            |           'connector' = 'datagen',
+            |          'rows-per-second'='1',
+            |          'fields.a.kind'='sequence',
+            |          'fields.a.start'='1',
+            |          'fields.a.end'='5'
+            |          )

Review Comment:
   ```suggestion
             """
               |CREATE TABLE tbl_src (
               |  a BIGINT
               |) WITH (
               |  'connector' = 'datagen',
               |  'rows-per-second' = '1',
               |  'fields.a.kind' = 'sequence',
               |  'fields.a.start' = '1',
               |  'fields.a.end' = '5'
               |)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org
For additional commands, e-mail: notifications-help@kyuubi.apache.org


[GitHub] [kyuubi] pan3793 commented on a diff in pull request #4701: [KYUUBI #4083] Flink returns duplicate results when executing query statement

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on code in PR #4701:
URL: https://github.com/apache/kyuubi/pull/4701#discussion_r1166259730


##########
externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/operation/FlinkOperationSuite.scala:
##########
@@ -1066,6 +1066,32 @@ abstract class FlinkOperationSuite extends HiveJDBCTestHelper with WithFlinkTest
     }
   }
 
+  test("ensure data is exactly-once added to the resultSet") {
+    withSessionConf()(Map(ENGINE_FLINK_MAX_ROWS.key -> "20"))(Map.empty) {
+      withJdbcStatement() { statement =>
+        statement.execute(
+          """
+            |create table tbl_src (
+            |           a bigint
+            |           ) with (
+            |           'connector' = 'datagen',
+            |          'rows-per-second'='1',
+            |          'fields.a.kind'='sequence',
+            |          'fields.a.start'='1',
+            |          'fields.a.end'='5'
+            |          )
+            |""".stripMargin)
+        val resultSet = statement.executeQuery(s"select a from tbl_src")
+        var rows = List[Long]()

Review Comment:
   use mutable List instead



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org
For additional commands, e-mail: notifications-help@kyuubi.apache.org


[GitHub] [kyuubi] felixzh2020 commented on a diff in pull request #4701: [KYUUBI #4083] Flink returns duplicate results when executing query statement

Posted by "felixzh2020 (via GitHub)" <gi...@apache.org>.
felixzh2020 commented on code in PR #4701:
URL: https://github.com/apache/kyuubi/pull/4701#discussion_r1166360491


##########
externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/ExecuteStatement.scala:
##########
@@ -115,16 +115,16 @@ class ExecuteStatement(
       while (loop) {
         Thread.sleep(50) // slow the processing down
 
-        val pageSize = Math.min(500, resultMaxRows)
-        val result = executor.snapshotResult(sessionId, resultId, pageSize)
+        val result = executor.snapshotResult(sessionId, resultId, resultMaxRows)

Review Comment:
   In my opinion, if in order to retrieve the result in small pages to avoid OOM,  suggest kyuubi start to check kyuubi.session.engine.flink.max.rows value and provide detailed explanations on the official website. 
   Now  val pageSize = Math.min(500, resultMaxRows),  dead value(500) is imperfect.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org
For additional commands, e-mail: notifications-help@kyuubi.apache.org


[GitHub] [kyuubi] felixzh2020 commented on pull request #4701: [KYUUBI #4083] Flink returns duplicate results when executing query statement

Posted by "felixzh2020 (via GitHub)" <gi...@apache.org>.
felixzh2020 commented on PR #4701:
URL: https://github.com/apache/kyuubi/pull/4701#issuecomment-1518903419

   As far as I am concerned, master branch have change implementation plan, maybe this pr is not need. Closed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org
For additional commands, e-mail: notifications-help@kyuubi.apache.org


[GitHub] [kyuubi] codecov-commenter commented on pull request #4701: [KYUUBI #4083] Flink returns duplicate results when executing query statement

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #4701:
URL: https://github.com/apache/kyuubi/pull/4701#issuecomment-1506297303

   ## [Codecov](https://codecov.io/gh/apache/kyuubi/pull/4701?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#4701](https://codecov.io/gh/apache/kyuubi/pull/4701?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (bd5a999) into [master](https://codecov.io/gh/apache/kyuubi/commit/f0615a9aab0a5b755c16ee0b966a5ea59b98bd10?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (f0615a9) will **decrease** coverage by `0.06%`.
   > The diff coverage is `n/a`.
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #4701      +/-   ##
   ============================================
   - Coverage     57.99%   57.94%   -0.06%     
     Complexity       13       13              
   ============================================
     Files           580      580              
     Lines         32218    32220       +2     
     Branches       4304     4304              
   ============================================
   - Hits          18684    18669      -15     
   - Misses        11749    11755       +6     
   - Partials       1785     1796      +11     
   ```
   
   
   [see 13 files with indirect coverage changes](https://codecov.io/gh/apache/kyuubi/pull/4701/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org
For additional commands, e-mail: notifications-help@kyuubi.apache.org


[GitHub] [kyuubi] felixzh2020 commented on pull request #4701: [KYUUBI #4083] Flink returns duplicate results when executing query statement

Posted by "felixzh2020 (via GitHub)" <gi...@apache.org>.
felixzh2020 commented on PR #4701:
URL: https://github.com/apache/kyuubi/pull/4701#issuecomment-1507782816

   @pan3793 My UT is pass. Maybe rerun check?
   ![image](https://user-images.githubusercontent.com/16359007/231914651-4b097a3a-2ead-4ead-a6ba-0fcedc6cee09.png)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org
For additional commands, e-mail: notifications-help@kyuubi.apache.org


[GitHub] [kyuubi] felixzh2020 commented on a diff in pull request #4701: [KYUUBI #4083] Flink returns duplicate results when executing query statement

Posted by "felixzh2020 (via GitHub)" <gi...@apache.org>.
felixzh2020 commented on code in PR #4701:
URL: https://github.com/apache/kyuubi/pull/4701#discussion_r1166360491


##########
externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/ExecuteStatement.scala:
##########
@@ -115,16 +115,16 @@ class ExecuteStatement(
       while (loop) {
         Thread.sleep(50) // slow the processing down
 
-        val pageSize = Math.min(500, resultMaxRows)
-        val result = executor.snapshotResult(sessionId, resultId, pageSize)
+        val result = executor.snapshotResult(sessionId, resultId, resultMaxRows)

Review Comment:
   In my opinion, if in order to retrieve the result in small pages to avoid OOM,  suggest kyuubi start to check kyuubi.session.engine.flink.max.rows value and provide detailed explanations on the official website. 
   Now  val pageSize = Math.min(500, resultMaxRows),  dead value(500) is imperfect.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org
For additional commands, e-mail: notifications-help@kyuubi.apache.org


[GitHub] [kyuubi] felixzh2020 commented on pull request #4701: [KYUUBI #4083] Flink returns duplicate results when executing query statement

Posted by "felixzh2020 (via GitHub)" <gi...@apache.org>.
felixzh2020 commented on PR #4701:
URL: https://github.com/apache/kyuubi/pull/4701#issuecomment-1506761315

   @pan3793 I have added UT.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org
For additional commands, e-mail: notifications-help@kyuubi.apache.org


[GitHub] [kyuubi] pan3793 commented on a diff in pull request #4701: [KYUUBI #4083] Flink returns duplicate results when executing query statement

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on code in PR #4701:
URL: https://github.com/apache/kyuubi/pull/4701#discussion_r1166259604


##########
externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/operation/FlinkOperationSuite.scala:
##########
@@ -1066,6 +1066,32 @@ abstract class FlinkOperationSuite extends HiveJDBCTestHelper with WithFlinkTest
     }
   }
 
+  test("ensure data is exactly-once added to the resultSet") {
+    withSessionConf()(Map(ENGINE_FLINK_MAX_ROWS.key -> "20"))(Map.empty) {
+      withJdbcStatement() { statement =>
+        statement.execute(
+          """
+            |create table tbl_src (
+            |           a bigint
+            |           ) with (
+            |           'connector' = 'datagen',
+            |          'rows-per-second'='1',
+            |          'fields.a.kind'='sequence',
+            |          'fields.a.start'='1',
+            |          'fields.a.end'='5'
+            |          )
+            |""".stripMargin)
+        val resultSet = statement.executeQuery(s"select a from tbl_src")

Review Comment:
   the leading `s` is not required here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org
For additional commands, e-mail: notifications-help@kyuubi.apache.org


[GitHub] [kyuubi] link3280 commented on a diff in pull request #4701: [KYUUBI #4083] Flink returns duplicate results when executing query statement

Posted by "link3280 (via GitHub)" <gi...@apache.org>.
link3280 commented on code in PR #4701:
URL: https://github.com/apache/kyuubi/pull/4701#discussion_r1166726776


##########
externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/ExecuteStatement.scala:
##########
@@ -115,16 +115,16 @@ class ExecuteStatement(
       while (loop) {
         Thread.sleep(50) // slow the processing down
 
-        val pageSize = Math.min(500, resultMaxRows)
-        val result = executor.snapshotResult(sessionId, resultId, pageSize)
+        val result = executor.snapshotResult(sessionId, resultId, resultMaxRows)

Review Comment:
   > If I understand correctly, this change fixes a correctness issue, but breaks the original design - retrieve the result in small pages to avoid OOM, am I right?
   > 
   > also cc @link3280 and @bowenliang123
   
   Not really, the row limit is introduced to avoid OOM. The page size is designed to keep fetching rows until the limit is reached or all records are read, because we may not get enough rows with one fetch (consider Kafka as the source).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org
For additional commands, e-mail: notifications-help@kyuubi.apache.org