You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by "jchen5 (via GitHub)" <gi...@apache.org> on 2023/07/16 22:49:50 UTC

[GitHub] [spark] jchen5 opened a new pull request, #42026: [SPARK-44448][SQL] Fix wrong results bug from DenseRankLimitIterator and InferWindowGroupLimit

jchen5 opened a new pull request, #42026:
URL: https://github.com/apache/spark/pull/42026

   ### What changes were proposed in this pull request?
   Top-k filters on a dense_rank() window function return wrong results, due to a bug in optimization InferWindowGroupLimit, specifically in the code for DenseRankLimitIterator, introduced in https://issues.apache.org/jira/browse/SPARK-37099.
   
   ### Why are the changes needed?
   Fix wrong results bug.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, fixes wrong results.
   
   ### How was this patch tested?
   Add sql tests and unit tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] beliefer commented on a diff in pull request #42026: [SPARK-44448][SQL] Fix wrong results bug from DenseRankLimitIterator and InferWindowGroupLimit

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on code in PR #42026:
URL: https://github.com/apache/spark/pull/42026#discussion_r1264828686


##########
sql/core/src/test/resources/sql-tests/results/window.sql.out:
##########
@@ -1342,3 +1342,93 @@ org.apache.spark.sql.AnalysisException
     "windowName" : "w"
   }
 }
+
+
+-- !query
+create or replace temp view t1 (p, o) as values (1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 2)
+-- !query schema
+struct<>
+-- !query output
+
+
+
+-- !query
+select * from (select *, dense_rank() over (partition by p order by o) as rnk from t1) where rnk = 1
+-- !query schema
+struct<p:int,o:int,rnk:int>
+-- !query output
+1	1	1
+1	1	1
+2	1	1
+2	1	1
+
+
+-- !query
+SELECT * FROM (SELECT cate, val, rank() OVER(PARTITION BY cate ORDER BY val) as r FROM testData) where r = 1
+-- !query schema
+struct<cate:string,val:int,r:int>
+-- !query output
+NULL	NULL	1
+a	NULL	1
+b	1	1

Review Comment:
   the same.



##########
sql/core/src/test/resources/sql-tests/results/window.sql.out:
##########
@@ -1342,3 +1342,93 @@ org.apache.spark.sql.AnalysisException
     "windowName" : "w"
   }
 }
+
+
+-- !query
+create or replace temp view t1 (p, o) as values (1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 2)
+-- !query schema
+struct<>
+-- !query output
+
+
+
+-- !query
+select * from (select *, dense_rank() over (partition by p order by o) as rnk from t1) where rnk = 1
+-- !query schema
+struct<p:int,o:int,rnk:int>
+-- !query output
+1	1	1
+1	1	1
+2	1	1
+2	1	1
+
+
+-- !query
+SELECT * FROM (SELECT cate, val, rank() OVER(PARTITION BY cate ORDER BY val) as r FROM testData) where r = 1
+-- !query schema
+struct<cate:string,val:int,r:int>
+-- !query output
+NULL	NULL	1
+a	NULL	1
+b	1	1
+
+
+-- !query
+SELECT * FROM (SELECT cate, val, rank() OVER(PARTITION BY cate ORDER BY val) as r FROM testData) where r <= 2
+-- !query schema
+struct<cate:string,val:int,r:int>
+-- !query output
+NULL	3	2
+NULL	NULL	1
+a	1	2
+a	1	2
+a	NULL	1
+b	1	1
+b	2	2

Review Comment:
   the same.



##########
sql/core/src/test/resources/sql-tests/results/window.sql.out:
##########
@@ -1342,3 +1342,93 @@ org.apache.spark.sql.AnalysisException
     "windowName" : "w"
   }
 }
+
+
+-- !query
+create or replace temp view t1 (p, o) as values (1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 2)
+-- !query schema
+struct<>
+-- !query output
+
+
+
+-- !query
+select * from (select *, dense_rank() over (partition by p order by o) as rnk from t1) where rnk = 1
+-- !query schema
+struct<p:int,o:int,rnk:int>
+-- !query output
+1	1	1
+1	1	1
+2	1	1
+2	1	1
+
+
+-- !query
+SELECT * FROM (SELECT cate, val, rank() OVER(PARTITION BY cate ORDER BY val) as r FROM testData) where r = 1
+-- !query schema
+struct<cate:string,val:int,r:int>
+-- !query output
+NULL	NULL	1
+a	NULL	1
+b	1	1
+
+
+-- !query
+SELECT * FROM (SELECT cate, val, rank() OVER(PARTITION BY cate ORDER BY val) as r FROM testData) where r <= 2
+-- !query schema
+struct<cate:string,val:int,r:int>
+-- !query output
+NULL	3	2
+NULL	NULL	1
+a	1	2
+a	1	2
+a	NULL	1
+b	1	1
+b	2	2
+
+
+-- !query
+SELECT * FROM (SELECT cate, val, dense_rank() OVER(PARTITION BY cate ORDER BY val) as r FROM testData) where r = 1
+-- !query schema
+struct<cate:string,val:int,r:int>
+-- !query output
+NULL	NULL	1
+a	NULL	1
+b	1	1
+
+
+-- !query
+SELECT * FROM (SELECT cate, val, dense_rank() OVER(PARTITION BY cate ORDER BY val) as r FROM testData) where r <= 2
+-- !query schema
+struct<cate:string,val:int,r:int>
+-- !query output
+NULL	3	2
+NULL	NULL	1
+a	1	2
+a	1	2
+a	NULL	1
+b	1	1
+b	2	2
+
+
+-- !query
+SELECT * FROM (SELECT cate, val, row_number() OVER(PARTITION BY cate ORDER BY val) as r FROM testData) where r = 1
+-- !query schema
+struct<cate:string,val:int,r:int>
+-- !query output
+NULL	NULL	1
+a	NULL	1
+b	1	1
+
+
+-- !query
+SELECT * FROM (SELECT cate, val, row_number() OVER(PARTITION BY cate ORDER BY val) as r FROM testData) where r <= 2
+-- !query schema
+struct<cate:string,val:int,r:int>
+-- !query output
+NULL	3	2
+NULL	NULL	1
+a	1	2
+a	NULL	1
+b	1	1
+b	2	2

Review Comment:
   the same.



##########
sql/core/src/test/resources/sql-tests/results/window.sql.out:
##########
@@ -1342,3 +1342,93 @@ org.apache.spark.sql.AnalysisException
     "windowName" : "w"
   }
 }
+
+
+-- !query
+create or replace temp view t1 (p, o) as values (1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 2)
+-- !query schema
+struct<>
+-- !query output
+
+
+
+-- !query
+select * from (select *, dense_rank() over (partition by p order by o) as rnk from t1) where rnk = 1
+-- !query schema
+struct<p:int,o:int,rnk:int>
+-- !query output
+1	1	1
+1	1	1
+2	1	1
+2	1	1
+
+
+-- !query
+SELECT * FROM (SELECT cate, val, rank() OVER(PARTITION BY cate ORDER BY val) as r FROM testData) where r = 1
+-- !query schema
+struct<cate:string,val:int,r:int>
+-- !query output
+NULL	NULL	1
+a	NULL	1
+b	1	1
+
+
+-- !query
+SELECT * FROM (SELECT cate, val, rank() OVER(PARTITION BY cate ORDER BY val) as r FROM testData) where r <= 2
+-- !query schema
+struct<cate:string,val:int,r:int>
+-- !query output
+NULL	3	2
+NULL	NULL	1
+a	1	2
+a	1	2
+a	NULL	1
+b	1	1
+b	2	2
+
+
+-- !query
+SELECT * FROM (SELECT cate, val, dense_rank() OVER(PARTITION BY cate ORDER BY val) as r FROM testData) where r = 1
+-- !query schema
+struct<cate:string,val:int,r:int>
+-- !query output
+NULL	NULL	1
+a	NULL	1
+b	1	1
+
+
+-- !query
+SELECT * FROM (SELECT cate, val, dense_rank() OVER(PARTITION BY cate ORDER BY val) as r FROM testData) where r <= 2
+-- !query schema
+struct<cate:string,val:int,r:int>
+-- !query output
+NULL	3	2
+NULL	NULL	1
+a	1	2
+a	1	2
+a	NULL	1
+b	1	1
+b	2	2

Review Comment:
   the same.



##########
sql/core/src/test/resources/sql-tests/results/window.sql.out:
##########
@@ -1342,3 +1342,93 @@ org.apache.spark.sql.AnalysisException
     "windowName" : "w"
   }
 }
+
+
+-- !query
+create or replace temp view t1 (p, o) as values (1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 2)
+-- !query schema
+struct<>
+-- !query output
+
+
+
+-- !query
+select * from (select *, dense_rank() over (partition by p order by o) as rnk from t1) where rnk = 1
+-- !query schema
+struct<p:int,o:int,rnk:int>
+-- !query output
+1	1	1
+1	1	1
+2	1	1
+2	1	1
+
+
+-- !query
+SELECT * FROM (SELECT cate, val, rank() OVER(PARTITION BY cate ORDER BY val) as r FROM testData) where r = 1
+-- !query schema
+struct<cate:string,val:int,r:int>
+-- !query output
+NULL	NULL	1
+a	NULL	1
+b	1	1
+
+
+-- !query
+SELECT * FROM (SELECT cate, val, rank() OVER(PARTITION BY cate ORDER BY val) as r FROM testData) where r <= 2
+-- !query schema
+struct<cate:string,val:int,r:int>
+-- !query output
+NULL	3	2
+NULL	NULL	1
+a	1	2
+a	1	2
+a	NULL	1
+b	1	1
+b	2	2
+
+
+-- !query
+SELECT * FROM (SELECT cate, val, dense_rank() OVER(PARTITION BY cate ORDER BY val) as r FROM testData) where r = 1
+-- !query schema
+struct<cate:string,val:int,r:int>
+-- !query output
+NULL	NULL	1
+a	NULL	1
+b	1	1
+
+
+-- !query
+SELECT * FROM (SELECT cate, val, dense_rank() OVER(PARTITION BY cate ORDER BY val) as r FROM testData) where r <= 2
+-- !query schema
+struct<cate:string,val:int,r:int>
+-- !query output
+NULL	3	2
+NULL	NULL	1
+a	1	2
+a	1	2
+a	NULL	1
+b	1	1
+b	2	2
+
+
+-- !query
+SELECT * FROM (SELECT cate, val, row_number() OVER(PARTITION BY cate ORDER BY val) as r FROM testData) where r = 1
+-- !query schema
+struct<cate:string,val:int,r:int>
+-- !query output
+NULL	NULL	1
+a	NULL	1
+b	1	1

Review Comment:
   the same.



##########
sql/core/src/test/resources/sql-tests/results/window.sql.out:
##########
@@ -1342,3 +1342,93 @@ org.apache.spark.sql.AnalysisException
     "windowName" : "w"
   }
 }
+
+
+-- !query
+create or replace temp view t1 (p, o) as values (1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 2)
+-- !query schema
+struct<>
+-- !query output
+
+
+
+-- !query
+select * from (select *, dense_rank() over (partition by p order by o) as rnk from t1) where rnk = 1
+-- !query schema
+struct<p:int,o:int,rnk:int>
+-- !query output
+1	1	1
+1	1	1
+2	1	1
+2	1	1
+
+
+-- !query
+SELECT * FROM (SELECT cate, val, rank() OVER(PARTITION BY cate ORDER BY val) as r FROM testData) where r = 1
+-- !query schema
+struct<cate:string,val:int,r:int>
+-- !query output
+NULL	NULL	1
+a	NULL	1
+b	1	1
+
+
+-- !query
+SELECT * FROM (SELECT cate, val, rank() OVER(PARTITION BY cate ORDER BY val) as r FROM testData) where r <= 2
+-- !query schema
+struct<cate:string,val:int,r:int>
+-- !query output
+NULL	3	2
+NULL	NULL	1
+a	1	2
+a	1	2
+a	NULL	1
+b	1	1
+b	2	2
+
+
+-- !query
+SELECT * FROM (SELECT cate, val, dense_rank() OVER(PARTITION BY cate ORDER BY val) as r FROM testData) where r = 1
+-- !query schema
+struct<cate:string,val:int,r:int>
+-- !query output
+NULL	NULL	1
+a	NULL	1
+b	1	1

Review Comment:
   the same.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] jchen5 commented on a diff in pull request #42026: [SPARK-44448][SQL] Fix wrong results bug from DenseRankLimitIterator and InferWindowGroupLimit

Posted by "jchen5 (via GitHub)" <gi...@apache.org>.

jchen5 commented on code in PR #42026:
URL: https://github.com/apache/spark/pull/42026#discussion_r1265321287


##########
sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala:
##########
@@ -1283,7 +1283,13 @@ class DataFrameWindowFunctionsSuite extends QueryTest
       ("b", 1, "n", Double.PositiveInfinity),
       ("c", 1, "z", -2.0),
       ("c", 1, "a", -4.0),
-      ("c", 2, nullStr, 5.0)).toDF("key", "value", "order", "value2")
+      ("c", 2, nullStr, 5.0),
+      ("d", 0, "1", 1.0),
+      ("d", 1, "1", 2.0),
+      ("d", 2, "2", 3.0),
+      ("d", 3, "2", -1.0),
+      ("d", 4, "2", 2.0),
+      ("d", 4, "3", 2.0)).toDF("key", "value", "order", "value2")

Review Comment:
   The output differed for the dense_rank test cases - for example in the first dense_rank test case below, there was only one row of d output, instead of two.
   
   This is the test output I get for the new test case without the fix in DenseRankLimitIterator:
   ```
   !== Correct Answer - 6 ==   == Spark Answer - 5 ==
   !struct<>                   struct<key:string,value:int,order:string,value2:double,rn:int>
    [a,4,,2.0,1]               [a,4,,2.0,1]
    [a,4,,2.0,1]               [a,4,,2.0,1]
    [b,1,h,NaN,1]              [b,1,h,NaN,1]
    [c,2,null,5.0,1]           [c,2,null,5.0,1]
    [d,0,1,1.0,1]              [d,0,1,1.0,1]
   ![d,1,1,2.0,1]            
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] beliefer commented on a diff in pull request #42026: [SPARK-44448][SQL] Fix wrong results bug from DenseRankLimitIterator and InferWindowGroupLimit

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on code in PR #42026:
URL: https://github.com/apache/spark/pull/42026#discussion_r1266147579


##########
sql/core/src/test/resources/sql-tests/results/window.sql.out:
##########
@@ -1342,3 +1342,93 @@ org.apache.spark.sql.AnalysisException
     "windowName" : "w"
   }
 }
+
+
+-- !query
+create or replace temp view t1 (p, o) as values (1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 2)
+-- !query schema
+struct<>
+-- !query output
+
+
+
+-- !query
+select * from (select *, dense_rank() over (partition by p order by o) as rnk from t1) where rnk = 1
+-- !query schema
+struct<p:int,o:int,rnk:int>
+-- !query output
+1	1	1
+1	1	1
+2	1	1
+2	1	1

Review Comment:
   @zhengruifeng @jchen5 After my double check, the result is OK.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] zhengruifeng commented on pull request #42026: [SPARK-44448][SQL] Fix wrong results bug from DenseRankLimitIterator and InferWindowGroupLimit

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.

zhengruifeng commented on PR #42026:
URL: https://github.com/apache/spark/pull/42026#issuecomment-1637307565

   cc @beliefer can you help double check whether there is similar issue in other related Iterator implementations?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] beliefer commented on a diff in pull request #42026: [SPARK-44448][SQL] Fix wrong results bug from DenseRankLimitIterator and InferWindowGroupLimit

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on code in PR #42026:
URL: https://github.com/apache/spark/pull/42026#discussion_r1264825533


##########
sql/core/src/test/resources/sql-tests/results/window.sql.out:
##########
@@ -1342,3 +1342,93 @@ org.apache.spark.sql.AnalysisException
     "windowName" : "w"
   }
 }
+
+
+-- !query
+create or replace temp view t1 (p, o) as values (1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 2)
+-- !query schema
+struct<>
+-- !query output
+
+
+
+-- !query
+select * from (select *, dense_rank() over (partition by p order by o) as rnk from t1) where rnk = 1
+-- !query schema
+struct<p:int,o:int,rnk:int>
+-- !query output
+1	1	1
+1	1	1
+2	1	1
+2	1	1

Review Comment:
   It seems the output is incorrect.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] beliefer commented on a diff in pull request #42026: [SPARK-44448][SQL] Fix wrong results bug from DenseRankLimitIterator and InferWindowGroupLimit

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on code in PR #42026:
URL: https://github.com/apache/spark/pull/42026#discussion_r1266096282


##########
sql/core/src/test/resources/sql-tests/results/window.sql.out:
##########
@@ -1342,3 +1342,93 @@ org.apache.spark.sql.AnalysisException
     "windowName" : "w"
   }
 }
+
+
+-- !query
+create or replace temp view t1 (p, o) as values (1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 2)
+-- !query schema
+struct<>
+-- !query output
+
+
+
+-- !query
+select * from (select *, dense_rank() over (partition by p order by o) as rnk from t1) where rnk = 1
+-- !query schema
+struct<p:int,o:int,rnk:int>
+-- !query output
+1	1	1
+1	1	1
+2	1	1
+2	1	1

Review Comment:
   You can disable InferWindowGroupLimit, and find out the result.
   If  postgres output as you said, could you investigate the behavior of other databases.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] jchen5 commented on a diff in pull request #42026: [SPARK-44448][SQL] Fix wrong results bug from DenseRankLimitIterator and InferWindowGroupLimit

Posted by "jchen5 (via GitHub)" <gi...@apache.org>.

jchen5 commented on code in PR #42026:
URL: https://github.com/apache/spark/pull/42026#discussion_r1266100624


##########
sql/core/src/test/resources/sql-tests/results/window.sql.out:
##########
@@ -1342,3 +1342,93 @@ org.apache.spark.sql.AnalysisException
     "windowName" : "w"
   }
 }
+
+
+-- !query
+create or replace temp view t1 (p, o) as values (1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 2)
+-- !query schema
+struct<>
+-- !query output
+
+
+
+-- !query
+select * from (select *, dense_rank() over (partition by p order by o) as rnk from t1) where rnk = 1
+-- !query schema
+struct<p:int,o:int,rnk:int>
+-- !query output
+1	1	1
+1	1	1
+2	1	1
+2	1	1

Review Comment:
   I did check the results with InferWindowGroupLimit disabled, and they match (4 rows).
   
   What makes you think the results are incorrect? There are two rows in each partition that are tied with rank 1 - and the results match that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] jchen5 commented on a diff in pull request #42026: [SPARK-44448][SQL] Fix wrong results bug from DenseRankLimitIterator and InferWindowGroupLimit

Posted by "jchen5 (via GitHub)" <gi...@apache.org>.

jchen5 commented on code in PR #42026:
URL: https://github.com/apache/spark/pull/42026#discussion_r1265313685


##########
sql/core/src/test/resources/sql-tests/results/window.sql.out:
##########
@@ -1342,3 +1342,93 @@ org.apache.spark.sql.AnalysisException
     "windowName" : "w"
   }
 }
+
+
+-- !query
+create or replace temp view t1 (p, o) as values (1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 2)
+-- !query schema
+struct<>
+-- !query output
+
+
+
+-- !query
+select * from (select *, dense_rank() over (partition by p order by o) as rnk from t1) where rnk = 1
+-- !query schema
+struct<p:int,o:int,rnk:int>
+-- !query output
+1	1	1
+1	1	1
+2	1	1
+2	1	1

Review Comment:
   This is the correct result. I confirmed with the optimization disabled, and with postgres: https://www.db-fiddle.com/f/e74NGemZYtGkHvDwFoUaQ7/0
   
   There are two rows with (1, 1) and two rows with (2, 1) which have rank 1 in their respective partitions.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] jchen5 commented on a diff in pull request #42026: [SPARK-44448][SQL] Fix wrong results bug from DenseRankLimitIterator and InferWindowGroupLimit

Posted by "jchen5 (via GitHub)" <gi...@apache.org>.

jchen5 commented on code in PR #42026:
URL: https://github.com/apache/spark/pull/42026#discussion_r1265477224


##########
sql/core/src/test/resources/sql-tests/results/window.sql.out:
##########
@@ -1342,3 +1342,93 @@ org.apache.spark.sql.AnalysisException
     "windowName" : "w"
   }
 }
+
+
+-- !query
+create or replace temp view t1 (p, o) as values (1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 2)
+-- !query schema
+struct<>
+-- !query output
+
+
+
+-- !query
+select * from (select *, dense_rank() over (partition by p order by o) as rnk from t1) where rnk = 1
+-- !query schema
+struct<p:int,o:int,rnk:int>
+-- !query output
+1	1	1
+1	1	1
+2	1	1
+2	1	1
+
+
+-- !query
+SELECT * FROM (SELECT cate, val, rank() OVER(PARTITION BY cate ORDER BY val) as r FROM testData) where r = 1
+-- !query schema
+struct<cate:string,val:int,r:int>
+-- !query output
+NULL	NULL	1
+a	NULL	1
+b	1	1

Review Comment:
   Also updated the test to verify the results both with the optimization disabled and enabled:
   ```
   --CONFIG_DIM2 spark.sql.optimizer.windowGroupLimitThreshold=-1
   --CONFIG_DIM2 spark.sql.optimizer.windowGroupLimitThreshold=1000
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] jchen5 commented on a diff in pull request #42026: [SPARK-44448][SQL] Fix wrong results bug from DenseRankLimitIterator and InferWindowGroupLimit

Posted by "jchen5 (via GitHub)" <gi...@apache.org>.

jchen5 commented on code in PR #42026:
URL: https://github.com/apache/spark/pull/42026#discussion_r1266595962


##########
sql/core/src/test/resources/sql-tests/results/window.sql.out:
##########
@@ -1342,3 +1342,93 @@ org.apache.spark.sql.AnalysisException
     "windowName" : "w"
   }
 }
+
+
+-- !query
+create or replace temp view t1 (p, o) as values (1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 2)
+-- !query schema
+struct<>
+-- !query output
+
+
+
+-- !query
+select * from (select *, dense_rank() over (partition by p order by o) as rnk from t1) where rnk = 1
+-- !query schema
+struct<p:int,o:int,rnk:int>
+-- !query output
+1	1	1
+1	1	1
+2	1	1
+2	1	1

Review Comment:
   👍 Thanks for double checking



##########
sql/core/src/test/resources/sql-tests/results/window.sql.out:
##########
@@ -1342,3 +1342,93 @@ org.apache.spark.sql.AnalysisException
     "windowName" : "w"
   }
 }
+
+
+-- !query
+create or replace temp view t1 (p, o) as values (1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 2)
+-- !query schema
+struct<>
+-- !query output
+
+
+
+-- !query
+select * from (select *, dense_rank() over (partition by p order by o) as rnk from t1) where rnk = 1
+-- !query schema
+struct<p:int,o:int,rnk:int>
+-- !query output
+1	1	1
+1	1	1
+2	1	1
+2	1	1

Review Comment:
   👍 Thanks for double checking



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] jchen5 commented on a diff in pull request #42026: [SPARK-44448][SQL] Fix wrong results bug from DenseRankLimitIterator and InferWindowGroupLimit

Posted by "jchen5 (via GitHub)" <gi...@apache.org>.

jchen5 commented on code in PR #42026:
URL: https://github.com/apache/spark/pull/42026#discussion_r1265319519


##########
sql/core/src/test/resources/sql-tests/results/window.sql.out:
##########
@@ -1342,3 +1342,93 @@ org.apache.spark.sql.AnalysisException
     "windowName" : "w"
   }
 }
+
+
+-- !query
+create or replace temp view t1 (p, o) as values (1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 2)
+-- !query schema
+struct<>
+-- !query output
+
+
+
+-- !query
+select * from (select *, dense_rank() over (partition by p order by o) as rnk from t1) where rnk = 1
+-- !query schema
+struct<p:int,o:int,rnk:int>
+-- !query output
+1	1	1
+1	1	1
+2	1	1
+2	1	1
+
+
+-- !query
+SELECT * FROM (SELECT cate, val, rank() OVER(PARTITION BY cate ORDER BY val) as r FROM testData) where r = 1
+-- !query schema
+struct<cate:string,val:int,r:int>
+-- !query output
+NULL	NULL	1
+a	NULL	1
+b	1	1

Review Comment:
   I verified all of these results against the results with the optimization disabled, and they match. What seems incorrect about them?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan closed pull request #42026: [SPARK-44448][SQL] Fix wrong results bug from DenseRankLimitIterator and InferWindowGroupLimit

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan closed pull request #42026: [SPARK-44448][SQL] Fix wrong results bug from DenseRankLimitIterator and InferWindowGroupLimit
URL: https://github.com/apache/spark/pull/42026


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] jchen5 commented on a diff in pull request #42026: [SPARK-44448][SQL] Fix wrong results bug from DenseRankLimitIterator and InferWindowGroupLimit

Posted by "jchen5 (via GitHub)" <gi...@apache.org>.

jchen5 commented on code in PR #42026:
URL: https://github.com/apache/spark/pull/42026#discussion_r1265321287


##########
sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala:
##########
@@ -1283,7 +1283,13 @@ class DataFrameWindowFunctionsSuite extends QueryTest
       ("b", 1, "n", Double.PositiveInfinity),
       ("c", 1, "z", -2.0),
       ("c", 1, "a", -4.0),
-      ("c", 2, nullStr, 5.0)).toDF("key", "value", "order", "value2")
+      ("c", 2, nullStr, 5.0),
+      ("d", 0, "1", 1.0),
+      ("d", 1, "1", 2.0),
+      ("d", 2, "2", 3.0),
+      ("d", 3, "2", -1.0),
+      ("d", 4, "2", 2.0),
+      ("d", 4, "3", 2.0)).toDF("key", "value", "order", "value2")

Review Comment:
   The output differed for the dense_rank test cases - for example in the first dense_rank test case below, there was only one row of d output, instead of two.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42026: [SPARK-44448][SQL] Fix wrong results bug from DenseRankLimitIterator and InferWindowGroupLimit

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.

zhengruifeng commented on code in PR #42026:
URL: https://github.com/apache/spark/pull/42026#discussion_r1266101093


##########
sql/core/src/test/resources/sql-tests/results/window.sql.out:
##########
@@ -1342,3 +1342,93 @@ org.apache.spark.sql.AnalysisException
     "windowName" : "w"
   }
 }
+
+
+-- !query
+create or replace temp view t1 (p, o) as values (1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 2)
+-- !query schema
+struct<>
+-- !query output
+
+
+
+-- !query
+select * from (select *, dense_rank() over (partition by p order by o) as rnk from t1) where rnk = 1
+-- !query schema
+struct<p:int,o:int,rnk:int>
+-- !query output
+1	1	1
+1	1	1
+2	1	1
+2	1	1

Review Comment:
   @beliefer I just test this against `Spark 3.3.2`:
   
   ```
   spark-sql> create or replace temp view t1 (p, o) as values (1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 2);
   Time taken: 0.955 seconds
   spark-sql> select * from (select *, dense_rank() over (partition by p order by o) as rnk from t1) where rnk = 1;
   1	1	1
   1	1	1
   2	1	1
   2	1	1
   Time taken: 1.101 seconds, Fetched 4 row(s)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on pull request #42026: [SPARK-44448][SQL] Fix wrong results bug from DenseRankLimitIterator and InferWindowGroupLimit

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on PR #42026:
URL: https://github.com/apache/spark/pull/42026#issuecomment-1641223824

   thanks, merging to master/3.5!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42026: [SPARK-44448][SQL] Fix wrong results bug from DenseRankLimitIterator and InferWindowGroupLimit

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.

zhengruifeng commented on code in PR #42026:
URL: https://github.com/apache/spark/pull/42026#discussion_r1266105771


##########
sql/core/src/test/resources/sql-tests/results/window.sql.out:
##########
@@ -1342,3 +1342,93 @@ org.apache.spark.sql.AnalysisException
     "windowName" : "w"
   }
 }
+
+
+-- !query
+create or replace temp view t1 (p, o) as values (1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 2)
+-- !query schema
+struct<>
+-- !query output
+
+
+
+-- !query
+select * from (select *, dense_rank() over (partition by p order by o) as rnk from t1) where rnk = 1
+-- !query schema
+struct<p:int,o:int,rnk:int>
+-- !query output
+1	1	1
+1	1	1
+2	1	1
+2	1	1

Review Comment:
   @beliefer `Spark 3.4.1` returns the same results as `Spark 3.3.2`.
   
   But the master branch returns:
   ```
   spark-sql (default)> create or replace temp view t1 (p, o) as values (1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 2);
   Time taken: 1.148 seconds
   spark-sql (default)> select * from (select *, dense_rank() over (partition by p order by o) as rnk from t1) where rnk = 1;
   1       1       1
   1       1       1
   2       1       1
   Time taken: 1.346 seconds, Fetched 3 row(s)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] beliefer commented on a diff in pull request #42026: [SPARK-44448][SQL] Fix wrong results bug from DenseRankLimitIterator and InferWindowGroupLimit

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on code in PR #42026:
URL: https://github.com/apache/spark/pull/42026#discussion_r1264815173


##########
sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala:
##########
@@ -1304,7 +1310,8 @@ class DataFrameWindowFunctionsSuite extends QueryTest
                 Seq(
                   Row("a", 4, "", 2.0, 1),
                   Row("b", 1, "h", Double.NaN, 1),
-                  Row("c", 2, null, 5.0, 1)
+                  Row("c", 2, null, 5.0, 1),
+                  Row("d", 0, "1", 1.0, 1)

Review Comment:
   The result is not correct here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] beliefer commented on a diff in pull request #42026: [SPARK-44448][SQL] Fix wrong results bug from DenseRankLimitIterator and InferWindowGroupLimit

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on code in PR #42026:
URL: https://github.com/apache/spark/pull/42026#discussion_r1266105009


##########
sql/core/src/test/resources/sql-tests/results/window.sql.out:
##########
@@ -1342,3 +1342,93 @@ org.apache.spark.sql.AnalysisException
     "windowName" : "w"
   }
 }
+
+
+-- !query
+create or replace temp view t1 (p, o) as values (1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 2)
+-- !query schema
+struct<>
+-- !query output
+
+
+
+-- !query
+select * from (select *, dense_rank() over (partition by p order by o) as rnk from t1) where rnk = 1
+-- !query schema
+struct<p:int,o:int,rnk:int>
+-- !query output
+1	1	1
+1	1	1
+2	1	1
+2	1	1

Review Comment:
   Last day, I make InferWindowGroupLimit disabled and the output is 3 rows. Maybe I make a mistake. I will check it again.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] beliefer commented on a diff in pull request #42026: [SPARK-44448][SQL] Fix wrong results bug from DenseRankLimitIterator and InferWindowGroupLimit

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on code in PR #42026:
URL: https://github.com/apache/spark/pull/42026#discussion_r1264815173


##########
sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala:
##########
@@ -1304,7 +1310,8 @@ class DataFrameWindowFunctionsSuite extends QueryTest
                 Seq(
                   Row("a", 4, "", 2.0, 1),
                   Row("b", 1, "h", Double.NaN, 1),
-                  Row("c", 2, null, 5.0, 1)
+                  Row("c", 2, null, 5.0, 1),
+                  Row("d", 0, "1", 1.0, 1)

Review Comment:
   The result is not correct here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] jchen5 commented on pull request #42026: [SPARK-44448][SQL] Fix wrong results bug from DenseRankLimitIterator and InferWindowGroupLimit

Posted by "jchen5 (via GitHub)" <gi...@apache.org>.

jchen5 commented on PR #42026:
URL: https://github.com/apache/spark/pull/42026#issuecomment-1637309370

   Yes, that would be good to double check. In this PR I fixed the iterators for rank and dense_rank. The row_number iterator looks fine (since it just has to increment a counter, doesn't need to do row comparisons.) Those should be the only 3 iterators supported in the InferWindowGroupLimit optimization, but do you know if there may be other iterators elsewhere with a similar problem?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] jchen5 commented on pull request #42026: [SPARK-44448][SQL] Fix wrong results bug from DenseRankLimitIterator and InferWindowGroupLimit

Posted by "jchen5 (via GitHub)" <gi...@apache.org>.

jchen5 commented on PR #42026:
URL: https://github.com/apache/spark/pull/42026#issuecomment-1638292520

   I created a PR with the tests before the fix, so you can see the differences: https://github.com/apache/spark/pull/42042/files#diff-68100e5e6079666747882f5070767601087afc8181f513d997f6e39a948eaa0d


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] beliefer commented on a diff in pull request #42026: [SPARK-44448][SQL] Fix wrong results bug from DenseRankLimitIterator and InferWindowGroupLimit

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on code in PR #42026:
URL: https://github.com/apache/spark/pull/42026#discussion_r1266153491


##########
sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala:
##########
@@ -1283,7 +1283,13 @@ class DataFrameWindowFunctionsSuite extends QueryTest
       ("b", 1, "n", Double.PositiveInfinity),
       ("c", 1, "z", -2.0),
       ("c", 1, "a", -4.0),
-      ("c", 2, nullStr, 5.0)).toDF("key", "value", "order", "value2")
+      ("c", 2, nullStr, 5.0),
+      ("d", 0, "1", 1.0),
+      ("d", 1, "1", 2.0),
+      ("d", 2, "2", 3.0),
+      ("d", 3, "2", -1.0),
+      ("d", 4, "2", 2.0),
+      ("d", 4, "3", 2.0)).toDF("key", "value", "order", "value2")

Review Comment:
   I got it now. @jchen5 Thank you for the fix.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] beliefer commented on a diff in pull request #42026: [SPARK-44448][SQL] Fix wrong results bug from DenseRankLimitIterator and InferWindowGroupLimit

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on code in PR #42026:
URL: https://github.com/apache/spark/pull/42026#discussion_r1264823780


##########
sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala:
##########
@@ -1283,7 +1283,13 @@ class DataFrameWindowFunctionsSuite extends QueryTest
       ("b", 1, "n", Double.PositiveInfinity),
       ("c", 1, "z", -2.0),
       ("c", 1, "a", -4.0),
-      ("c", 2, nullStr, 5.0)).toDF("key", "value", "order", "value2")
+      ("c", 2, nullStr, 5.0),
+      ("d", 0, "1", 1.0),
+      ("d", 1, "1", 2.0),
+      ("d", 2, "2", 3.0),
+      ("d", 3, "2", -1.0),
+      ("d", 4, "2", 2.0),
+      ("d", 4, "3", 2.0)).toDF("key", "value", "order", "value2")

Review Comment:
   I checked these test cases. It seems the output is the same after this PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] jchen5 commented on pull request #42026: [SPARK-44448][SQL] Fix wrong results bug from DenseRankLimitIterator and InferWindowGroupLimit

Posted by "jchen5 (via GitHub)" <gi...@apache.org>.

jchen5 commented on PR #42026:
URL: https://github.com/apache/spark/pull/42026#issuecomment-1637261497

   CC @beliefer @zhengruifeng @cloud-fan


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org