Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2022/02/11 02:32:32 UTC

[GitHub] [hive] dengzhhu653 opened a new pull request #2585: HIVE-25448: Invalid partition columns when skew with distinct

dengzhhu653 opened a new pull request #2585:
URL: https://github.com/apache/hive/pull/2585


   
   ### What changes were proposed in this pull request?
   
   
   ### Why are the changes needed?
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   ### How was this patch tested?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] dengzhhu653 edited a comment on pull request #2585: HIVE-25448: Invalid partition columns when skew with distinct

Posted by GitBox <gi...@apache.org>.
dengzhhu653 edited a comment on pull request #2585:
URL: https://github.com/apache/hive/pull/2585#issuecomment-1048902780


   I found something interesting: when I run EXPLAIN on `select col1, count(distinct col2) from partition_distinct_skew group by col1;` on the master branch, the output is as follows:
   ```
         Vertices:
           Map 1
               Map Operator Tree:
                   TableScan
                     alias: partition_distinct_skew
                     Statistics: Num rows: 3 Data size: 510 Basic stats: COMPLETE Column stats: COMPLETE
                     Select Operator
                       expressions: col1 (type: string), col2 (type: string)
                       outputColumnNames: col1, col2
                       Statistics: Num rows: 3 Data size: 510 Basic stats: COMPLETE Column stats: COMPLETE
                       Group By Operator
                         keys: col1 (type: string), col2 (type: string)
                         minReductionHashAggr: 0.4
                         mode: hash
                         outputColumnNames: _col0, _col1
                         Statistics: Num rows: 2 Data size: 340 Basic stats: COMPLETE Column stats: COMPLETE
                         Reduce Output Operator
                           key expressions: _col0 (type: string), _col1 (type: string)
                           null sort order: zz
                           sort order: ++
                           Map-reduce partition columns: rand() (type: double)
                           Statistics: Num rows: 2 Data size: 340 Basic stats: COMPLETE Column stats: COMPLETE
   ```
   The partition column is **rand()** in this case. It seems something has already been done to improve the skew case, though I have not been able to locate the cause.
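
Why the choice of partition columns matters for skew can be sketched with a small Python model. This is only an illustration, not Hive code: `stable_hash` and `reducer_loads` are hypothetical helpers, the CRC32-modulo assignment is an assumption standing in for whatever partitioner the engine uses, and the data set is invented.

```python
import zlib
from collections import Counter


def stable_hash(*cols):
    # Hypothetical stand-in for a shuffle partitioner: hash the chosen columns.
    return zlib.crc32("\x00".join(cols).encode("utf-8"))


def reducer_loads(rows, key_fn, num_reducers):
    # Count how many rows each reducer would receive for a given partition key.
    loads = Counter()
    for row in rows:
        loads[stable_hash(*key_fn(row)) % num_reducers] += 1
    return loads


# Invented skewed input: one hot col1 value with many distinct col2 values.
rows = [("a", f"v{i}") for i in range(1000)] + [("b", "x"), ("c", "y")]

# Partition on the group-by key only: every "a" row lands on one reducer.
by_group_key = reducer_loads(rows, lambda r: (r[0],), 4)

# Partition on group-by key plus distinct column: "a" rows are spread out.
by_group_and_distinct = reducer_loads(rows, lambda r: (r[0], r[1]), 4)

print("max load, col1 only:   ", max(by_group_key.values()))
print("max load, col1 + col2: ", max(by_group_and_distinct.values()))
```

In this model, partitioning on `col1` alone sends the whole hot key to a single reducer, while adding `col2` to the partition key spreads it. A random partition key would spread rows as well, but it would no longer guarantee that equal `(col1, col2)` pairs meet on the same reducer.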




[GitHub] [hive] github-actions[bot] closed pull request #2585: HIVE-25448: Invalid partition columns when skew with distinct

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #2585:
URL: https://github.com/apache/hive/pull/2585


   




[GitHub] [hive] kgyrtkirk commented on pull request #2585: HIVE-25448: Invalid partition columns when skew with distinct

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on pull request #2585:
URL: https://github.com/apache/hive/pull/2585#issuecomment-975438145


   @dengzhhu653 do you happen to have a testcase for this?




[GitHub] [hive] dengzhhu653 commented on pull request #2585: HIVE-25448: Invalid partition columns when skew with distinct

Posted by GitBox <gi...@apache.org>.
dengzhhu653 commented on pull request #2585:
URL: https://github.com/apache/hive/pull/2585#issuecomment-1048902780


   
   I found something interesting: when I run EXPLAIN on `select col1, count(distinct col2) from partition_distinct_skew group by col1;` on the master branch, the output is as follows:
   ```
         Vertices:
           Map 1
               Map Operator Tree:
                   TableScan
                     alias: partition_distinct_skew
                     Statistics: Num rows: 3 Data size: 510 Basic stats: COMPLETE Column stats: COMPLETE
                     Select Operator
                       expressions: col1 (type: string), col2 (type: string)
                       outputColumnNames: col1, col2
                       Statistics: Num rows: 3 Data size: 510 Basic stats: COMPLETE Column stats: COMPLETE
                       Group By Operator
                         keys: col1 (type: string), col2 (type: string)
                         minReductionHashAggr: 0.4
                         mode: hash
                         outputColumnNames: _col0, _col1
                         Statistics: Num rows: 2 Data size: 340 Basic stats: COMPLETE Column stats: COMPLETE
                         Reduce Output Operator
                           key expressions: _col0 (type: string), _col1 (type: string)
                           null sort order: zz
                           sort order: ++
                           Map-reduce partition columns: rand() (type: double)
                           Statistics: Num rows: 2 Data size: 340 Basic stats: COMPLETE Column stats: COMPLETE
   ```
   The partition column is **rand()** in this case. It seems something has already been done to improve the skew case, though I have not been able to locate the cause.




[GitHub] [hive] dengzhhu653 commented on pull request #2585: HIVE-25448: Invalid partition columns when skew with distinct

Posted by GitBox <gi...@apache.org>.
dengzhhu653 commented on pull request #2585:
URL: https://github.com/apache/hive/pull/2585#issuecomment-975448819


   > @dengzhhu653 do you happen to have a testcase for this?
   
   Not yet. I have tested it in our environment on a skewed table, and it shows a pretty good performance gain (on MR).




[GitHub] [hive] github-actions[bot] commented on pull request #2585: HIVE-25448: Invalid partition columns when skew with distinct

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #2585:
URL: https://github.com/apache/hive/pull/2585#issuecomment-1027414846


   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the dev@hive.apache.org list if the patch is in need of reviews.




[GitHub] [hive] dengzhhu653 commented on pull request #2585: HIVE-25448: Invalid partition columns when skew with distinct

Posted by GitBox <gi...@apache.org>.
dengzhhu653 commented on pull request #2585:
URL: https://github.com/apache/hive/pull/2585#issuecomment-938688401


   Hey @pgaref, mind taking a look if you have a sec?




[GitHub] [hive] dengzhhu653 commented on pull request #2585: HIVE-25448: Invalid partition columns when skew with distinct

Posted by GitBox <gi...@apache.org>.
dengzhhu653 commented on pull request #2585:
URL: https://github.com/apache/hive/pull/2585#issuecomment-985433461


   > > @dengzhhu653 do you happen to have a testcase for this?
   > 
   > Not yet, I have tested on our environment for the skew table, shows that it can get pretty performance gain(mr).
   
   Hi @kgyrtkirk, what do you think about this? There are also some tests, such as [groupby11.q](https://github.com/apache/hive/blob/7b3ecf617a6d46f48a3b6f77e0339fd4ad95a420/ql/src/test/queries/clientpositive/groupby11.q) and [groupby8_map_skew.q](https://github.com/apache/hive/blob/7b3ecf617a6d46f48a3b6f77e0339fd4ad95a420/ql/src/test/queries/clientpositive/groupby8_map_skew.q), that show the changes in partition columns after applying the fix. Thank you!




[GitHub] [hive] dengzhhu653 commented on a change in pull request #2585: HIVE-25448: Invalid partition columns when skew with distinct

Posted by GitBox <gi...@apache.org>.
dengzhhu653 commented on a change in pull request #2585:
URL: https://github.com/apache/hive/pull/2585#discussion_r813487836



##########
File path: ql/src/test/results/clientpositive/llap/partition_distinct_skew.q.out
##########
@@ -0,0 +1,261 @@
+PREHOOK: query: create table partition_distinct_skew(col1 string, col2 string)
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@partition_distinct_skew
+POSTHOOK: query: create table partition_distinct_skew(col1 string, col2 string)
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@partition_distinct_skew
+PREHOOK: query: insert into table partition_distinct_skew values('a', 'b'), ('a', 'a'), ('a', 'b')
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+PREHOOK: Output: default@partition_distinct_skew
+POSTHOOK: query: insert into table partition_distinct_skew values('a', 'b'), ('a', 'a'), ('a', 'b')
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+POSTHOOK: Output: default@partition_distinct_skew
+POSTHOOK: Lineage: partition_distinct_skew.col1 SCRIPT []
+POSTHOOK: Lineage: partition_distinct_skew.col2 SCRIPT []
+PREHOOK: query: select col1, col2 from partition_distinct_skew
+PREHOOK: type: QUERY
+PREHOOK: Input: default@partition_distinct_skew
+#### A masked pattern was here ####
+POSTHOOK: query: select col1, col2 from partition_distinct_skew
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@partition_distinct_skew
+#### A masked pattern was here ####
+a	b
+a	a
+a	b
+PREHOOK: query: explain select col1, count(distinct col2), count(col2) from partition_distinct_skew group by col1
+PREHOOK: type: QUERY
+PREHOOK: Input: default@partition_distinct_skew
+#### A masked pattern was here ####
+POSTHOOK: query: explain select col1, count(distinct col2), count(col2) from partition_distinct_skew group by col1
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@partition_distinct_skew
+#### A masked pattern was here ####
+STAGE DEPENDENCIES:
+  Stage-1 is a root stage
+  Stage-0 depends on stages: Stage-1
+
+STAGE PLANS:
+  Stage: Stage-1
+    Tez
+#### A masked pattern was here ####
+      Edges:
+        Reducer 2 <- Map 1 (SIMPLE_EDGE)
+        Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
+#### A masked pattern was here ####
+      Vertices:
+        Map 1 
+            Map Operator Tree:
+                TableScan
+                  alias: partition_distinct_skew
+                  Statistics: Num rows: 3 Data size: 510 Basic stats: COMPLETE Column stats: COMPLETE
+                  Select Operator
+                    expressions: col1 (type: string), col2 (type: string)
+                    outputColumnNames: col1, col2
+                    Statistics: Num rows: 3 Data size: 510 Basic stats: COMPLETE Column stats: COMPLETE
+                    Group By Operator
+                      aggregations: count(DISTINCT col2), count(col2)
+                      keys: col1 (type: string), col2 (type: string)
+                      minReductionHashAggr: 0.4
+                      mode: hash
+                      outputColumnNames: _col0, _col1, _col2, _col3
+                      Statistics: Num rows: 2 Data size: 372 Basic stats: COMPLETE Column stats: COMPLETE
+                      Reduce Output Operator
+                        key expressions: _col0 (type: string), _col1 (type: string)
+                        null sort order: zz
+                        sort order: ++
+                        Map-reduce partition columns: _col0 (type: string), _col1 (type: string)
+                        Statistics: Num rows: 2 Data size: 372 Basic stats: COMPLETE Column stats: COMPLETE
+                        value expressions: _col3 (type: bigint)
+            Execution mode: vectorized, llap
+            LLAP IO: all inputs
+        Reducer 2 
+            Execution mode: llap
+            Reduce Operator Tree:
+              Group By Operator
+                aggregations: count(DISTINCT KEY._col1:0._col0), count(VALUE._col1)
+                keys: KEY._col0 (type: string)
+                mode: partials
+                outputColumnNames: _col0, _col1, _col2
+                Statistics: Num rows: 2 Data size: 202 Basic stats: COMPLETE Column stats: COMPLETE
+                Reduce Output Operator
+                  key expressions: _col0 (type: string)
+                  null sort order: z
+                  sort order: +
+                  Map-reduce partition columns: _col0 (type: string)
+                  Statistics: Num rows: 2 Data size: 202 Basic stats: COMPLETE Column stats: COMPLETE
+                  value expressions: _col1 (type: bigint), _col2 (type: bigint)
+        Reducer 3 
+            Execution mode: llap
+            Reduce Operator Tree:
+              Group By Operator
+                aggregations: count(VALUE._col0), count(VALUE._col1)
+                keys: KEY._col0 (type: string)
+                mode: final
+                outputColumnNames: _col0, _col1, _col2
+                Statistics: Num rows: 1 Data size: 101 Basic stats: COMPLETE Column stats: COMPLETE
+                File Output Operator
+                  compressed: false
+                  Statistics: Num rows: 1 Data size: 101 Basic stats: COMPLETE Column stats: COMPLETE
+                  table:
+                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
+                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
+                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+
+  Stage: Stage-0
+    Fetch Operator
+      limit: -1
+      Processor Tree:
+        ListSink
+
+PREHOOK: query: select col1, count(distinct col2), count(col2)  from partition_distinct_skew group by col1
+PREHOOK: type: QUERY
+PREHOOK: Input: default@partition_distinct_skew
+#### A masked pattern was here ####
+POSTHOOK: query: select col1, count(distinct col2), count(col2)  from partition_distinct_skew group by col1
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@partition_distinct_skew
+#### A masked pattern was here ####
+a	2	3
+PREHOOK: query: explain select col1, count(distinct col2) from partition_distinct_skew group by col1
+PREHOOK: type: QUERY
+PREHOOK: Input: default@partition_distinct_skew
+#### A masked pattern was here ####
+POSTHOOK: query: explain select col1, count(distinct col2) from partition_distinct_skew group by col1
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@partition_distinct_skew
+#### A masked pattern was here ####
+STAGE DEPENDENCIES:
+  Stage-1 is a root stage
+  Stage-0 depends on stages: Stage-1
+
+STAGE PLANS:
+  Stage: Stage-1
+    Tez
+#### A masked pattern was here ####
+      Edges:
+        Reducer 2 <- Map 1 (SIMPLE_EDGE)
+        Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
+        Reducer 4 <- Reducer 3 (SIMPLE_EDGE)
+        Reducer 5 <- Reducer 4 (SIMPLE_EDGE)
+#### A masked pattern was here ####

Review comment:
       The plan of `select col1, count(distinct col2) from partition_distinct_skew group by col1` introduces some redundant reducers.






[GitHub] [hive] dengzhhu653 commented on pull request #2585: HIVE-25448: Invalid partition columns when skew with distinct

Posted by GitBox <gi...@apache.org>.
dengzhhu653 commented on pull request #2585:
URL: https://github.com/apache/hive/pull/2585#issuecomment-918013599


   @kasakrisz could you please take a look at the changes?
   Thanks,
   Zhihua Deng




[GitHub] [hive] dengzhhu653 commented on pull request #2585: HIVE-25448: Invalid partition columns when skew with distinct

Posted by GitBox <gi...@apache.org>.
dengzhhu653 commented on pull request #2585:
URL: https://github.com/apache/hive/pull/2585#issuecomment-899965525


   Hi @kgyrtkirk @zabetak, could you please take a look if you have a sec?
   Thanks!




[GitHub] [hive] dengzhhu653 removed a comment on pull request #2585: HIVE-25448: Invalid partition columns when skew with distinct

Posted by GitBox <gi...@apache.org>.
dengzhhu653 removed a comment on pull request #2585:
URL: https://github.com/apache/hive/pull/2585#issuecomment-918013599


   @kasakrisz could you please take a look at the changes?
   Thanks,
   Zhihua Deng




[GitHub] [hive] kgyrtkirk commented on a change in pull request #2585: HIVE-25448: Invalid partition columns when skew with distinct

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #2585:
URL: https://github.com/apache/hive/pull/2585#discussion_r812739484



##########
File path: ql/src/test/results/clientpositive/llap/autoColumnStats_7.q.out
##########
@@ -56,7 +56,7 @@ STAGE PLANS:
                       key expressions: _col0 (type: string), _col1 (type: string)
                       null sort order: zz
                       sort order: ++
-                      Map-reduce partition columns: _col0 (type: string)
+                      Map-reduce partition columns: _col0 (type: string), _col1 (type: string)

Review comment:
       Do you happen to have a directed test case that was working incorrectly before this patch?
   
   I guess it was returning 3 for the distinct count when the rows arrived in this order:
   ```
   a | b
   a | a
   a | b
   ```
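
One way to reason about this guess is a toy Python model of a two-stage distinct count, loosely shaped like the `partials` and `final` Group By stages in the plan quoted elsewhere in this thread. It is a sketch under stated assumptions, not Hive's execution path: `two_stage_count_distinct`, `by_pair`, and `round_robin` are hypothetical names, and the thread itself does not confirm that a result of 3 was ever observed.

```python
from collections import defaultdict


def two_stage_count_distinct(rows, assign, num_reducers):
    # Stage 1 shuffle: `assign(index, row)` picks a reducer for each row.
    buckets = defaultdict(list)
    for index, row in enumerate(rows):
        buckets[assign(index, row) % num_reducers].append(row)

    # Stage 1 reduce: each reducer counts distinct col2 values per col1 locally.
    partials = []
    for bucket in buckets.values():
        seen = defaultdict(set)
        for col1, col2 in bucket:
            seen[col1].add(col2)
        partials.extend((col1, len(values)) for col1, values in seen.items())

    # Stage 2: the final stage simply sums the partial counts per col1.
    totals = defaultdict(int)
    for col1, count in partials:
        totals[col1] += count
    return dict(totals)


rows = [("a", "b"), ("a", "a"), ("a", "b")]


def by_pair(index, row):
    # Deterministic on (col1, col2): equal pairs always share a reducer.
    return sum(map(ord, row[0] + row[1]))


def round_robin(index, row):
    # Ignores the row content, so equal pairs may be separated.
    return index


print(two_stage_count_distinct(rows, by_pair, 3))
print(two_stage_count_distinct(rows, round_robin, 3))
```

In this model the summed partial counts are only correct when the first shuffle keeps equal `(col1, col2)` pairs together; once the two `('a', 'b')` rows are separated, each is counted once and the total becomes 3 instead of 2.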



