You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/01/04 13:32:32 UTC

[GitHub] [hive] mustafaiman opened a new pull request #1824: HIVE-24510: Vectorize compute_bit_vector

mustafaiman opened a new pull request #1824:
URL: https://github.com/apache/hive/pull/1824


   I organized this pull request in 3 commits for ease of review.
   First commit adds a test that prints hex(compute_bit_vector(col)) for some tables. So we can use that as the source of truth for testing vectorized function.
   Second commit is the actual implementation.
   Third commit is the q.out file changes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] abstractdog edited a comment on pull request #1824: HIVE-24510: Vectorize compute_bit_vector

Posted by GitBox <gi...@apache.org>.
abstractdog edited a comment on pull request #1824:
URL: https://github.com/apache/hive/pull/1824#issuecomment-756758376


   > I made a quick fix to allow that in early versions of this patch. Then I decided to not pursue it because I did not see the need for allowing constant argument in runtime.
   > 
   > > you can still do something like:
   > > if compute_bit_vector: -> handle constant parameter
   > 
   > We do exactly that. Not in vectorizer but earlier in `ColumnStatsSemanticAnalyzer.java `. I am reluctant to implement extra functionality or add special cases unless it is necessary. Note that compute_bit_vector is a newly added UDF in 4.0. So there is no backward compatibility concern either.
   > Do you see any other benefit than preserving the earlier q.out outputs?
   
   no, I'm concerned only about the qout changes
   you're right, if compute_bit_vector is a relatively new thing then we can also ignore backward compatibility problems and go on with compute_bit_vector_hll
   I would personally keep pursuing a smaller patch as having "compute_bit_vector_hll" has no benefits either, but it's up to you, I think if the default hll algo won't be changed in the near future for stats, we can go with the updated qouts :)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] abstractdog edited a comment on pull request #1824: HIVE-24510: Vectorize compute_bit_vector

Posted by GitBox <gi...@apache.org>.
abstractdog edited a comment on pull request #1824:
URL: https://github.com/apache/hive/pull/1824#issuecomment-756588127


   > @abstractdog I had to introduce `compute_bit_vector_hll` and `compute_bit_vector_fm` because, vectorization is supported only for single argument functions. compute_bit_vector(arg1, arg2) will not vectorize.
   
   I think you can do it, refer to VectorUDAFBloomFilterMerge
   
   it uses "matches" function without the param, which is ok, param value is handled later:
   https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilterMerge.java#L578
   
   but then it's instantiated with parameters from conf:
   https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java#L1242
   
   could this work for you?
   
   UPDATE: okay, I found where is prohibits parameters:
   https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L4531
   I think this should be modified, I cannot see any problem with attaching constant parameters to aggregate expressions, and you can still do something like:
   if compute_bit_vector: -> handle constant parameter
   else: "Aggregations with > 1 parameter are only supported to 'x' and 'y' expression"
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] mustafaiman commented on pull request #1824: HIVE-24510: Vectorize compute_bit_vector

Posted by GitBox <gi...@apache.org>.
mustafaiman commented on pull request #1824:
URL: https://github.com/apache/hive/pull/1824#issuecomment-756725412


   I made a quick fix to allow that in early versions of this patch. Then I decided to not pursue it because I did not see the need for allowing constant argument in runtime.
   
   > you can still do something like:
   > if compute_bit_vector: -> handle constant parameter
   We do exactly that. Not in vectorizer but earlier in `ColumnStatsSemanticAnalyzer.java `. I am reluctant to implement extra functionality or add special cases unless it is necessary. Note that compute_bit_vector is a newly added UDF in 4.0. So there is no backward compatibility concern either.
   Do you see any other benefit than preserving the earlier q.out outputs?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] mustafaiman closed pull request #1824: HIVE-24510: Vectorize compute_bit_vector

Posted by GitBox <gi...@apache.org>.
mustafaiman closed pull request #1824:
URL: https://github.com/apache/hive/pull/1824


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] abstractdog commented on a change in pull request #1824: HIVE-24510: Vectorize compute_bit_vector

Posted by GitBox <gi...@apache.org>.
abstractdog commented on a change in pull request #1824:
URL: https://github.com/apache/hive/pull/1824#discussion_r553769492



##########
File path: ql/src/test/results/clientpositive/llap/dynpart_sort_optimization.q.out
##########
@@ -161,20 +168,28 @@ STAGE PLANS:
                             Map-reduce partition columns: _col0 (type: string), _col1 (type: tinyint)
                             Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE Column stats: NONE
                             value expressions: _col2 (type: smallint), _col3 (type: smallint), _col4 (type: bigint), _col5 (type: bigint), _col6 (type: binary), _col7 (type: int), _col8 (type: int), _col9 (type: bigint), _col10 (type: binary), _col11 (type: bigint), _col12 (type: bigint), _col13 (type: bigint), _col14 (type: binary), _col15 (type: float), _col16 (type: float), _col17 (type: bigint), _col18 (type: binary)
-                      Reduce Output Operator
-                        key expressions: _col4 (type: tinyint)
-                        null sort order: a
-                        sort order: +
-                        Map-reduce partition columns: _col4 (type: tinyint)
-                        Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE Column stats: NONE
-                        value expressions: _col0 (type: smallint), _col1 (type: int), _col2 (type: bigint), _col3 (type: float)
             Execution mode: llap
             LLAP IO: all inputs
         Reducer 2 
+            Execution mode: llap

Review comment:
       okay, thanks!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] abstractdog commented on pull request #1824: HIVE-24510: Vectorize compute_bit_vector

Posted by GitBox <gi...@apache.org>.
abstractdog commented on pull request #1824:
URL: https://github.com/apache/hive/pull/1824#issuecomment-756758376


   > I made a quick fix to allow that in early versions of this patch. Then I decided to not pursue it because I did not see the need for allowing constant argument in runtime.
   > 
   > > you can still do something like:
   > > if compute_bit_vector: -> handle constant parameter
   > 
   > We do exactly that. Not in vectorizer but earlier in `ColumnStatsSemanticAnalyzer.java `. I am reluctant to implement extra functionality or add special cases unless it is necessary. Note that compute_bit_vector is a newly added UDF in 4.0. So there is no backward compatibility concern either.
   > Do you see any other benefit than preserving the earlier q.out outputs?
   
   no, I'm concerned only about the qout changes
   you're right, if compute_bit_vector is a relatively new thing then we can also ignore backward compatibility problems and go on with compute_bit_vector_hll
   I would personally keep pursuing a smaller patch as having "compute_bit_vector_hll" has no benefits either, but it's up to you, I think if the default hll algo won't be changed in the near future, we can go with the updated qouts :)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] abstractdog commented on pull request #1824: HIVE-24510: Vectorize compute_bit_vector

Posted by GitBox <gi...@apache.org>.
abstractdog commented on pull request #1824:
URL: https://github.com/apache/hive/pull/1824#issuecomment-756588127


   > @abstractdog I had to introduce `compute_bit_vector_hll` and `compute_bit_vector_fm` because, vectorization is supported only for single argument functions. compute_bit_vector(arg1, arg2) will not vectorize.
   
   I think you can do it, refer to VectorUDAFBloomFilterMerge
   
   it uses "matches" function without the param, which is ok, param value is handled later:
   https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilterMerge.java#L578
   
   but then it's instantiated with parameters from conf:
   https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java#L1242
   
   could this work for you?
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] abstractdog commented on a change in pull request #1824: HIVE-24510: Vectorize compute_bit_vector

Posted by GitBox <gi...@apache.org>.
abstractdog commented on a change in pull request #1824:
URL: https://github.com/apache/hive/pull/1824#discussion_r553142400



##########
File path: ql/src/test/results/clientpositive/llap/auto_sortmerge_join_14.q.out
##########
@@ -194,7 +222,7 @@ STAGE PLANS:
                       keys:
                         0 _col0 (type: int)
                         1 _col0 (type: int)
-                      Statistics: Num rows: 221 Data size: 1768 Basic stats: COMPLETE Column stats: COMPLETE
+                      Statistics: Num rows: 220 Data size: 1760 Basic stats: COMPLETE Column stats: COMPLETE

Review comment:
       have you double-checked the root cause of this one? I cannot decide at first sight whether this is a problem or not
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] mustafaiman edited a comment on pull request #1824: HIVE-24510: Vectorize compute_bit_vector

Posted by GitBox <gi...@apache.org>.
mustafaiman edited a comment on pull request #1824:
URL: https://github.com/apache/hive/pull/1824#issuecomment-756725412


   I made a quick fix to allow that in early versions of this patch. Then I decided to not pursue it because I did not see the need for allowing constant argument in runtime.
   
   > you can still do something like:
   > if compute_bit_vector: -> handle constant parameter
   
   We do exactly that. Not in vectorizer but earlier in `ColumnStatsSemanticAnalyzer.java `. I am reluctant to implement extra functionality or add special cases unless it is necessary. Note that compute_bit_vector is a newly added UDF in 4.0. So there is no backward compatibility concern either.
   Do you see any other benefit than preserving the earlier q.out outputs?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] abstractdog commented on a change in pull request #1824: HIVE-24510: Vectorize compute_bit_vector

Posted by GitBox <gi...@apache.org>.
abstractdog commented on a change in pull request #1824:
URL: https://github.com/apache/hive/pull/1824#discussion_r553419304



##########
File path: ql/src/test/results/clientpositive/llap/dynpart_sort_optimization.q.out
##########
@@ -161,20 +168,28 @@ STAGE PLANS:
                             Map-reduce partition columns: _col0 (type: string), _col1 (type: tinyint)
                             Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE Column stats: NONE
                             value expressions: _col2 (type: smallint), _col3 (type: smallint), _col4 (type: bigint), _col5 (type: bigint), _col6 (type: binary), _col7 (type: int), _col8 (type: int), _col9 (type: bigint), _col10 (type: binary), _col11 (type: bigint), _col12 (type: bigint), _col13 (type: bigint), _col14 (type: binary), _col15 (type: float), _col16 (type: float), _col17 (type: bigint), _col18 (type: binary)
-                      Reduce Output Operator
-                        key expressions: _col4 (type: tinyint)
-                        null sort order: a
-                        sort order: +
-                        Map-reduce partition columns: _col4 (type: tinyint)
-                        Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE Column stats: NONE
-                        value expressions: _col0 (type: smallint), _col1 (type: int), _col2 (type: bigint), _col3 (type: float)
             Execution mode: llap
             LLAP IO: all inputs
         Reducer 2 
+            Execution mode: llap

Review comment:
       could you please clarify why the patch has moved this operator from Reducer3 to Reducer2? basically, I "like" that we eliminated a reducer stage, however, I don't have an idea whether this is correct not, or what's the root cause behind it? 
   
   as a reference, the history of dynamic partitioning sort optimization (what I can recall) is: HIVE-6455, HIVE-20703, HIVE-23071 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] abstractdog edited a comment on pull request #1824: HIVE-24510: Vectorize compute_bit_vector

Posted by GitBox <gi...@apache.org>.
abstractdog edited a comment on pull request #1824:
URL: https://github.com/apache/hive/pull/1824#issuecomment-756195387


   ran through the patches, I'll comment on particular code sections, but first a question in general: is there a specific reason for doing this method transformation?
   ```
   compute_bit_vector(x, 'hll') -> compute_bit_vector_hll(x)
   ```
   while I was reading the qout changes I realized that this rewrite makes this patch so huge, and I'm not sure if it's really worth, as I cannot see the added value, and I personally don't have any problem with the algorithm as a parameter, also what's the default behavior? this would make sense to me:
   ```
   compute_bit_vector(x) -- could be translated to --> compute_bit_vector(x, 'hll') 
   ```
   without added value, e.g. the backport of this patch will be painful if we keep it this way, and the patch is about vectorization coverage, and has nothing to do with compiler basically, what do you think?
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] abstractdog commented on pull request #1824: HIVE-24510: Vectorize compute_bit_vector

Posted by GitBox <gi...@apache.org>.
abstractdog commented on pull request #1824:
URL: https://github.com/apache/hive/pull/1824#issuecomment-756758376


   > I made a quick fix to allow that in early versions of this patch. Then I decided to not pursue it because I did not see the need for allowing constant argument in runtime.
   > 
   > > you can still do something like:
   > > if compute_bit_vector: -> handle constant parameter
   > 
   > We do exactly that. Not in vectorizer but earlier in `ColumnStatsSemanticAnalyzer.java `. I am reluctant to implement extra functionality or add special cases unless it is necessary. Note that compute_bit_vector is a newly added UDF in 4.0. So there is no backward compatibility concern either.
   > Do you see any other benefit than preserving the earlier q.out outputs?
   
   no, I'm concerned only about the qout changes
   you're right, if compute_bit_vector is a relatively new thing then we can also ignore backward compatibility problems and go on with compute_bit_vector_hll
   I would personally keep pursuing a smaller patch as having "compute_bit_vector_hll" has no benefits either, but it's up to you, I think if the default hll algo won't be changed in the near future, we can go with the updated qouts :)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] abstractdog edited a comment on pull request #1824: HIVE-24510: Vectorize compute_bit_vector

Posted by GitBox <gi...@apache.org>.
abstractdog edited a comment on pull request #1824:
URL: https://github.com/apache/hive/pull/1824#issuecomment-756758376


   > I made a quick fix to allow that in early versions of this patch. Then I decided to not pursue it because I did not see the need for allowing constant argument in runtime.
   > 
   > > you can still do something like:
   > > if compute_bit_vector: -> handle constant parameter
   > 
   > We do exactly that. Not in vectorizer but earlier in `ColumnStatsSemanticAnalyzer.java `. I am reluctant to implement extra functionality or add special cases unless it is necessary. Note that compute_bit_vector is a newly added UDF in 4.0. So there is no backward compatibility concern either.
   > Do you see any other benefit than preserving the earlier q.out outputs?
   
   no, I'm concerned only about the qout changes
   you're right, if compute_bit_vector is a relatively new thing then we can also ignore backward compatibility problems and go on with compute_bit_vector_hll
   I would personally keep pursuing a smaller patch as having "compute_bit_vector_hll" has no benefits either, but it's up to you, I think if the default hll algo won't be changed in the near future for stats, we can go with the updated qouts :)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] mustafaiman commented on pull request #1824: HIVE-24510: Vectorize compute_bit_vector

Posted by GitBox <gi...@apache.org>.
mustafaiman commented on pull request #1824:
URL: https://github.com/apache/hive/pull/1824#issuecomment-756725412


   I made a quick fix to allow that in early versions of this patch. Then I decided to not pursue it because I did not see the need for allowing constant argument in runtime.
   
   > you can still do something like:
   > if compute_bit_vector: -> handle constant parameter
   We do exactly that. Not in vectorizer but earlier in `ColumnStatsSemanticAnalyzer.java `. I am reluctant to implement extra functionality or add special cases unless it is necessary. Note that compute_bit_vector is a newly added UDF in 4.0. So there is no backward compatibility concern either.
   Do you see any other benefit than preserving the earlier q.out outputs?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] mustafaiman commented on a change in pull request #1824: HIVE-24510: Vectorize compute_bit_vector

Posted by GitBox <gi...@apache.org>.
mustafaiman commented on a change in pull request #1824:
URL: https://github.com/apache/hive/pull/1824#discussion_r553168379



##########
File path: ql/src/test/results/clientpositive/llap/auto_sortmerge_join_14.q.out
##########
@@ -194,7 +222,7 @@ STAGE PLANS:
                       keys:
                         0 _col0 (type: int)
                         1 _col0 (type: int)
-                      Statistics: Num rows: 221 Data size: 1768 Basic stats: COMPLETE Column stats: COMPLETE
+                      Statistics: Num rows: 220 Data size: 1760 Basic stats: COMPLETE Column stats: COMPLETE

Review comment:
       Yes, before the patch distinct count on tbl2_n6 was miscalculated. Running the modified test before the patch reveals that distinct count on `key` was calculated wrong.
   ```
   describe formatted tbl2_n6 key;
   select count (distinct key) from tbl2_n6;
   ```
   
   
   ```
   POSTHOOK: Input: default@tbl2_n6
   col_name            	key                 
   data_type           	int                 
   min                 	0                   
   max                 	199                 
   num_nulls           	0                   
   distinct_count      	117                 
   avg_col_len         	                    
   max_col_len         	                    
   num_trues           	                    
   num_falses          	                    
   bit_vector          	HL                  
   comment             	from deserializer   
   COLUMN_STATS_ACCURATE	{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"key\":\"true\",\"value\":\"true\"}}
   ...
   POSTHOOK: query: select count (distinct key) from tbl2_n6
   ...
   121
   ```
   
   After the patch distinct count is correct. So the stats in the following query is correct now.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] mustafaiman commented on a change in pull request #1824: HIVE-24510: Vectorize compute_bit_vector

Posted by GitBox <gi...@apache.org>.
mustafaiman commented on a change in pull request #1824:
URL: https://github.com/apache/hive/pull/1824#discussion_r553767009



##########
File path: ql/src/test/results/clientpositive/llap/dynpart_sort_optimization.q.out
##########
@@ -161,20 +168,28 @@ STAGE PLANS:
                             Map-reduce partition columns: _col0 (type: string), _col1 (type: tinyint)
                             Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE Column stats: NONE
                             value expressions: _col2 (type: smallint), _col3 (type: smallint), _col4 (type: bigint), _col5 (type: bigint), _col6 (type: binary), _col7 (type: int), _col8 (type: int), _col9 (type: bigint), _col10 (type: binary), _col11 (type: bigint), _col12 (type: bigint), _col13 (type: bigint), _col14 (type: binary), _col15 (type: float), _col16 (type: float), _col17 (type: bigint), _col18 (type: binary)
-                      Reduce Output Operator
-                        key expressions: _col4 (type: tinyint)
-                        null sort order: a
-                        sort order: +
-                        Map-reduce partition columns: _col4 (type: tinyint)
-                        Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE Column stats: NONE
-                        value expressions: _col0 (type: smallint), _col1 (type: int), _col2 (type: bigint), _col3 (type: float)
             Execution mode: llap
             LLAP IO: all inputs
         Reducer 2 
+            Execution mode: llap

Review comment:
       Actually we did not eliminate anything. If you look closely, what happened is Reducer 2 and Reducer 3 exchanged names. We still have the same number of mappers/reducers and the same operators overall. I am not sure why the names changed however they are arbitrary anyway. So I am not concerned with that.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] abstractdog commented on pull request #1824: HIVE-24510: Vectorize compute_bit_vector

Posted by GitBox <gi...@apache.org>.
abstractdog commented on pull request #1824:
URL: https://github.com/apache/hive/pull/1824#issuecomment-756195387


   ran through the patches, I'll comment on particular code sections, but first a question in general: is there a specific reason for doing this method transformation?
   ```
   compute_bit_vector(x, 'hll') -> compute_bit_vector_hll(x)
   ```
   while I was reading the qout changes I realized that this rewrite makes this patch so huge, and I'm not sure if it's really worth, as I cannot see the added value, and I personally don't have any problems with the algorithm as a parameter, not the mention the possible fallback to a default maybe?
   ```
   compute_bit_vector(x) -- could be translated to --> compute_bit_vector_hll(x, 'hll') 
   ```
   I would consider this, as, without added value, e.g. the backport of this patch will be painful, and this patch is about vectorization coverage, and has nothing to do with compiler basically, what do you think?
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] abstractdog commented on a change in pull request #1824: HIVE-24510: Vectorize compute_bit_vector

Posted by GitBox <gi...@apache.org>.
abstractdog commented on a change in pull request #1824:
URL: https://github.com/apache/hive/pull/1824#discussion_r553769556



##########
File path: ql/src/test/results/clientpositive/llap/auto_sortmerge_join_14.q.out
##########
@@ -194,7 +222,7 @@ STAGE PLANS:
                       keys:
                         0 _col0 (type: int)
                         1 _col0 (type: int)
-                      Statistics: Num rows: 221 Data size: 1768 Basic stats: COMPLETE Column stats: COMPLETE
+                      Statistics: Num rows: 220 Data size: 1760 Basic stats: COMPLETE Column stats: COMPLETE

Review comment:
       cool, this is a nice catch then, thanks!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] mustafaiman commented on pull request #1824: HIVE-24510: Vectorize compute_bit_vector

Posted by GitBox <gi...@apache.org>.
mustafaiman commented on pull request #1824:
URL: https://github.com/apache/hive/pull/1824#issuecomment-756579677


   @abstractdog I had to introduce `compute_bit_vector_hll` and `compute_bit_vector_fm` because, vectorization is supported only for single argument functions. compute_bit_vector(arg1, arg2) will not vectorize. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] mustafaiman edited a comment on pull request #1824: HIVE-24510: Vectorize compute_bit_vector

Posted by GitBox <gi...@apache.org>.
mustafaiman edited a comment on pull request #1824:
URL: https://github.com/apache/hive/pull/1824#issuecomment-756725412


   I made a quick fix to allow that in early versions of this patch. Then I decided to not pursue it because I did not see the need for allowing constant argument in runtime.
   
   > you can still do something like:
   > if compute_bit_vector: -> handle constant parameter
   
   We do exactly that. Not in vectorizer but earlier in `ColumnStatsSemanticAnalyzer.java `. I am reluctant to implement extra functionality or add special cases unless it is necessary. Note that compute_bit_vector is a newly added UDF in 4.0. So there is no backward compatibility concern either.
   Do you see any other benefit than preserving the earlier q.out outputs?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org