Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/01/28 13:56:05 UTC

[GitHub] [hive] zabetak opened a new pull request #1926: HIVE-23485: Bound GroupByOperator stats using largest NDV among columns

zabetak opened a new pull request #1926:
URL: https://github.com/apache/hive/pull/1926


   ### What changes were proposed in this pull request?
   Update the statistics estimation for the GroupBy operator to take into account the largest NDV (number of distinct values) among the columns participating in the aggregation.
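
As a rough illustration of the idea (not the code in this patch), the sketch below clamps a GROUP BY cardinality estimate using per-column NDVs. It relies on one general fact: grouping on several columns yields at least as many groups as the largest single-column NDV, and at most the smaller of the NDV product and the input row count. The function name `bound_group_count` and its parameters are hypothetical and do not correspond to Hive's actual API; how Hive applies the bound is an assumption here.

```python
from math import prod


def bound_group_count(estimate, column_ndvs, input_rows):
    """Clamp a GROUP BY row-count estimate using per-column NDVs.

    Hypothetical helper for illustration only; not Hive's implementation.
    """
    if not column_ndvs:
        # No grouping columns: a global aggregate produces a single row.
        return 1
    # Treat missing or zero NDVs as 1 so the product stays meaningful.
    ndvs = [max(1, n) for n in column_ndvs]
    # Lower bound: the largest single-column NDV (never more than the input rows).
    lower = min(max(ndvs), input_rows)
    # Upper bound: the NDV product, also capped by the input rows.
    upper = min(prod(ndvs), input_rows)
    return max(lower, min(estimate, upper))
```

For example, with 3 input rows and NDVs of 3 and 1, an initial estimate of 1 row would be raised to 3, since the first column alone already forces three distinct groups.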
   
   ### Why are the changes needed?
   Improve accuracy of statistics.
   
   ### Does this PR introduce _any_ user-facing change?
   May result in plan changes.
   
   ### How was this patch tested?
   ```
   mvn -pl itests/qtest -Pqsplits -Pitests test -Dtest=TestMiniLlapLocalCliDriver -Dtest.output.overwrite
   mvn -pl itests/qtest -Pqsplits -Pitests test -Dtest=TestTezTPCDS30TBPerfCliDriver -Dtest.output.overwrite
   
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak commented on pull request #1926: HIVE-23485: Bound GroupByOperator stats using largest NDV among columns

Posted by GitBox <gi...@apache.org>.
zabetak commented on pull request #1926:
URL: https://github.com/apache/hive/pull/1926#issuecomment-769770249


   > I've not seen any changes beyond rowNum diffs in the q.outs
   
   So far I haven't seen major differences either: only `rowNums` and, as expected, `minReductionHashAggr`.




[GitHub] [hive] kgyrtkirk merged pull request #1926: HIVE-23485: Bound GroupByOperator stats using largest NDV among columns

Posted by GitBox <gi...@apache.org>.
kgyrtkirk merged pull request #1926:
URL: https://github.com/apache/hive/pull/1926


   




[GitHub] [hive] zabetak commented on pull request #1926: HIVE-23485: Bound GroupByOperator stats using largest NDV among columns

Posted by GitBox <gi...@apache.org>.
zabetak commented on pull request #1926:
URL: https://github.com/apache/hive/pull/1926#issuecomment-770725017


   Hey @kgyrtkirk , tests are green so it would be nice to get this in before conflicts start to emerge again :)




[GitHub] [hive] kgyrtkirk commented on pull request #1926: HIVE-23485: Bound GroupByOperator stats using largest NDV among columns

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on pull request #1926:
URL: https://github.com/apache/hive/pull/1926#issuecomment-770757613


   yeah; I also tried not to forget it :D
   I hope the changes are still good :)

