You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/02/01 07:54:00 UTC
[jira] [Work logged] (HIVE-27000) Improve the modularity of the *ColumnStatsMerger classes
[ https://issues.apache.org/jira/browse/HIVE-27000?focusedWorklogId=842789&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-842789 ]
ASF GitHub Bot logged work on HIVE-27000:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 01/Feb/23 07:53
Start Date: 01/Feb/23 07:53
Worklog Time Spent: 10m
Work Description: akshat0395 commented on code in PR #3997:
URL: https://github.com/apache/hive/pull/3997#discussion_r1092866326
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/columnstats/merge/DateColumnStatsMerger.java:
##########
@@ -43,64 +46,57 @@ public void merge(ColumnStatisticsObj aggregateColStats, ColumnStatisticsObj new
DateColumnStatsDataInspector aggregateData = dateInspectorFromStats(aggregateColStats);
DateColumnStatsDataInspector newData = dateInspectorFromStats(newColStats);
- setLowValue(aggregateData, newData);
- setHighValue(aggregateData, newData);
-
- aggregateData.setNumNulls(aggregateData.getNumNulls() + newData.getNumNulls());
- if (aggregateData.getNdvEstimator() == null || newData.getNdvEstimator() == null) {
- aggregateData.setNumDVs(Math.max(aggregateData.getNumDVs(), newData.getNumDVs()));
- } else {
- NumDistinctValueEstimator oldEst = aggregateData.getNdvEstimator();
- NumDistinctValueEstimator newEst = newData.getNdvEstimator();
- final long ndv;
- if (oldEst.canMerge(newEst)) {
- oldEst.mergeEstimators(newEst);
- ndv = oldEst.estimateNumDistinctValues();
- aggregateData.setNdvEstimator(oldEst);
- } else {
- ndv = Math.max(aggregateData.getNumDVs(), newData.getNumDVs());
- }
- LOG.debug("Use bitvector to merge column {}'s ndvs of {} and {} to be {}", aggregateColStats.getColName(),
- aggregateData.getNumDVs(), newData.getNumDVs(), ndv);
- aggregateData.setNumDVs(ndv);
+ Date lowValue = mergeLowValue(getLowValue(aggregateData), getLowValue(newData));
+ if (lowValue != null) {
+ aggregateData.setLowValue(lowValue);
+ }
+ Date highValue = mergeHighValue(getHighValue(aggregateData), getHighValue(newData));
+ if (highValue != null) {
Review Comment:
Thanks @asolimando for the explanation, if there is a plan improve the class hierarchy then it make sense to tackle this then as well.
Issue Time Tracking
-------------------
Worklog Id: (was: 842789)
Time Spent: 1h 20m (was: 1h 10m)
> Improve the modularity of the *ColumnStatsMerger classes
> --------------------------------------------------------
>
> Key: HIVE-27000
> URL: https://issues.apache.org/jira/browse/HIVE-27000
> Project: Hive
> Issue Type: Improvement
> Components: Statistics
> Affects Versions: 4.0.0-alpha-2
> Reporter: Alessandro Solimando
> Assignee: Alessandro Solimando
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> *ColumnStatsMerger classes contain a lot of duplicate code which is not specific to the data type, and that could therefore be lifted to a common parent class.
> This phenomenon is bound to become even worse if we keep enriching further our supported set of statistics as we did in the context of HIVE-26221.
> The current ticket aims at improving the modularity and code reuse of the *ColumnStatsMerger classes, while improving unit-test coverage to cover all classes and support more use-cases.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)