You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/03/30 17:32:52 UTC

[GitHub] [hive] lcspinter opened a new pull request #2111: HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

lcspinter opened a new pull request #2111:
URL: https://github.com/apache/hive/pull/2111


   
   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://cwiki.apache.org/confluence/display/Hive/HowToContribute
     2. Ensure that you have created an issue on the Hive project JIRA: https://issues.apache.org/jira/projects/HIVE/summary
     3. Ensure you have added or run the appropriate tests for your PR: 
     4. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP]HIVE-XXXXX:  Your PR title ...'.
     5. Be sure to keep the PR description updated to reflect all changes.
     6. Please write your PR title to summarize what this PR proposes.
     7. If possible, provide a concise example to reproduce the issue for a faster review.
   
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description, screenshot and/or a reproducable example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Hive versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary commented on a change in pull request #2111: [WIP]HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r603077680



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsTask.java
##########
@@ -118,11 +118,15 @@ public String getName() {
     private boolean isMissingAcidState = false;
     private BasicStatsWork work;
     private boolean followedColStats1;
+    private boolean isBasicStatProvided;
+    private Map<String, String> providedBasicStats;
 
-    public BasicStatsProcessor(Partish partish, BasicStatsWork work, HiveConf conf, boolean followedColStats2) {
+    public BasicStatsProcessor(Table table, Partish partish, BasicStatsWork work, boolean followedColStats2) {
       this.partish = partish;
       this.work = work;
       followedColStats1 = followedColStats2;
+      providedBasicStats = table.getStorageHandler().getBasicStatistics(work.getTableDesc());

Review comment:
       Should we call this only if the table `isNonNative`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] lcspinter commented on a change in pull request #2111: HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

Posted by GitBox <gi...@apache.org>.
lcspinter commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r607754506



##########
File path: iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -92,6 +99,11 @@
                   Types.TimestampType.withoutZone(), Types.StringType.get(), Types.BinaryType.get(),
                   Types.DecimalType.of(3, 1), Types.UUIDType.get(), Types.FixedType.ofLength(5),
                   Types.TimeType.get());
+  private static final Map<String, String> STATS_MAPPING = ImmutableMap.of(
+      StatsSetupConst.NUM_FILES, SnapshotSummary.TOTAL_DATA_FILES_PROP,
+      StatsSetupConst.ROW_COUNT, SnapshotSummary.TOTAL_RECORDS_PROP
+      // TODO: add ROW_COUNT -> TOTAL_SIZE mapping after iceberg 0.12 is released

Review comment:
       Fixed it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2111: HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r607675365



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsNoJobTask.java
##########
@@ -119,16 +135,92 @@ public String getName() {
     return "STATS-NO-JOB";
   }
 
-  static class StatItem {
-    Partish partish;
-    Map<String, String> params;
-    Object result;
+  abstract static class StatCollector implements Runnable {
+
+    protected Partish partish;
+    protected Object result;
+    protected LogHelper console;
+
+    public static Function<StatCollector, String> SIMPLE_NAME_FUNCTION =
+        sc -> String.format("%s#%s", sc.partish().getTable().getCompleteName(), sc.partish().getPartishType());
+
+    public static Function<StatCollector, Partition> EXTRACT_RESULT_FUNCTION = input -> (Partition) input.result();
+
+    abstract Partish partish();
+    abstract boolean isValid();
+    abstract Object result();
+    abstract void init(HiveConf conf, LogHelper console) throws IOException;
+
+    protected String toString(Map<String, String> parameters) {
+      StringBuilder builder = new StringBuilder();

Review comment:
       nit: no strong opinion here, but I usually find it more readable to use streams and then concatenate the results using `Collectors.joining(",");`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] lcspinter commented on pull request #2111: HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

Posted by GitBox <gi...@apache.org>.
lcspinter commented on pull request #2111:
URL: https://github.com/apache/hive/pull/2111#issuecomment-812540900


   @pvary @kgyrtkirk Could you please have a second look at this PR? Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2111: HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r607660940



##########
File path: iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -826,4 +866,31 @@ private StringBuilder buildComplexTypeInnerQuery(Object field, Type type) {
     }
     return query;
   }
+
+  private void validateBasicStats(Table table, String tableName) {

Review comment:
       Do we need the tableName param here? Can we use table.name()?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] lcspinter commented on pull request #2111: HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

Posted by GitBox <gi...@apache.org>.
lcspinter commented on pull request #2111:
URL: https://github.com/apache/hive/pull/2111#issuecomment-813936795


   @marton-bod @szlta If you have time, could you please review this PR? Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] lcspinter commented on a change in pull request #2111: HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

Posted by GitBox <gi...@apache.org>.
lcspinter commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r607759142



##########
File path: iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -826,4 +866,31 @@ private StringBuilder buildComplexTypeInnerQuery(Object field, Type type) {
     }
     return query;
   }
+
+  private void validateBasicStats(Table table, String tableName) {
+    List<Object[]> describeResult = shell.executeStatement("DESCRIBE EXTENDED " + tableName);
+    Optional<Object[]> tableInfo =

Review comment:
       Thanks for letting me know. I changed the test. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] lcspinter commented on a change in pull request #2111: HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

Posted by GitBox <gi...@apache.org>.
lcspinter commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r607759343



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageHandler.java
##########
@@ -197,4 +197,8 @@ default boolean addDynamicSplitPruningEdge(ExprNodeDesc syntheticFilterPredicate
   default Map<String, String> getOperatorDescProperties(OperatorDesc operatorDesc, Map<String, String> initialProps) {
     return initialProps;
   }
+
+  default Map<String, String> getBasicStatistics(TableDesc tableDesc) {

Review comment:
       Right, fixed it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2111: HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r607666001



##########
File path: iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -826,4 +866,31 @@ private StringBuilder buildComplexTypeInnerQuery(Object field, Type type) {
     }
     return query;
   }
+
+  private void validateBasicStats(Table table, String tableName) {
+    List<Object[]> describeResult = shell.executeStatement("DESCRIBE EXTENDED " + tableName);
+    Optional<Object[]> tableInfo =

Review comment:
       Instead of using a `describe` command, wouldn't it be cleaner to load the HMS table and check its HMS params, then we wouldn't need all the parsing. It's done in a similar fashion here: https://github.com/apache/iceberg/blob/master/mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java#L585




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] lcspinter commented on a change in pull request #2111: HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

Posted by GitBox <gi...@apache.org>.
lcspinter commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r612552854



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TaskCompiler.java
##########
@@ -422,7 +422,7 @@ private String extractTableFullName(StatsTask tsk) throws SemanticException {
     TableSpec tableSpec = new TableSpec(table, partitions);
     tableScan.getConf().getTableMetadata().setTableSpec(tableSpec);
 
-    if (BasicStatsNoJobTask.canUseFooterScan(table, inputFormat)) {
+    if (BasicStatsNoJobTask.canUseColumnStats(table, inputFormat)) {

Review comment:
       Right, fixed it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #2111: HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r611489108



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TaskCompiler.java
##########
@@ -422,7 +422,7 @@ private String extractTableFullName(StatsTask tsk) throws SemanticException {
     TableSpec tableSpec = new TableSpec(table, partitions);
     tableScan.getConf().getTableMetadata().setTableSpec(tableSpec);
 
-    if (BasicStatsNoJobTask.canUseFooterScan(table, inputFormat)) {
+    if (BasicStatsNoJobTask.canUseColumnStats(table, inputFormat)) {

Review comment:
       we have a `BasicStatsNoJobTask.canUseStats` 
   and a `BasicStatsNoJobTask.canUseColumnStats`  - I think "footerscan" is the basicstats stuff ; could this be a typo?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsNoJobTask.java
##########
@@ -90,12 +91,21 @@ public BasicStatsNoJobTask(HiveConf conf, BasicStatsNoJobWork work) {
     console = new LogHelper(LOG);
   }
 
-  public static boolean canUseFooterScan(
+  public static boolean canUseStats(
       Table table, Class<? extends InputFormat> inputFormat) {
+      return canUseColumnStats(table, inputFormat) || useBasicStatsFromStorageHandler(table);
+  }
+
+  public static boolean canUseColumnStats(Table table, Class<? extends InputFormat> inputFormat) {

Review comment:
       this has nothing to do with column stats - that's a different thing

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsNoJobTask.java
##########
@@ -312,23 +384,23 @@ private int aggregateStats(ExecutorService threadPool, Hive db) {
     return ret;
   }
 
-  private int updatePartitions(Hive db, List<FooterStatCollector> scs, Table table) throws InvalidOperationException, HiveException {
+  private int updatePartitions(Hive db, List<StatCollector> scs, Table table) throws InvalidOperationException, HiveException {
 
     String tableFullName = table.getFullyQualifiedName();
 
     if (scs.isEmpty()) {
       return 0;
     }
     if (work.isStatsReliable()) {

Review comment:
       note: it might make sense to somehow communicate this `work.isStatsReliable` somehow to the `StatCollector` so it can make that `LOG` entry if it has to...

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsNoJobTask.java
##########
@@ -119,16 +129,83 @@ public String getName() {
     return "STATS-NO-JOB";
   }
 
-  static class StatItem {
-    Partish partish;
-    Map<String, String> params;
-    Object result;
+  abstract static class StatCollector implements Runnable {
+
+    protected Partish partish;
+    protected Object result;
+    protected LogHelper console;
+
+    public static Function<StatCollector, String> SIMPLE_NAME_FUNCTION =
+        sc -> String.format("%s#%s", sc.partish().getTable().getCompleteName(), sc.partish().getPartishType());
+
+    public static Function<StatCollector, Partition> EXTRACT_RESULT_FUNCTION = sc -> (Partition) sc.result();
+
+    abstract Partish partish();
+    abstract boolean isValid();
+    abstract Object result();
+    abstract void init(HiveConf conf, LogHelper console) throws IOException;
+
+    protected String toString(Map<String, String> parameters) {
+      return StatsSetupConst.SUPPORTED_STATS.stream().map(st -> st + "=" + parameters.get(st))
+          .collect(Collectors.joining(", "));
+    }
   }
 
-  static class FooterStatCollector implements Runnable {
+  static class HiveStorageHandlerStatCollector extends StatCollector {
+
+    public HiveStorageHandlerStatCollector(Partish partish) {
+      this.partish = partish;
+    }
+
+    @Override
+    public void init(HiveConf conf, LogHelper console) throws IOException {
+      this.console = console;
+    }
+
+    @Override
+    public void run() {
+      try {
+        Table table = partish.getTable();
+        Map<String, String> parameters = partish.getPartParameters();
+        TableDesc tableDesc = Utilities.getTableDesc(table);
+        Map<String, String> basicStatistics = table.getStorageHandler().getBasicStatistics(tableDesc);
+
+        StatsSetupConst.setBasicStatsState(parameters, StatsSetupConst.TRUE);

Review comment:
       I don't understand why we make changes to the `Table` when we could be updating infos of a partition as well...
   I guess in case of IceBerg you will not have regular partitions ; so it will probably work for that correctly
   
   I think here you want to change `parameters`

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsNoJobTask.java
##########
@@ -90,12 +91,21 @@ public BasicStatsNoJobTask(HiveConf conf, BasicStatsNoJobWork work) {
     console = new LogHelper(LOG);
   }
 
-  public static boolean canUseFooterScan(
+  public static boolean canUseStats(

Review comment:
       do we really need 3 methods here? I think we only need 1
   
   and please add "Basic" to its name so that we are "more" clear with it

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsNoJobTask.java
##########
@@ -119,16 +129,83 @@ public String getName() {
     return "STATS-NO-JOB";
   }
 
-  static class StatItem {
-    Partish partish;
-    Map<String, String> params;
-    Object result;
+  abstract static class StatCollector implements Runnable {
+
+    protected Partish partish;
+    protected Object result;
+    protected LogHelper console;
+
+    public static Function<StatCollector, String> SIMPLE_NAME_FUNCTION =
+        sc -> String.format("%s#%s", sc.partish().getTable().getCompleteName(), sc.partish().getPartishType());
+
+    public static Function<StatCollector, Partition> EXTRACT_RESULT_FUNCTION = sc -> (Partition) sc.result();
+
+    abstract Partish partish();
+    abstract boolean isValid();
+    abstract Object result();
+    abstract void init(HiveConf conf, LogHelper console) throws IOException;
+
+    protected String toString(Map<String, String> parameters) {
+      return StatsSetupConst.SUPPORTED_STATS.stream().map(st -> st + "=" + parameters.get(st))
+          .collect(Collectors.joining(", "));
+    }
   }
 
-  static class FooterStatCollector implements Runnable {
+  static class HiveStorageHandlerStatCollector extends StatCollector {
+
+    public HiveStorageHandlerStatCollector(Partish partish) {
+      this.partish = partish;
+    }
+
+    @Override
+    public void init(HiveConf conf, LogHelper console) throws IOException {
+      this.console = console;
+    }

Review comment:
       make this the default implementation

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsNoJobTask.java
##########
@@ -119,16 +129,83 @@ public String getName() {
     return "STATS-NO-JOB";
   }
 
-  static class StatItem {
-    Partish partish;
-    Map<String, String> params;
-    Object result;
+  abstract static class StatCollector implements Runnable {
+
+    protected Partish partish;
+    protected Object result;
+    protected LogHelper console;
+
+    public static Function<StatCollector, String> SIMPLE_NAME_FUNCTION =
+        sc -> String.format("%s#%s", sc.partish().getTable().getCompleteName(), sc.partish().getPartishType());
+
+    public static Function<StatCollector, Partition> EXTRACT_RESULT_FUNCTION = sc -> (Partition) sc.result();
+
+    abstract Partish partish();

Review comment:
       please either add `get` prefix ; or open up field access... these are internal classes
   
   and also implement these methods as `final` 

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsNoJobTask.java
##########
@@ -265,30 +351,16 @@ private int aggregateStats(ExecutorService threadPool, Hive db) {
 
       Table table = tableSpecs.tableHandle;
 
-      Collection<Partition> partitions = null;
-      if (work.getPartitions() == null || work.getPartitions().isEmpty()) {
-        if (table.isPartitioned()) {
-          partitions = tableSpecs.partitions;
-        }
-      } else {
-        partitions = work.getPartitions();
-      }
+      List<Partish> partishes = getPartishes(table);
 
-      LinkedList<Partish> partishes = Lists.newLinkedList();
-      if (partitions == null) {
-        partishes.add(Partish.buildFor(table));
+      List<StatCollector> scs;
+      if (useBasicStatsFromStorageHandler(table)) {
+        scs = partishes.stream().map(HiveStorageHandlerStatCollector::new).collect(Collectors.toList());

Review comment:
       note: I think if you use a simple loop with a conditional it will be more readable...
   
   the current solution duplicates: `partishes.stream().map` and `.collect(Collectors.toList())`

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsNoJobTask.java
##########
@@ -119,16 +129,83 @@ public String getName() {
     return "STATS-NO-JOB";
   }
 
-  static class StatItem {
-    Partish partish;
-    Map<String, String> params;
-    Object result;
+  abstract static class StatCollector implements Runnable {
+
+    protected Partish partish;
+    protected Object result;
+    protected LogHelper console;
+
+    public static Function<StatCollector, String> SIMPLE_NAME_FUNCTION =
+        sc -> String.format("%s#%s", sc.partish().getTable().getCompleteName(), sc.partish().getPartishType());
+
+    public static Function<StatCollector, Partition> EXTRACT_RESULT_FUNCTION = sc -> (Partition) sc.result();
+
+    abstract Partish partish();
+    abstract boolean isValid();
+    abstract Object result();
+    abstract void init(HiveConf conf, LogHelper console) throws IOException;
+
+    protected String toString(Map<String, String> parameters) {
+      return StatsSetupConst.SUPPORTED_STATS.stream().map(st -> st + "=" + parameters.get(st))
+          .collect(Collectors.joining(", "));
+    }
   }
 
-  static class FooterStatCollector implements Runnable {
+  static class HiveStorageHandlerStatCollector extends StatCollector {
+
+    public HiveStorageHandlerStatCollector(Partish partish) {
+      this.partish = partish;
+    }
+
+    @Override
+    public void init(HiveConf conf, LogHelper console) throws IOException {
+      this.console = console;
+    }
+
+    @Override
+    public void run() {
+      try {
+        Table table = partish.getTable();
+        Map<String, String> parameters = partish.getPartParameters();
+        TableDesc tableDesc = Utilities.getTableDesc(table);
+        Map<String, String> basicStatistics = table.getStorageHandler().getBasicStatistics(tableDesc);
+
+        StatsSetupConst.setBasicStatsState(parameters, StatsSetupConst.TRUE);
+        parameters.putAll(basicStatistics);

Review comment:
       I don't understand why we need to do this; this will effectively copy all table props from the table object to the partition.
   I guess in case of IceBerg you will not have regular partitions ; so it will probably work for that correctly...but in case there are some this will be just wierd...
   

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsTask.java
##########
@@ -118,11 +119,17 @@ public String getName() {
     private boolean isMissingAcidState = false;
     private BasicStatsWork work;
     private boolean followedColStats1;
+    private Map<String, String> providedBasicStats;
 
     public BasicStatsProcessor(Partish partish, BasicStatsWork work, HiveConf conf, boolean followedColStats2) {
       this.partish = partish;
       this.work = work;
       followedColStats1 = followedColStats2;
+      Table table = partish.getTable();
+      if (table.isNonNative()) {
+        TableDesc tableDesc = Utilities.getTableDesc(table);
+        providedBasicStats = table.getStorageHandler().getBasicStatistics(tableDesc);
+      }

Review comment:
       this `Processor` is created for a `Partish` and not for a table - this part will work incorrectly for partitions.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsNoJobTask.java
##########
@@ -312,23 +384,23 @@ private int aggregateStats(ExecutorService threadPool, Hive db) {
     return ret;
   }
 
-  private int updatePartitions(Hive db, List<FooterStatCollector> scs, Table table) throws InvalidOperationException, HiveException {
+  private int updatePartitions(Hive db, List<StatCollector> scs, Table table) throws InvalidOperationException, HiveException {
 
     String tableFullName = table.getFullyQualifiedName();
 
     if (scs.isEmpty()) {
       return 0;
     }
     if (work.isStatsReliable()) {
-      for (FooterStatCollector statsCollection : scs) {
-        if (statsCollection.result == null) {
-          LOG.debug("Stats requested to be reliable. Empty stats found: {}", statsCollection.partish.getSimpleName());
+      for (StatCollector statsCollection : scs) {
+        if (statsCollection.result() == null) {

Review comment:
       why do we need to call a method to get `result` ? isn't this the `isValid` method?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsTask.java
##########
@@ -140,7 +147,7 @@ public Object process(StatsAggregator statsAggregator) throws HiveException, Met
         StatsSetupConst.clearColumnStatsState(parameters);
       }
 
-      if (partfileStatus == null) {
+      if (partfileStatus == null && providedBasicStats == null) {

Review comment:
       note: these conditional seem to represent a different path - would be an early return an option?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsTask.java
##########
@@ -118,11 +119,17 @@ public String getName() {
     private boolean isMissingAcidState = false;
     private BasicStatsWork work;
     private boolean followedColStats1;
+    private Map<String, String> providedBasicStats;
 
     public BasicStatsProcessor(Partish partish, BasicStatsWork work, HiveConf conf, boolean followedColStats2) {
       this.partish = partish;
       this.work = work;
       followedColStats1 = followedColStats2;
+      Table table = partish.getTable();
+      if (table.isNonNative()) {

Review comment:
       what about non-iceberg non-native tables ?
   I think we can count the rows in a non-native table with the basic TS rowcounter
   
   I think you are trying to identify your usecase here by `isNonNative` - I think you would be better off saving your initial decision about how you plan to execute this task in the `work`

##########
File path: iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##########
@@ -153,6 +156,37 @@ public DecomposedPredicate decomposePredicate(JobConf jobConf, Deserializer dese
     return predicate;
   }
 
+  @Override
+  public boolean canProvideBasicStatistics() {
+    return true;
+  }
+
+  @Override
+  public Map<String, String> getBasicStatistics(TableDesc tableDesc) {
+    Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+    Map<String, String> stats = new HashMap<>();
+    if (table.currentSnapshot() != null) {
+      Map<String, String> summary = table.currentSnapshot().summary();
+      if (summary != null) {
+        if (summary.containsKey(SnapshotSummary.TOTAL_DATA_FILES_PROP)) {
+          stats.put(StatsSetupConst.NUM_FILES, summary.get(SnapshotSummary.TOTAL_DATA_FILES_PROP));
+        }
+        if (summary.containsKey(SnapshotSummary.TOTAL_RECORDS_PROP)) {
+          stats.put(StatsSetupConst.ROW_COUNT, summary.get(SnapshotSummary.TOTAL_RECORDS_PROP));
+        }
+        // TODO: add TOTAL_SIZE when iceberg 0.12 is released
+        if (summary.containsKey("total-files-size")) {
+          stats.put(StatsSetupConst.TOTAL_SIZE, summary.get("total-files-size"));
+        }
+      }
+    } else {
+      stats.put(StatsSetupConst.NUM_FILES, "0");

Review comment:
       note that right now we don't take opportunities for `ROW_COUNT==0` cases (we threat it as nonexistent stats).
   it seems to me that its already covered in the test - so we will notice this when we decide to improve on it later
   
   note: I think the current api doesn't make it possible to "remove" statistics data if we need to do that..




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2111: HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r607658821



##########
File path: iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -92,6 +99,11 @@
                   Types.TimestampType.withoutZone(), Types.StringType.get(), Types.BinaryType.get(),
                   Types.DecimalType.of(3, 1), Types.UUIDType.get(), Types.FixedType.ofLength(5),
                   Types.TimeType.get());
+  private static final Map<String, String> STATS_MAPPING = ImmutableMap.of(
+      StatsSetupConst.NUM_FILES, SnapshotSummary.TOTAL_DATA_FILES_PROP,
+      StatsSetupConst.ROW_COUNT, SnapshotSummary.TOTAL_RECORDS_PROP
+      // TODO: add ROW_COUNT -> TOTAL_SIZE mapping after iceberg 0.12 is released

Review comment:
       small typo: TOTAL_SIZE -> TOTAL_FILE_SIZE_PROP mapping




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary commented on a change in pull request #2111: HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r610413396



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageHandler.java
##########
@@ -197,4 +197,22 @@ default boolean addDynamicSplitPruningEdge(ExprNodeDesc syntheticFilterPredicate
   default Map<String, String> getOperatorDescProperties(OperatorDesc operatorDesc, Map<String, String> initialProps) {
     return initialProps;
   }
+
+  /**
+   * Return some basic statistics (numRows, numFiles, totalSize) calculated by the underlying storage handler
+   * implementation.
+   * @param tableDesc a valid table description, used to load the table
+   * @return map of basic statistics, can be null
+   */
+  default Map<String, String> getBasicStatistics(TableDesc tableDesc) {
+    return null;
+  }
+
+  /**
+   * Check if the storage handler can provide basic statistics.
+   * @return true if the storage handler can supply the basic statistics
+   */
+  default boolean canProvideBasicStatistics() {

Review comment:
       Ok.. I see why it is separated...




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2111: HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r607672874



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsNoJobTask.java
##########
@@ -119,16 +135,92 @@ public String getName() {
     return "STATS-NO-JOB";
   }
 
-  static class StatItem {
-    Partish partish;
-    Map<String, String> params;
-    Object result;
+  abstract static class StatCollector implements Runnable {
+
+    protected Partish partish;
+    protected Object result;
+    protected LogHelper console;
+
+    public static Function<StatCollector, String> SIMPLE_NAME_FUNCTION =
+        sc -> String.format("%s#%s", sc.partish().getTable().getCompleteName(), sc.partish().getPartishType());
+
+    public static Function<StatCollector, Partition> EXTRACT_RESULT_FUNCTION = input -> (Partition) input.result();

Review comment:
       nit: since we called the StatCollector `sc` in the above lambda, can we rename `input` to `sc` here as well?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] lcspinter commented on a change in pull request #2111: HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

Posted by GitBox <gi...@apache.org>.
lcspinter commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r612551277



##########
File path: iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##########
@@ -153,6 +156,37 @@ public DecomposedPredicate decomposePredicate(JobConf jobConf, Deserializer dese
     return predicate;
   }
 
+  @Override
+  public boolean canProvideBasicStatistics() {
+    return true;
+  }
+
+  @Override
+  public Map<String, String> getBasicStatistics(TableDesc tableDesc) {
+    Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+    Map<String, String> stats = new HashMap<>();
+    if (table.currentSnapshot() != null) {
+      Map<String, String> summary = table.currentSnapshot().summary();
+      if (summary != null) {
+        if (summary.containsKey(SnapshotSummary.TOTAL_DATA_FILES_PROP)) {
+          stats.put(StatsSetupConst.NUM_FILES, summary.get(SnapshotSummary.TOTAL_DATA_FILES_PROP));
+        }
+        if (summary.containsKey(SnapshotSummary.TOTAL_RECORDS_PROP)) {
+          stats.put(StatsSetupConst.ROW_COUNT, summary.get(SnapshotSummary.TOTAL_RECORDS_PROP));
+        }
+        // TODO: add TOTAL_SIZE when iceberg 0.12 is released
+        if (summary.containsKey("total-files-size")) {
+          stats.put(StatsSetupConst.TOTAL_SIZE, summary.get("total-files-size"));
+        }
+      }
+    } else {
+      stats.put(StatsSetupConst.NUM_FILES, "0");

Review comment:
       In the case of an empty table, the current snapshot is null. I thought setting all the basic stats to 0 is the right approach since we don't have any data. 
   When the summary of the snapshot is not available I return an empty statistics map. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] lcspinter commented on a change in pull request #2111: [WIP]HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

Posted by GitBox <gi...@apache.org>.
lcspinter commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r603420091



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsTask.java
##########
@@ -118,11 +118,15 @@ public String getName() {
     private boolean isMissingAcidState = false;
     private BasicStatsWork work;
     private boolean followedColStats1;
+    private boolean isBasicStatProvided;
+    private Map<String, String> providedBasicStats;
 
-    public BasicStatsProcessor(Partish partish, BasicStatsWork work, HiveConf conf, boolean followedColStats2) {
+    public BasicStatsProcessor(Table table, Partish partish, BasicStatsWork work, boolean followedColStats2) {

Review comment:
       Removed it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary commented on a change in pull request #2111: [WIP]HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r603073899



##########
File path: iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##########
@@ -153,6 +156,24 @@ public DecomposedPredicate decomposePredicate(JobConf jobConf, Deserializer dese
     return predicate;
   }
 
+  @Override
+  public Map<String, String> getBasicStatistics(TableDesc tableDesc) {
+    Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+    Map<String, String> summary = table.currentSnapshot().summary();

Review comment:
       Could we check that the summary is always non-null?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2111: HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r607667423



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageHandler.java
##########
@@ -197,4 +197,8 @@ default boolean addDynamicSplitPruningEdge(ExprNodeDesc syntheticFilterPredicate
   default Map<String, String> getOperatorDescProperties(OperatorDesc operatorDesc, Map<String, String> initialProps) {
     return initialProps;
   }
+
+  default Map<String, String> getBasicStatistics(TableDesc tableDesc) {

Review comment:
       Since it's a new interface method, can you add some javadoc please?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary commented on a change in pull request #2111: [WIP]HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r603074290



##########
File path: iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##########
@@ -153,6 +156,24 @@ public DecomposedPredicate decomposePredicate(JobConf jobConf, Deserializer dese
     return predicate;
   }
 
+  @Override
+  public Map<String, String> getBasicStatistics(TableDesc tableDesc) {
+    Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+    Map<String, String> summary = table.currentSnapshot().summary();
+    Map<String, String> stats = new HashMap<>();
+    if (summary.containsKey(SnapshotSummary.TOTAL_DATA_FILES_PROP)) {
+      stats.put(StatsSetupConst.NUM_FILES, summary.get(SnapshotSummary.TOTAL_DATA_FILES_PROP));
+    }
+    if (summary.containsKey(SnapshotSummary.TOTAL_RECORDS_PROP)) {
+      stats.put(StatsSetupConst.ROW_COUNT, summary.get(SnapshotSummary.TOTAL_RECORDS_PROP));
+    }
+    // TODO: add TOTAL_SIZE when iceberg 0.12 is released

Review comment:
       With this TODO, do we need to comment out the code below?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2111: HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r607671652



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsNoJobTask.java
##########
@@ -90,12 +91,27 @@ public BasicStatsNoJobTask(HiveConf conf, BasicStatsNoJobWork work) {
     console = new LogHelper(LOG);
   }
 
-  public static boolean canUseFooterScan(
+  public static boolean canUseStats(
       Table table, Class<? extends InputFormat> inputFormat) {
+      return (OrcInputFormat.class.isAssignableFrom(inputFormat) && !AcidUtils.isFullAcidTable(table))
+          || MapredParquetInputFormat.class.isAssignableFrom(inputFormat)
+          || useBasicStatsFromStorageHandler(table);
+  }
+
+  public static boolean canUseColumnStats(Table table, Class<? extends InputFormat> inputFormat) {
     return (OrcInputFormat.class.isAssignableFrom(inputFormat) && !AcidUtils.isFullAcidTable(table))
         || MapredParquetInputFormat.class.isAssignableFrom(inputFormat);
   }
 
+  private static boolean useBasicStatsFromStorageHandler(Table table) {
+    if (table.isNonNative()) {
+      TableDesc tableDesc = Utilities.getTableDesc(table);
+      return table.getStorageHandler().getBasicStatistics(tableDesc) != null;

Review comment:
       Aren't we calculating all the stats here by calling `getBasicStatistics`, just to then discard the results?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #2111: [WIP]HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r603114825



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TaskCompiler.java
##########
@@ -433,6 +434,12 @@ private String extractTableFullName(StatsTask tsk) throws SemanticException {
       return TaskFactory.get(columnStatsWork);
     } else {
       BasicStatsWork statsWork = new BasicStatsWork(tableScan.getConf().getTableMetadata().getTableSpec());
+      for (MapWork mapWork :  (Collection<MapWork>) currentTask.getMapWork()) {
+        if (mapWork.getAliasToPartnInfo() != null && mapWork.getAliasToPartnInfo().containsKey(table.getTableName())) {

Review comment:
       we have a full table object passed to the `StatsWork` constructor - what's wrong with that?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsTask.java
##########
@@ -258,7 +268,7 @@ private int aggregateStats(Hive db) {
         Partish p;
         partishes.add(p = new Partish.PTable(table));
 
-        BasicStatsProcessor basicStatsProcessor = new BasicStatsProcessor(p, work, conf, followedColStats);
+        BasicStatsProcessor basicStatsProcessor = new BasicStatsProcessor(table, p, work, followedColStats);

Review comment:
       I don't think we need these table callse - you may simply use `Partish#getTable` to get access to the table object later

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsTask.java
##########
@@ -118,11 +118,15 @@ public String getName() {
     private boolean isMissingAcidState = false;
     private BasicStatsWork work;
     private boolean followedColStats1;
+    private boolean isBasicStatProvided;
+    private Map<String, String> providedBasicStats;
 
-    public BasicStatsProcessor(Partish partish, BasicStatsWork work, HiveConf conf, boolean followedColStats2) {
+    public BasicStatsProcessor(Table table, Partish partish, BasicStatsWork work, boolean followedColStats2) {

Review comment:
       yes; you should use the partish: `Partish.buildFor(table)`
   
   adding a `Table` here will cause confusion...

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsTask.java
##########
@@ -118,11 +118,15 @@ public String getName() {
     private boolean isMissingAcidState = false;
     private BasicStatsWork work;
     private boolean followedColStats1;
+    private boolean isBasicStatProvided;
+    private Map<String, String> providedBasicStats;
 
-    public BasicStatsProcessor(Partish partish, BasicStatsWork work, HiveConf conf, boolean followedColStats2) {
+    public BasicStatsProcessor(Table table, Partish partish, BasicStatsWork work, boolean followedColStats2) {

Review comment:
       you seem to have added a totally independent conditional  logic to this class; wouldn't it be easier to simply introduce a new `Processor` class for the purpose your are targeting?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsTask.java
##########
@@ -118,11 +118,15 @@ public String getName() {
     private boolean isMissingAcidState = false;
     private BasicStatsWork work;
     private boolean followedColStats1;
+    private boolean isBasicStatProvided;
+    private Map<String, String> providedBasicStats;
 
-    public BasicStatsProcessor(Partish partish, BasicStatsWork work, HiveConf conf, boolean followedColStats2) {
+    public BasicStatsProcessor(Table table, Partish partish, BasicStatsWork work, boolean followedColStats2) {
       this.partish = partish;
       this.work = work;
       followedColStats1 = followedColStats2;
+      providedBasicStats = table.getStorageHandler().getBasicStatistics(work.getTableDesc());

Review comment:
       I believe you don't need the `tableDesc` for this method -> you will not need to add it to `StatsWork` 

##########
File path: iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##########
@@ -153,6 +156,24 @@ public DecomposedPredicate decomposePredicate(JobConf jobConf, Deserializer dese
     return predicate;
   }
 
+  @Override
+  public Map<String, String> getBasicStatistics(TableDesc tableDesc) {

Review comment:
       this method seem to be using a `TableDesc` to be able to identify the underlying Iceberg table - wouldn't it be possible to do the same from a  `Table` object?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary commented on a change in pull request #2111: [WIP]HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r603077185



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsTask.java
##########
@@ -118,11 +118,15 @@ public String getName() {
     private boolean isMissingAcidState = false;
     private BasicStatsWork work;
     private boolean followedColStats1;
+    private boolean isBasicStatProvided;
+    private Map<String, String> providedBasicStats;
 
-    public BasicStatsProcessor(Partish partish, BasicStatsWork work, HiveConf conf, boolean followedColStats2) {
+    public BasicStatsProcessor(Table table, Partish partish, BasicStatsWork work, boolean followedColStats2) {

Review comment:
       My feeling is that `Partish` should be some general object covering both `Table` and `Partition`. How hard/wasteful would it be to add the required data to the `Partish` object?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] lcspinter commented on a change in pull request #2111: HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

Posted by GitBox <gi...@apache.org>.
lcspinter commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r607762003



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsNoJobTask.java
##########
@@ -90,12 +91,27 @@ public BasicStatsNoJobTask(HiveConf conf, BasicStatsNoJobWork work) {
     console = new LogHelper(LOG);
   }
 
-  public static boolean canUseFooterScan(
+  public static boolean canUseStats(
       Table table, Class<? extends InputFormat> inputFormat) {
+      return (OrcInputFormat.class.isAssignableFrom(inputFormat) && !AcidUtils.isFullAcidTable(table))
+          || MapredParquetInputFormat.class.isAssignableFrom(inputFormat)
+          || useBasicStatsFromStorageHandler(table);
+  }
+
+  public static boolean canUseColumnStats(Table table, Class<? extends InputFormat> inputFormat) {
     return (OrcInputFormat.class.isAssignableFrom(inputFormat) && !AcidUtils.isFullAcidTable(table))
         || MapredParquetInputFormat.class.isAssignableFrom(inputFormat);
   }
 
+  private static boolean useBasicStatsFromStorageHandler(Table table) {
+    if (table.isNonNative()) {
+      TableDesc tableDesc = Utilities.getTableDesc(table);
+      return table.getStorageHandler().getBasicStatistics(tableDesc) != null;

Review comment:
       Introduced a new method on the interface to check whether the storage handler can provide stats. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] lcspinter closed pull request #2111: HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

Posted by GitBox <gi...@apache.org>.
lcspinter closed pull request #2111:
URL: https://github.com/apache/hive/pull/2111


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] lcspinter commented on a change in pull request #2111: HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

Posted by GitBox <gi...@apache.org>.
lcspinter commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r607761311



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsNoJobTask.java
##########
@@ -119,16 +135,92 @@ public String getName() {
     return "STATS-NO-JOB";
   }
 
-  static class StatItem {
-    Partish partish;
-    Map<String, String> params;
-    Object result;
+  abstract static class StatCollector implements Runnable {
+
+    protected Partish partish;
+    protected Object result;
+    protected LogHelper console;
+
+    public static Function<StatCollector, String> SIMPLE_NAME_FUNCTION =
+        sc -> String.format("%s#%s", sc.partish().getTable().getCompleteName(), sc.partish().getPartishType());
+
+    public static Function<StatCollector, Partition> EXTRACT_RESULT_FUNCTION = input -> (Partition) input.result();

Review comment:
       This was a legacy code snippet. Fixed it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] lcspinter commented on a change in pull request #2111: HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

Posted by GitBox <gi...@apache.org>.
lcspinter commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r607758461



##########
File path: iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -826,4 +866,31 @@ private StringBuilder buildComplexTypeInnerQuery(Object field, Type type) {
     }
     return query;
   }
+
+  private void validateBasicStats(Table table, String tableName) {

Review comment:
       I would rather keep the separate tableName param. table.name() returns it in `hive.default.customers` format.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] lcspinter merged pull request #2111: HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

Posted by GitBox <gi...@apache.org>.
lcspinter merged pull request #2111:
URL: https://github.com/apache/hive/pull/2111


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary commented on a change in pull request #2111: [WIP]HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r603075162



##########
File path: iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -92,6 +98,9 @@
                   Types.TimestampType.withoutZone(), Types.StringType.get(), Types.BinaryType.get(),
                   Types.DecimalType.of(3, 1), Types.UUIDType.get(), Types.FixedType.ofLength(5),
                   Types.TimeType.get());
+  private static final Map<String, String> STATS_MAPPING = ImmutableMap.of(

Review comment:
       Maybe TODO here as well to check the `TOTAL_SIZE` too?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] lcspinter commented on a change in pull request #2111: HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

Posted by GitBox <gi...@apache.org>.
lcspinter commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r607761504



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsNoJobTask.java
##########
@@ -119,16 +135,92 @@ public String getName() {
     return "STATS-NO-JOB";
   }
 
-  static class StatItem {
-    Partish partish;
-    Map<String, String> params;
-    Object result;
+  abstract static class StatCollector implements Runnable {
+
+    protected Partish partish;
+    protected Object result;
+    protected LogHelper console;
+
+    public static Function<StatCollector, String> SIMPLE_NAME_FUNCTION =
+        sc -> String.format("%s#%s", sc.partish().getTable().getCompleteName(), sc.partish().getPartishType());
+
+    public static Function<StatCollector, Partition> EXTRACT_RESULT_FUNCTION = input -> (Partition) input.result();
+
+    abstract Partish partish();
+    abstract boolean isValid();
+    abstract Object result();
+    abstract void init(HiveConf conf, LogHelper console) throws IOException;
+
+    protected String toString(Map<String, String> parameters) {
+      StringBuilder builder = new StringBuilder();

Review comment:
       Again, legacy code :), but I changed it. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] lcspinter commented on a change in pull request #2111: [WIP]HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

Posted by GitBox <gi...@apache.org>.
lcspinter commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r603419311



##########
File path: iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##########
@@ -153,6 +156,24 @@ public DecomposedPredicate decomposePredicate(JobConf jobConf, Deserializer dese
     return predicate;
   }
 
+  @Override
+  public Map<String, String> getBasicStatistics(TableDesc tableDesc) {

Review comment:
       We need the TableDesc, since the properties which are required to load the iceberg table are stored there. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary commented on a change in pull request #2111: HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r610417708



##########
File path: iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##########
@@ -153,6 +156,37 @@ public DecomposedPredicate decomposePredicate(JobConf jobConf, Deserializer dese
     return predicate;
   }
 
+  @Override
+  public boolean canProvideBasicStatistics() {
+    return true;
+  }
+
+  @Override
+  public Map<String, String> getBasicStatistics(TableDesc tableDesc) {
+    Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+    Map<String, String> stats = new HashMap<>();
+    if (table.currentSnapshot() != null) {
+      Map<String, String> summary = table.currentSnapshot().summary();
+      if (summary != null) {
+        if (summary.containsKey(SnapshotSummary.TOTAL_DATA_FILES_PROP)) {
+          stats.put(StatsSetupConst.NUM_FILES, summary.get(SnapshotSummary.TOTAL_DATA_FILES_PROP));
+        }
+        if (summary.containsKey(SnapshotSummary.TOTAL_RECORDS_PROP)) {
+          stats.put(StatsSetupConst.ROW_COUNT, summary.get(SnapshotSummary.TOTAL_RECORDS_PROP));
+        }
+        // TODO: add TOTAL_SIZE when iceberg 0.12 is released
+        if (summary.containsKey("total-files-size")) {
+          stats.put(StatsSetupConst.TOTAL_SIZE, summary.get("total-files-size"));
+        }
+      }
+    } else {
+      stats.put(StatsSetupConst.NUM_FILES, "0");

Review comment:
       Is this for empty table, or when we do not have statistics at hand?
   We might want to handle the situation when we do not have statistics calculated yet, or we have an incomplete table info.
   
   On the Iceberg dev list I have seen this conversation:
   https://mail-archives.apache.org/mod_mbox/iceberg-dev/202104.mbox/%3c9A11ADB4-27D8-40F1-8141-531287C033DE@gmail.com%3e
   
   > So the tldr, Missing is OK, but inaccurate is not




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary commented on a change in pull request #2111: HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r610410409



##########
File path: iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -92,6 +97,11 @@
                   Types.TimestampType.withoutZone(), Types.StringType.get(), Types.BinaryType.get(),
                   Types.DecimalType.of(3, 1), Types.UUIDType.get(), Types.FixedType.ofLength(5),
                   Types.TimeType.get());
+  private static final Map<String, String> STATS_MAPPING = ImmutableMap.of(

Review comment:
       nit: maybe newline




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary commented on a change in pull request #2111: HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r610415034



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsNoJobTask.java
##########
@@ -119,16 +129,83 @@ public String getName() {
     return "STATS-NO-JOB";
   }
 
-  static class StatItem {
-    Partish partish;
-    Map<String, String> params;
-    Object result;
+  abstract static class StatCollector implements Runnable {
+
+    protected Partish partish;
+    protected Object result;
+    protected LogHelper console;
+
+    public static Function<StatCollector, String> SIMPLE_NAME_FUNCTION =
+        sc -> String.format("%s#%s", sc.partish().getTable().getCompleteName(), sc.partish().getPartishType());
+
+    public static Function<StatCollector, Partition> EXTRACT_RESULT_FUNCTION = sc -> (Partition) sc.result();
+
+    abstract Partish partish();
+    abstract boolean isValid();
+    abstract Object result();
+    abstract void init(HiveConf conf, LogHelper console) throws IOException;
+
+    protected String toString(Map<String, String> parameters) {
+      return StatsSetupConst.SUPPORTED_STATS.stream().map(st -> st + "=" + parameters.get(st))
+          .collect(Collectors.joining(", "));
+    }
   }
 
-  static class FooterStatCollector implements Runnable {
+  static class HiveStorageHandlerStatCollector extends StatCollector {
+
+    public HiveStorageHandlerStatCollector(Partish partish) {
+      this.partish = partish;
+    }
+
+    @Override
+    public void init(HiveConf conf, LogHelper console) throws IOException {
+      this.console = console;
+    }
+
+    @Override
+    public void run() {
+      try {
+        Table table = partish.getTable();
+        Map<String, String> parameters = partish.getPartParameters();
+        TableDesc tableDesc = Utilities.getTableDesc(table);
+        Map<String, String> basicStatistics = table.getStorageHandler().getBasicStatistics(tableDesc);

Review comment:
       If the table would be partitioned then this would not provide enough information to the StorageHandler to generated partition related statistics.
   Either we should document it or provide some info to the StorageHandler to calculate partition statistics




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary commented on a change in pull request #2111: HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r610411498



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageHandler.java
##########
@@ -197,4 +197,22 @@ default boolean addDynamicSplitPruningEdge(ExprNodeDesc syntheticFilterPredicate
   default Map<String, String> getOperatorDescProperties(OperatorDesc operatorDesc, Map<String, String> initialProps) {
     return initialProps;
   }
+
+  /**
+   * Return some basic statistics (numRows, numFiles, totalSize) calculated by the underlying storage handler
+   * implementation.
+   * @param tableDesc a valid table description, used to load the table
+   * @return map of basic statistics, can be null
+   */
+  default Map<String, String> getBasicStatistics(TableDesc tableDesc) {
+    return null;
+  }
+
+  /**
+   * Check if the storage handler can provide basic statistics.
+   * @return true if the storage handler can supply the basic statistics
+   */
+  default boolean canProvideBasicStatistics() {

Review comment:
       Do we need both methods?
   Wouldn't it be better to handle `null` from `getBasicStatistics()` as `!canProvideBasicStatistics()`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] lcspinter commented on a change in pull request #2111: HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

Posted by GitBox <gi...@apache.org>.
lcspinter commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r612545094



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsNoJobTask.java
##########
@@ -119,16 +129,83 @@ public String getName() {
     return "STATS-NO-JOB";
   }
 
-  static class StatItem {
-    Partish partish;
-    Map<String, String> params;
-    Object result;
+  abstract static class StatCollector implements Runnable {
+
+    protected Partish partish;
+    protected Object result;
+    protected LogHelper console;
+
+    public static Function<StatCollector, String> SIMPLE_NAME_FUNCTION =
+        sc -> String.format("%s#%s", sc.partish().getTable().getCompleteName(), sc.partish().getPartishType());
+
+    public static Function<StatCollector, Partition> EXTRACT_RESULT_FUNCTION = sc -> (Partition) sc.result();
+
+    abstract Partish partish();
+    abstract boolean isValid();
+    abstract Object result();
+    abstract void init(HiveConf conf, LogHelper console) throws IOException;
+
+    protected String toString(Map<String, String> parameters) {
+      return StatsSetupConst.SUPPORTED_STATS.stream().map(st -> st + "=" + parameters.get(st))
+          .collect(Collectors.joining(", "));
+    }
   }
 
-  static class FooterStatCollector implements Runnable {
+  static class HiveStorageHandlerStatCollector extends StatCollector {
+
+    public HiveStorageHandlerStatCollector(Partish partish) {
+      this.partish = partish;
+    }
+
+    @Override
+    public void init(HiveConf conf, LogHelper console) throws IOException {
+      this.console = console;
+    }
+
+    @Override
+    public void run() {
+      try {
+        Table table = partish.getTable();
+        Map<String, String> parameters = partish.getPartParameters();
+        TableDesc tableDesc = Utilities.getTableDesc(table);
+        Map<String, String> basicStatistics = table.getStorageHandler().getBasicStatistics(tableDesc);

Review comment:
       Correct, I missed that. I will provide the `partish` object which is enough to calculate the table/partition stats on StorageHandler side. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] lcspinter commented on a change in pull request #2111: [WIP]HIVE-24928: In case of non-native tables use basic statistics from HiveStorageHandler

Posted by GitBox <gi...@apache.org>.
lcspinter commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r603420853



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsTask.java
##########
@@ -118,11 +118,15 @@ public String getName() {
     private boolean isMissingAcidState = false;
     private BasicStatsWork work;
     private boolean followedColStats1;
+    private boolean isBasicStatProvided;
+    private Map<String, String> providedBasicStats;
 
-    public BasicStatsProcessor(Partish partish, BasicStatsWork work, HiveConf conf, boolean followedColStats2) {
+    public BasicStatsProcessor(Table table, Partish partish, BasicStatsWork work, boolean followedColStats2) {

Review comment:
       Per our discussion, I moved my changes to the `BasicStatsNoJobTask`. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org