You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2020/02/14 16:41:27 UTC

[GitHub] [hive] kgyrtkirk opened a new pull request #915: HIVE-22893 StatEstimate

kgyrtkirk opened a new pull request #915: HIVE-22893 StatEstimate
URL: https://github.com/apache/hive/pull/915
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #915: HIVE-22893 StatEstimate

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #915: HIVE-22893 StatEstimate
URL: https://github.com/apache/hive/pull/915#discussion_r384358460
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java
 ##########
 @@ -1590,6 +1610,43 @@ public static ColStatistics getColStatisticsFromExpression(HiveConf conf, Statis
     return colStats;
   }
 
+  private static ColStatistics buildColStatForConstant(HiveConf conf, long numRows, ExprNodeConstantDesc encd) {
+
+    long numNulls = 0;
+    long countDistincts = 0;
+    if (encd.getValue() == null) {
+      // null projection
+      numNulls = numRows;
+    } else {
+      countDistincts = 1;
+    }
+    String colType = encd.getTypeString();
+    colType = colType.toLowerCase();
+    ObjectInspector oi = encd.getWritableObjectInspector();
+    double avgColSize = getAvgColLenOf(conf, oi, colType);
+    ColStatistics colStats = new ColStatistics(encd.getName(), colType);
+    colStats.setAvgColLen(avgColSize);
+    colStats.setCountDistint(countDistincts);
+    colStats.setNumNulls(numNulls);
+
+    Optional<Long> value = getLongConstValue(encd);
 
 Review comment:
   The value is only needed to define the colstat range;
   I've also added to interpret float/double which might be usefull later

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #915: HIVE-22893 StatEstimate

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #915: HIVE-22893 StatEstimate
URL: https://github.com/apache/hive/pull/915#discussion_r384200546
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java
 ##########
 @@ -1560,6 +1554,32 @@ public static ColStatistics getColStatisticsFromExpression(HiveConf conf, Statis
         }
       }
 
+      if (conf.getBoolVar(ConfVars.HIVE_STATS_USE_UDF_ESTIMATORS)) {
+        Optional<IStatEstimatorProvider> sep = engfd.getGenericUDF().adapt(IStatEstimatorProvider.class);
+        if (sep.isPresent()) {
+          Optional<IStatEstimator> se = sep.get().getStatEstimator();
 
 Review comment:
   Should we assume that if a UDF is an estimator provider, it should provide an estimator, and thus, throw an exception here if we cannot get the estimator?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #915: HIVE-22893 StatEstimate

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #915: HIVE-22893 StatEstimate
URL: https://github.com/apache/hive/pull/915#discussion_r384360809
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/stats/estimator/StatEstimators.java
 ##########
 @@ -0,0 +1,51 @@
+package org.apache.hadoop.hive.ql.stats.estimator;
+
+import java.util.Optional;
+
+import org.apache.hadoop.hive.ql.plan.ColStatistics;
+
+public class StatEstimators {
 
 Review comment:
   I was thinking that there will be multiple methods/tools needed to implement estimators - however I ended up having only one common thing.
   Right now I don't foresee that it will be beneficial to retain this container class; so I moved the innerclass outside and removed the StatEstimators class.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #915: HIVE-22893 StatEstimate

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #915: HIVE-22893 StatEstimate
URL: https://github.com/apache/hive/pull/915#discussion_r384201744
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/stats/estimator/StatEstimators.java
 ##########
 @@ -0,0 +1,51 @@
+package org.apache.hadoop.hive.ql.stats.estimator;
+
+import java.util.Optional;
+
+import org.apache.hadoop.hive.ql.plan.ColStatistics;
+
+public class StatEstimators {
 
 Review comment:
   Could we add comments?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #915: HIVE-22893 StatEstimate

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #915: HIVE-22893 StatEstimate
URL: https://github.com/apache/hive/pull/915#discussion_r384331084
 
 

 ##########
 File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
 ##########
 @@ -2519,6 +2519,9 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
                     "higher compute cost. (NDV means the number of distinct values.). It only affects the FM-Sketch \n" +
                     "(not the HLL algorithm which is the default), where it computes the number of necessary\n" +
                     " bitvectors to achieve the accuracy."),
+    HIVE_STATS_USE_UDF_ESTIMATORS("hive.stats.use.statestimators", true,
 
 Review comment:
   ok

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #915: HIVE-22893 StatEstimate

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #915: HIVE-22893 StatEstimate
URL: https://github.com/apache/hive/pull/915#discussion_r384203016
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/udf/UDFSubstr.java
 ##########
 @@ -131,4 +137,52 @@ public BytesWritable evaluate(BytesWritable bw, IntWritable pos, IntWritable len
   public BytesWritable evaluate(BytesWritable bw, IntWritable pos){
     return evaluate(bw, pos, maxValue);
   }
+
+  @Override
+  public Optional<IStatEstimator> getStatEstimator() {
+    return Optional.of(new SubStrStatEstimator());
+  }
+
+  private static class SubStrStatEstimator implements IStatEstimator {
+
+    @Override
+    public Optional<ColStatistics> estimate(List<ColStatistics> csList) {
+      ColStatistics cs = csList.get(0).clone();
+
+      // this might bad in a skewed case; consider:
+      // 1 row with 1000 long string
+      // 99 rows with 0 length
+      // orig avg is 10
+      // new avg is 5 (if substr(5)) ; but in reality it will stay ~10
+      Optional<Double> start = getRangeWidth(csList.get(1).getRange());
+      Range startRange = csList.get(1).getRange();
+      if (startRange != null && startRange.minValue != null) {
+        double newAvgColLen = cs.getAvgColLen() - startRange.minValue.doubleValue();
+        if (newAvgColLen > 0) {
+          cs.setAvgColLen(newAvgColLen);
+        }
+
 
 Review comment:
   you can remove \n

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #915: HIVE-22893 StatEstimate

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #915: HIVE-22893 StatEstimate
URL: https://github.com/apache/hive/pull/915#discussion_r384197923
 
 

 ##########
 File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
 ##########
 @@ -2519,6 +2519,9 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
                     "higher compute cost. (NDV means the number of distinct values.). It only affects the FM-Sketch \n" +
                     "(not the HLL algorithm which is the default), where it computes the number of necessary\n" +
                     " bitvectors to achieve the accuracy."),
+    HIVE_STATS_USE_UDF_ESTIMATORS("hive.stats.use.statestimators", true,
+        "Statestimators are able to provide more accurate column statistic infos for UDF results."),
 
 Review comment:
   Statestimators -> Estimators

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #915: HIVE-22893 StatEstimate

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #915: HIVE-22893 StatEstimate
URL: https://github.com/apache/hive/pull/915#discussion_r384342676
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java
 ##########
 @@ -1560,6 +1554,32 @@ public static ColStatistics getColStatisticsFromExpression(HiveConf conf, Statis
         }
       }
 
+      if (conf.getBoolVar(ConfVars.HIVE_STATS_USE_UDF_ESTIMATORS)) {
+        Optional<IStatEstimatorProvider> sep = engfd.getGenericUDF().adapt(IStatEstimatorProvider.class);
+        if (sep.isPresent()) {
+          Optional<IStatEstimator> se = sep.get().getStatEstimator();
+          if (se.isPresent()) {
+            List<ColStatistics> csList = new ArrayList<ColStatistics>();
+            for (ExprNodeDesc child : engfd.getChildren()) {
+              ColStatistics cs = getColStatisticsFromExpression(conf, parentStats, child);
+              if (cs == null) {
+                break;
+              }
+              csList.add(cs);
+            }
+            if(csList.size() == engfd.getChildren().size()) {
+              Optional<ColStatistics> res = se.get().estimate(csList);
+              if (res.isPresent()) {
 
 Review comment:
   > When could this happen?
   
   For example if an estimator targets only cases in which a specific argument is a constant.
   
   Right now I don't think there are any specific cases in which the estimation might not be available; however I think it's ok for an estimator to decide to not cover some cases...
   instead of a probably missing nullcheck later; the optional is already checked now...

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #915: HIVE-22893 StatEstimate

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #915: HIVE-22893 StatEstimate
URL: https://github.com/apache/hive/pull/915#discussion_r384341029
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java
 ##########
 @@ -1560,6 +1554,32 @@ public static ColStatistics getColStatisticsFromExpression(HiveConf conf, Statis
         }
       }
 
+      if (conf.getBoolVar(ConfVars.HIVE_STATS_USE_UDF_ESTIMATORS)) {
+        Optional<IStatEstimatorProvider> sep = engfd.getGenericUDF().adapt(IStatEstimatorProvider.class);
+        if (sep.isPresent()) {
+          Optional<IStatEstimator> se = sep.get().getStatEstimator();
 
 Review comment:
   right; I left with this approach in an earlier version of this patch

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #915: HIVE-22893 StatEstimate

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #915: HIVE-22893 StatEstimate
URL: https://github.com/apache/hive/pull/915#discussion_r384197755
 
 

 ##########
 File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
 ##########
 @@ -2519,6 +2519,9 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
                     "higher compute cost. (NDV means the number of distinct values.). It only affects the FM-Sketch \n" +
                     "(not the HLL algorithm which is the default), where it computes the number of necessary\n" +
                     " bitvectors to achieve the accuracy."),
+    HIVE_STATS_USE_UDF_ESTIMATORS("hive.stats.use.statestimators", true,
 
 Review comment:
   'hive.stats.use.statestimators' -> 'hive.stats.estimators.enable'

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #915: HIVE-22893 StatEstimate

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #915: HIVE-22893 StatEstimate
URL: https://github.com/apache/hive/pull/915#discussion_r384202325
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/stats/estimator/StatEstimators.java
 ##########
 @@ -0,0 +1,51 @@
+package org.apache.hadoop.hive.ql.stats.estimator;
+
+import java.util.Optional;
+
+import org.apache.hadoop.hive.ql.plan.ColStatistics;
+
+public class StatEstimators {
+
+  public static class WorstStatCombiner {
 
 Review comment:
   Should this implement an interface? (I do not have a strong opinion, just a comment)
   
   Nit. Can the name be changed? DefaultCombiner? Then you can add in the comments that it considers  max upper bounds for the properties, etc.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #915: HIVE-22893 StatEstimate

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #915: HIVE-22893 StatEstimate
URL: https://github.com/apache/hive/pull/915#discussion_r384364200
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/stats/estimator/StatEstimators.java
 ##########
 @@ -0,0 +1,51 @@
+package org.apache.hadoop.hive.ql.stats.estimator;
+
+import java.util.Optional;
+
+import org.apache.hadoop.hive.ql.plan.ColStatistics;
+
+public class StatEstimators {
+
+  public static class WorstStatCombiner {
 
 Review comment:
   The main goal of this class it to provide a way to not repeat almost the same kind of logic in 3 places..(coalesce/if/case).
   I renamed it to `PessimisticStatCombiner` - which might be a better name.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk closed pull request #915: HIVE-22893 StatEstimate

Posted by GitBox <gi...@apache.org>.
kgyrtkirk closed pull request #915: HIVE-22893 StatEstimate
URL: https://github.com/apache/hive/pull/915
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #915: HIVE-22893 StatEstimate

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #915: HIVE-22893 StatEstimate
URL: https://github.com/apache/hive/pull/915#discussion_r384199270
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java
 ##########
 @@ -73,6 +74,9 @@
 import org.apache.hadoop.hive.ql.plan.Statistics;
 import org.apache.hadoop.hive.ql.plan.Statistics.State;
 import org.apache.hadoop.hive.ql.stats.BasicStats.Factory;
+import org.apache.hadoop.hive.ql.stats.estimator.IStatEstimator;
 
 Review comment:
   Currently in Hive we do not seem to prefix interfaces with I (maybe we should start doing it and I know other projects do, but I wanted to mention it since it does not follow current convention).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #915: HIVE-22893 StatEstimate

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #915: HIVE-22893 StatEstimate
URL: https://github.com/apache/hive/pull/915#discussion_r384200980
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java
 ##########
 @@ -1560,6 +1554,32 @@ public static ColStatistics getColStatisticsFromExpression(HiveConf conf, Statis
         }
       }
 
+      if (conf.getBoolVar(ConfVars.HIVE_STATS_USE_UDF_ESTIMATORS)) {
+        Optional<IStatEstimatorProvider> sep = engfd.getGenericUDF().adapt(IStatEstimatorProvider.class);
+        if (sep.isPresent()) {
+          Optional<IStatEstimator> se = sep.get().getStatEstimator();
+          if (se.isPresent()) {
+            List<ColStatistics> csList = new ArrayList<ColStatistics>();
+            for (ExprNodeDesc child : engfd.getChildren()) {
+              ColStatistics cs = getColStatisticsFromExpression(conf, parentStats, child);
+              if (cs == null) {
+                break;
+              }
+              csList.add(cs);
+            }
+            if(csList.size() == engfd.getChildren().size()) {
+              Optional<ColStatistics> res = se.get().estimate(csList);
+              if (res.isPresent()) {
 
 Review comment:
   When could this happen? Maybe when column stats has anything unexpected? Could we add a comment clarifying it?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #915: HIVE-22893 StatEstimate

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #915: HIVE-22893 StatEstimate
URL: https://github.com/apache/hive/pull/915#discussion_r384330577
 
 

 ##########
 File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreSchemaInfo.java
 ##########
 @@ -132,9 +133,11 @@ public String generateInitFileName(String toVersion) throws HiveMetaException {
     String initScriptName = INIT_FILE_PREFIX + toVersion + "." +
         dbType + SQL_FILE_EXTENSION;
     // check if the file exists
-    if (!(new File(getMetaStoreScriptDir() + File.separatorChar +
-          initScriptName).exists())) {
-      throw new HiveMetaException("Unknown version specified for initialization: " + toVersion);
+    File file = new File(getMetaStoreScriptDir() + File.separatorChar +
+          initScriptName);
+    if (!(file.exists())) {
 
 Review comment:
   yeah; sure, actually from time to time I was getting some exception from this area - and I couldn't figure out what file it was looking for...might help later

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #915: HIVE-22893 StatEstimate

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #915: HIVE-22893 StatEstimate
URL: https://github.com/apache/hive/pull/915#discussion_r384197505
 
 

 ##########
 File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreSchemaInfo.java
 ##########
 @@ -132,9 +133,11 @@ public String generateInitFileName(String toVersion) throws HiveMetaException {
     String initScriptName = INIT_FILE_PREFIX + toVersion + "." +
         dbType + SQL_FILE_EXTENSION;
     // check if the file exists
-    if (!(new File(getMetaStoreScriptDir() + File.separatorChar +
-          initScriptName).exists())) {
-      throw new HiveMetaException("Unknown version specified for initialization: " + toVersion);
+    File file = new File(getMetaStoreScriptDir() + File.separatorChar +
+          initScriptName);
+    if (!(file.exists())) {
 
 Review comment:
   nit. enclosing () for file not needed

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #915: HIVE-22893 StatEstimate

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #915: HIVE-22893 StatEstimate
URL: https://github.com/apache/hive/pull/915#discussion_r384336187
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java
 ##########
 @@ -73,6 +74,9 @@
 import org.apache.hadoop.hive.ql.plan.Statistics;
 import org.apache.hadoop.hive.ql.plan.Statistics.State;
 import org.apache.hadoop.hive.ql.stats.BasicStats.Factory;
+import org.apache.hadoop.hive.ql.stats.estimator.IStatEstimator;
 
 Review comment:
   There are interfaces which are prefixed with "I" and also there are some which do not not have it...
   We don't have convention for this... but naturally, I think that prefixing interfaces with "I" and abstract classes with "Abstract" is helpfull.
   What I see is that interfaces which are not prefixed with "I" are described that they "are" interfaces in the apidoc...I think that the "I" in the name makes it clear for both the reader and writer whats the topic here; and by that it facilitates it's documentation to contain more naturally what are the contracts of it.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #915: HIVE-22893 StatEstimate

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #915: HIVE-22893 StatEstimate
URL: https://github.com/apache/hive/pull/915#discussion_r384201501
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java
 ##########
 @@ -1590,6 +1610,43 @@ public static ColStatistics getColStatisticsFromExpression(HiveConf conf, Statis
     return colStats;
   }
 
+  private static ColStatistics buildColStatForConstant(HiveConf conf, long numRows, ExprNodeConstantDesc encd) {
+
+    long numNulls = 0;
+    long countDistincts = 0;
+    if (encd.getValue() == null) {
+      // null projection
+      numNulls = numRows;
+    } else {
+      countDistincts = 1;
+    }
+    String colType = encd.getTypeString();
+    colType = colType.toLowerCase();
+    ObjectInspector oi = encd.getWritableObjectInspector();
+    double avgColSize = getAvgColLenOf(conf, oi, colType);
+    ColStatistics colStats = new ColStatistics(encd.getName(), colType);
+    colStats.setAvgColLen(avgColSize);
+    colStats.setCountDistint(countDistincts);
+    colStats.setNumNulls(numNulls);
+
+    Optional<Long> value = getLongConstValue(encd);
 
 Review comment:
   What about other types? Can we either add them in this patch, or add a comment and create a follow-up?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #915: HIVE-22893 StatEstimate

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #915: HIVE-22893 StatEstimate
URL: https://github.com/apache/hive/pull/915#discussion_r384759470
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java
 ##########
 @@ -1560,6 +1554,32 @@ public static ColStatistics getColStatisticsFromExpression(HiveConf conf, Statis
         }
       }
 
+      if (conf.getBoolVar(ConfVars.HIVE_STATS_USE_UDF_ESTIMATORS)) {
+        Optional<IStatEstimatorProvider> sep = engfd.getGenericUDF().adapt(IStatEstimatorProvider.class);
+        if (sep.isPresent()) {
+          Optional<IStatEstimator> se = sep.get().getStatEstimator();
+          if (se.isPresent()) {
+            List<ColStatistics> csList = new ArrayList<ColStatistics>();
+            for (ExprNodeDesc child : engfd.getChildren()) {
+              ColStatistics cs = getColStatisticsFromExpression(conf, parentStats, child);
+              if (cs == null) {
+                break;
+              }
+              csList.add(cs);
+            }
+            if(csList.size() == engfd.getChildren().size()) {
+              Optional<ColStatistics> res = se.get().estimate(csList);
+              if (res.isPresent()) {
 
 Review comment:
   Makes sense... Let's leave a comment in the code (maybe in the interface since it is behavior that every implementation should be aware of? or here?).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #915: HIVE-22893 StatEstimate

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #915: HIVE-22893 StatEstimate
URL: https://github.com/apache/hive/pull/915#discussion_r384758044
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java
 ##########
 @@ -73,6 +74,9 @@
 import org.apache.hadoop.hive.ql.plan.Statistics;
 import org.apache.hadoop.hive.ql.plan.Statistics.State;
 import org.apache.hadoop.hive.ql.stats.BasicStats.Factory;
+import org.apache.hadoop.hive.ql.stats.estimator.IStatEstimator;
 
 Review comment:
   Sounds good.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #915: HIVE-22893 StatEstimate

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #915: HIVE-22893 StatEstimate
URL: https://github.com/apache/hive/pull/915#discussion_r384760129
 
 

 ##########
 File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
 ##########
 @@ -2519,8 +2519,9 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
                     "higher compute cost. (NDV means the number of distinct values.). It only affects the FM-Sketch \n" +
                     "(not the HLL algorithm which is the default), where it computes the number of necessary\n" +
                     " bitvectors to achieve the accuracy."),
-    HIVE_STATS_USE_UDF_ESTIMATORS("hive.stats.use.statestimators", true,
-        "Statestimators are able to provide more accurate column statistic infos for UDF results."),
+    //FIXME
 
 Review comment:
   FIXME not needed anymore ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org