Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/05/31 07:34:54 UTC

[GitHub] [hive] asinkovits opened a new pull request #2332: HIVE-25081: Put metrics collection behind a feature flag

asinkovits opened a new pull request #2332:
URL: https://github.com/apache/hive/pull/2332


   
   ### What changes were proposed in this pull request?
   Metrics should be behind a feature flag unless they are collected in AcidMetricsService, which is already behind a feature flag.
   
   ### Why are the changes needed?
   This subtask is part of the compaction observability initiative.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Manual tests were conducted.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] asinkovits commented on a change in pull request #2332: HIVE-25081: Put metrics collection behind a feature flag

Posted by GitBox <gi...@apache.org>.
asinkovits commented on a change in pull request #2332:
URL: https://github.com/apache/hive/pull/2332#discussion_r648285067



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##########
@@ -111,7 +111,7 @@ public void run() {
         // so wrap it in a big catch Throwable statement.
         try {
           handle = txnHandler.getMutexAPI().acquireLock(TxnStore.MUTEX_KEY.Cleaner.name());
-          if (metricsEnabled) {
+          if (metricsEnabled && MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON)) {

Review comment:
       yes. nice catch, fixed.

##########
File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
##########
@@ -454,6 +454,8 @@ public static ConfVars getMetaConf(String name) {
         "hive.metastore.acidmetrics.check.interval", 300,
         TimeUnit.SECONDS,
         "Time in seconds between acid related metric collection runs."),
+    METASTORE_ACIDMETRICS_EXT_ON("metastore.acidmetrics.ext.on", "hive.metastore.acidmetrics.ext.on", true,
+        "Whether to collect additional acid related metrics outside of the acid metrics service."),

Review comment:
       And/Or HIVE_SERVER2_METRICS_ENABLED. Shall I add both?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##########
@@ -111,7 +111,7 @@ public void run() {
         // so wrap it in a big catch Throwable statement.
         try {
           handle = txnHandler.getMutexAPI().acquireLock(TxnStore.MUTEX_KEY.Cleaner.name());
-          if (metricsEnabled) {
+          if (metricsEnabled && MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON)) {

Review comment:
       fixed.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/metrics/DeltaFilesMetricReporter.java
##########
@@ -115,41 +117,45 @@ public static DeltaFilesMetricReporter getInstance() {
     return InstanceHolder.instance;
   }
 
-  public static synchronized void init(HiveConf conf){
+  public static synchronized void init(HiveConf conf) {
     getInstance().configure(conf);
   }
 
   public void submit(TezCounters counters) {
-    updateMetrics(NUM_OBSOLETE_DELTAS,
-        obsoleteDeltaCache, obsoleteDeltaTopN, obsoleteDeltasThreshold, counters);
-    updateMetrics(NUM_DELTAS,
-        deltaCache, deltaTopN, deltasThreshold, counters);
-    updateMetrics(NUM_SMALL_DELTAS,
-        smallDeltaCache, smallDeltaTopN, deltasThreshold, counters);
+    if(acidMetricsExtEnabled) {

Review comment:
       Yeah. The first issue is that these metrics are collected in HS2, but the conf (METASTORE_ACIDMETRICS_EXT_ON) for the metrics collection is defined in MetastoreConf, so I wanted to minimize the exposure of that.
   The second is that in the end I put it here because it seemed to make the class more resilient. Here is my reasoning: this is a singleton class, so you can access it basically anywhere, but you need to call the init method on it for it to work properly. The acidMetricsExtEnabled flag is set in init (configure), so if init was not called, this code is skipped; otherwise it would throw an NPE.
   But I'm open to a better solution :)
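
The reasoning in this comment can be sketched in plain Java without any Hive classes. This is a minimal illustration, not the code in the PR: `ReporterSketch`, its `Map`-based configuration, the `ext.on` key, and the `cache` field are all hypothetical stand-ins. It only shows that a boolean field defaulting to false lets `submit` return early when `init` was never called, rather than dereferencing a null field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a singleton reporter; not the Hive class.
class ReporterSketch {
  private static final ReporterSketch INSTANCE = new ReporterSketch();

  // Defaults to false, so an uninitialized instance behaves as "disabled".
  private boolean extEnabled;
  // Stays null until configure() runs with the flag on.
  private List<String> cache;

  static ReporterSketch getInstance() {
    return INSTANCE;
  }

  static synchronized void init(Map<String, String> conf) {
    getInstance().configure(conf);
  }

  private void configure(Map<String, String> conf) {
    // "ext.on" is an illustrative key, not the real property name.
    extEnabled = Boolean.parseBoolean(conf.getOrDefault("ext.on", "true"));
    if (extEnabled && cache == null) {
      cache = new ArrayList<>();
    }
  }

  // Returns true if the value was recorded, false if it was skipped.
  boolean submit(String value) {
    if (!extEnabled) {
      // Without this guard, cache.add would throw a NullPointerException
      // on an instance that was never initialized.
      return false;
    }
    cache.add(value);
    return true;
  }
}
```

The trade-off is that a forgotten `init` call fails silently (no metrics) instead of loudly (an exception), which is the resilience the comment argues for.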

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/metrics/DeltaFilesMetricReporter.java
##########
@@ -115,41 +117,45 @@ public static DeltaFilesMetricReporter getInstance() {
     return InstanceHolder.instance;
   }
 
-  public static synchronized void init(HiveConf conf){
+  public static synchronized void init(HiveConf conf) {
     getInstance().configure(conf);
   }
 
   public void submit(TezCounters counters) {
-    updateMetrics(NUM_OBSOLETE_DELTAS,
-        obsoleteDeltaCache, obsoleteDeltaTopN, obsoleteDeltasThreshold, counters);
-    updateMetrics(NUM_DELTAS,
-        deltaCache, deltaTopN, deltasThreshold, counters);
-    updateMetrics(NUM_SMALL_DELTAS,
-        smallDeltaCache, smallDeltaTopN, deltasThreshold, counters);
+    if(acidMetricsExtEnabled) {
+      updateMetrics(NUM_OBSOLETE_DELTAS,
+          obsoleteDeltaCache, obsoleteDeltaTopN, obsoleteDeltasThreshold, counters);
+      updateMetrics(NUM_DELTAS,
+          deltaCache, deltaTopN, deltasThreshold, counters);
+      updateMetrics(NUM_SMALL_DELTAS,
+          smallDeltaCache, smallDeltaTopN, deltasThreshold, counters);
+    }
   }
 
-  public static void mergeDeltaFilesStats(AcidDirectory dir, long checkThresholdInSec,
-        float deltaPctThreshold, EnumMap<DeltaFilesMetricType, Map<String, Integer>> deltaFilesStats) throws IOException {
-    long baseSize = getBaseSize(dir);
-    int numObsoleteDeltas = getNumObsoleteDeltas(dir, checkThresholdInSec);
+  public static void mergeDeltaFilesStats(AcidDirectory dir, long checkThresholdInSec, float deltaPctThreshold,
+      EnumMap<DeltaFilesMetricType, Map<String, Integer>> deltaFilesStats, Configuration conf) throws IOException {
+    if (MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON)) {

Review comment:
       This is to minimize the exposure of the MetastoreConf. I've tried to put as many checks into this class as I could.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/metrics/DeltaFilesMetricReporter.java
##########
@@ -206,7 +216,7 @@ public static void backPropagateAcidMetrics(JobConf jobConf, Configuration conf)
   }
 
   public static void close() {
-    if (getInstance() != null) {
+    if (getInstance() != null && getInstance().acidMetricsExtEnabled) {

Review comment:
       The executorService is created in the configure method, so if configure was not called, close would throw an NPE without this check, if I understand correctly.
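
A minimal sketch of the point about `close`, assuming a simplified reporter: `ClosableReporterSketch` and its `configure(boolean)` signature are hypothetical, and only the lifecycle of the executor field is modelled. The executor exists only after `configure` ran with the flag on, so `close` checks the flag before touching it.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in; only the null-safety of close() is illustrated.
class ClosableReporterSketch {
  private boolean extEnabled;                       // false until configure(true)
  private ScheduledExecutorService executorService; // null until configure(true)

  void configure(boolean enabled) {
    extEnabled = enabled;
    if (extEnabled) {
      // Daemon thread so the sketch never keeps the JVM alive.
      executorService = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "reporter-sketch");
        t.setDaemon(true);
        return t;
      });
      // Placeholder task; a real reporter would publish metrics here.
      executorService.scheduleAtFixedRate(() -> { }, 0, 60, TimeUnit.SECONDS);
    }
  }

  // Returns true if an executor was actually shut down.
  boolean close() {
    if (!extEnabled) {
      // executorService may still be null here; calling shutdownNow()
      // on it would throw a NullPointerException.
      return false;
    }
    executorService.shutdownNow();
    return true;
  }
}
```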

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java
##########
@@ -120,7 +120,7 @@ public void run() {
         // don't doom the entire thread.
         try {
           handle = txnHandler.getMutexAPI().acquireLock(TxnStore.MUTEX_KEY.Initiator.name());
-          if (metricsEnabled) {
+          if (metricsEnabled && MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON)) {

Review comment:
       fixed.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/metrics/DeltaFilesMetricReporter.java
##########
@@ -190,13 +197,16 @@ public static void createCountersForAcidMetrics(TezCounters tezCounters, JobConf
   }
 
   public static void addAcidMetricsToConfObj(EnumMap<DeltaFilesMetricType, Map<String, Integer>> deltaFilesStats, Configuration conf) {
-    deltaFilesStats.forEach((type, value) ->
-        conf.set(type.name(), Joiner.on(",").withKeyValueSeparator("->").join(value))
-    );
+    if (MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON)) {

Review comment:
       nice, fixed.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/metrics/DeltaFilesMetricReporter.java
##########
@@ -230,23 +240,26 @@ private static long getDirSize(AcidUtils.ParsedDirectory dir, FileSystem fs) thr
       .sum();
   }
 
-  private void configure(HiveConf conf){
-    deltasThreshold = HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_DELTA_NUM_THRESHOLD);
-    obsoleteDeltasThreshold = HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_OBSOLETE_DELTA_NUM_THRESHOLD);
-
-    initMetricsCache(conf);
-    long reportingInterval = HiveConf.getTimeVar(conf,
-        HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_REPORTING_INTERVAL, TimeUnit.SECONDS);
-
-    ThreadFactory threadFactory =
-      new ThreadFactoryBuilder()
-        .setDaemon(true)
-        .setNameFormat("DeltaFilesMetricReporter %d")
-        .build();
-    executorService = Executors.newSingleThreadScheduledExecutor(threadFactory);
-    executorService.scheduleAtFixedRate(
-        new ReportingTask(), 0, reportingInterval, TimeUnit.SECONDS);
-    LOG.info("Started DeltaFilesMetricReporter thread");
+  private void configure(HiveConf conf) {
+    acidMetricsExtEnabled = MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON);
+    if (acidMetricsExtEnabled) {

Review comment:
       Again, exposure of the MetastoreConf. It might not be a valid reason though... :D
   And because of my comment in the close method ("the executorService is created in the configure method."), we need to track acidMetricsExtEnabled in this class.






[GitHub] [hive] asinkovits commented on a change in pull request #2332: HIVE-25081: Put metrics collection behind a feature flag

Posted by GitBox <gi...@apache.org>.
asinkovits commented on a change in pull request #2332:
URL: https://github.com/apache/hive/pull/2332#discussion_r649514661



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/metrics/DeltaFilesMetricReporter.java
##########
@@ -230,23 +240,26 @@ private static long getDirSize(AcidUtils.ParsedDirectory dir, FileSystem fs) thr
       .sum();
   }
 
-  private void configure(HiveConf conf){
-    deltasThreshold = HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_DELTA_NUM_THRESHOLD);
-    obsoleteDeltasThreshold = HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_OBSOLETE_DELTA_NUM_THRESHOLD);
-
-    initMetricsCache(conf);
-    long reportingInterval = HiveConf.getTimeVar(conf,
-        HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_REPORTING_INTERVAL, TimeUnit.SECONDS);
-
-    ThreadFactory threadFactory =
-      new ThreadFactoryBuilder()
-        .setDaemon(true)
-        .setNameFormat("DeltaFilesMetricReporter %d")
-        .build();
-    executorService = Executors.newSingleThreadScheduledExecutor(threadFactory);
-    executorService.scheduleAtFixedRate(
-        new ReportingTask(), 0, reportingInterval, TimeUnit.SECONDS);
-    LOG.info("Started DeltaFilesMetricReporter thread");
+  private void configure(HiveConf conf) {
+    acidMetricsExtEnabled = MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON);
+    if (acidMetricsExtEnabled) {

Review comment:
       fixed






[GitHub] [hive] asinkovits commented on a change in pull request #2332: HIVE-25081: Put metrics collection behind a feature flag

Posted by GitBox <gi...@apache.org>.
asinkovits commented on a change in pull request #2332:
URL: https://github.com/apache/hive/pull/2332#discussion_r649272168



##########
File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
##########
@@ -454,6 +454,8 @@ public static ConfVars getMetaConf(String name) {
         "hive.metastore.acidmetrics.check.interval", 300,
         TimeUnit.SECONDS,
         "Time in seconds between acid related metric collection runs."),
+    METASTORE_ACIDMETRICS_EXT_ON("metastore.acidmetrics.ext.on", "hive.metastore.acidmetrics.ext.on", true,
+        "Whether to collect additional acid related metrics outside of the acid metrics service."),

Review comment:
       fixed






[GitHub] [hive] klcopp commented on a change in pull request #2332: HIVE-25081: Put metrics collection behind a feature flag

Posted by GitBox <gi...@apache.org>.
klcopp commented on a change in pull request #2332:
URL: https://github.com/apache/hive/pull/2332#discussion_r648937001



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/metrics/DeltaFilesMetricReporter.java
##########
@@ -115,41 +117,45 @@ public static DeltaFilesMetricReporter getInstance() {
     return InstanceHolder.instance;
   }
 
-  public static synchronized void init(HiveConf conf){
+  public static synchronized void init(HiveConf conf) {
     getInstance().configure(conf);
   }
 
   public void submit(TezCounters counters) {
-    updateMetrics(NUM_OBSOLETE_DELTAS,
-        obsoleteDeltaCache, obsoleteDeltaTopN, obsoleteDeltasThreshold, counters);
-    updateMetrics(NUM_DELTAS,
-        deltaCache, deltaTopN, deltasThreshold, counters);
-    updateMetrics(NUM_SMALL_DELTAS,
-        smallDeltaCache, smallDeltaTopN, deltasThreshold, counters);
+    if(acidMetricsExtEnabled) {

Review comment:
       Ok, you convinced me






[GitHub] [hive] klcopp commented on a change in pull request #2332: HIVE-25081: Put metrics collection behind a feature flag

Posted by GitBox <gi...@apache.org>.
klcopp commented on a change in pull request #2332:
URL: https://github.com/apache/hive/pull/2332#discussion_r648935819



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/metrics/DeltaFilesMetricReporter.java
##########
@@ -230,23 +240,26 @@ private static long getDirSize(AcidUtils.ParsedDirectory dir, FileSystem fs) thr
       .sum();
   }
 
-  private void configure(HiveConf conf){
-    deltasThreshold = HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_DELTA_NUM_THRESHOLD);
-    obsoleteDeltasThreshold = HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_OBSOLETE_DELTA_NUM_THRESHOLD);
-
-    initMetricsCache(conf);
-    long reportingInterval = HiveConf.getTimeVar(conf,
-        HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_REPORTING_INTERVAL, TimeUnit.SECONDS);
-
-    ThreadFactory threadFactory =
-      new ThreadFactoryBuilder()
-        .setDaemon(true)
-        .setNameFormat("DeltaFilesMetricReporter %d")
-        .build();
-    executorService = Executors.newSingleThreadScheduledExecutor(threadFactory);
-    executorService.scheduleAtFixedRate(
-        new ReportingTask(), 0, reportingInterval, TimeUnit.SECONDS);
-    LOG.info("Started DeltaFilesMetricReporter thread");
+  private void configure(HiveConf conf) {
+    acidMetricsExtEnabled = MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON);
+    if (acidMetricsExtEnabled) {

Review comment:
       This is only executed once, when HS2 starts up, and MSConf is accessed a lot when this happens anyway. I don't think exposing MSConf would be a problem.






[GitHub] [hive] klcopp merged pull request #2332: HIVE-25081: Put metrics collection behind a feature flag

Posted by GitBox <gi...@apache.org>.
klcopp merged pull request #2332:
URL: https://github.com/apache/hive/pull/2332


   




[GitHub] [hive] klcopp commented on a change in pull request #2332: HIVE-25081: Put metrics collection behind a feature flag

Posted by GitBox <gi...@apache.org>.
klcopp commented on a change in pull request #2332:
URL: https://github.com/apache/hive/pull/2332#discussion_r648920936



##########
File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
##########
@@ -454,6 +454,8 @@ public static ConfVars getMetaConf(String name) {
         "hive.metastore.acidmetrics.check.interval", 300,
         TimeUnit.SECONDS,
         "Time in seconds between acid related metric collection runs."),
+    METASTORE_ACIDMETRICS_EXT_ON("metastore.acidmetrics.ext.on", "hive.metastore.acidmetrics.ext.on", true,
+        "Whether to collect additional acid related metrics outside of the acid metrics service."),

Review comment:
       Yes, right:)






[GitHub] [hive] asinkovits commented on a change in pull request #2332: HIVE-25081: Put metrics collection behind a feature flag

Posted by GitBox <gi...@apache.org>.
asinkovits commented on a change in pull request #2332:
URL: https://github.com/apache/hive/pull/2332#discussion_r649773216



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/metrics/DeltaFilesMetricReporter.java
##########
@@ -115,41 +117,45 @@ public static DeltaFilesMetricReporter getInstance() {
     return InstanceHolder.instance;
   }
 
-  public static synchronized void init(HiveConf conf){
+  public static synchronized void init(HiveConf conf) {
     getInstance().configure(conf);
   }
 
   public void submit(TezCounters counters) {
-    updateMetrics(NUM_OBSOLETE_DELTAS,
-        obsoleteDeltaCache, obsoleteDeltaTopN, obsoleteDeltasThreshold, counters);
-    updateMetrics(NUM_DELTAS,
-        deltaCache, deltaTopN, deltasThreshold, counters);
-    updateMetrics(NUM_SMALL_DELTAS,
-        smallDeltaCache, smallDeltaTopN, deltasThreshold, counters);
+    if(acidMetricsExtEnabled) {
+      updateMetrics(NUM_OBSOLETE_DELTAS,
+          obsoleteDeltaCache, obsoleteDeltaTopN, obsoleteDeltasThreshold, counters);
+      updateMetrics(NUM_DELTAS,
+          deltaCache, deltaTopN, deltasThreshold, counters);
+      updateMetrics(NUM_SMALL_DELTAS,
+          smallDeltaCache, smallDeltaTopN, deltasThreshold, counters);
+    }
   }
 
-  public static void mergeDeltaFilesStats(AcidDirectory dir, long checkThresholdInSec,
-        float deltaPctThreshold, EnumMap<DeltaFilesMetricType, Map<String, Integer>> deltaFilesStats) throws IOException {
-    long baseSize = getBaseSize(dir);
-    int numObsoleteDeltas = getNumObsoleteDeltas(dir, checkThresholdInSec);
+  public static void mergeDeltaFilesStats(AcidDirectory dir, long checkThresholdInSec, float deltaPctThreshold,
+      EnumMap<DeltaFilesMetricType, Map<String, Integer>> deltaFilesStats, Configuration conf) throws IOException {
+    if (MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON)) {

Review comment:
       I'm not sure if I can do that, this is a static method.






[GitHub] [hive] klcopp commented on a change in pull request #2332: HIVE-25081: Put metrics collection behind a feature flag

Posted by GitBox <gi...@apache.org>.
klcopp commented on a change in pull request #2332:
URL: https://github.com/apache/hive/pull/2332#discussion_r648937961



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/metrics/DeltaFilesMetricReporter.java
##########
@@ -115,41 +117,45 @@ public static DeltaFilesMetricReporter getInstance() {
     return InstanceHolder.instance;
   }
 
-  public static synchronized void init(HiveConf conf){
+  public static synchronized void init(HiveConf conf) {
     getInstance().configure(conf);
   }
 
   public void submit(TezCounters counters) {
-    updateMetrics(NUM_OBSOLETE_DELTAS,
-        obsoleteDeltaCache, obsoleteDeltaTopN, obsoleteDeltasThreshold, counters);
-    updateMetrics(NUM_DELTAS,
-        deltaCache, deltaTopN, deltasThreshold, counters);
-    updateMetrics(NUM_SMALL_DELTAS,
-        smallDeltaCache, smallDeltaTopN, deltasThreshold, counters);
+    if(acidMetricsExtEnabled) {
+      updateMetrics(NUM_OBSOLETE_DELTAS,
+          obsoleteDeltaCache, obsoleteDeltaTopN, obsoleteDeltasThreshold, counters);
+      updateMetrics(NUM_DELTAS,
+          deltaCache, deltaTopN, deltasThreshold, counters);
+      updateMetrics(NUM_SMALL_DELTAS,
+          smallDeltaCache, smallDeltaTopN, deltasThreshold, counters);
+    }
   }
 
-  public static void mergeDeltaFilesStats(AcidDirectory dir, long checkThresholdInSec,
-        float deltaPctThreshold, EnumMap<DeltaFilesMetricType, Map<String, Integer>> deltaFilesStats) throws IOException {
-    long baseSize = getBaseSize(dir);
-    int numObsoleteDeltas = getNumObsoleteDeltas(dir, checkThresholdInSec);
+  public static void mergeDeltaFilesStats(AcidDirectory dir, long checkThresholdInSec, float deltaPctThreshold,
+      EnumMap<DeltaFilesMetricType, Map<String, Integer>> deltaFilesStats, Configuration conf) throws IOException {
+    if (MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON)) {

Review comment:
       Ok. Btw using acidMetricsExtEnabled would make this more readable.






[GitHub] [hive] klcopp commented on a change in pull request #2332: HIVE-25081: Put metrics collection behind a feature flag

Posted by GitBox <gi...@apache.org>.
klcopp commented on a change in pull request #2332:
URL: https://github.com/apache/hive/pull/2332#discussion_r647567066



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java
##########
@@ -120,7 +120,7 @@ public void run() {
         // don't doom the entire thread.
         try {
           handle = txnHandler.getMutexAPI().acquireLock(TxnStore.MUTEX_KEY.Initiator.name());
-          if (metricsEnabled) {
+          if (metricsEnabled && MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON)) {

Review comment:
       Same as cleaner

##########
File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
##########
@@ -454,6 +454,8 @@ public static ConfVars getMetaConf(String name) {
         "hive.metastore.acidmetrics.check.interval", 300,
         TimeUnit.SECONDS,
         "Time in seconds between acid related metric collection runs."),
+    METASTORE_ACIDMETRICS_EXT_ON("metastore.acidmetrics.ext.on", "hive.metastore.acidmetrics.ext.on", true,
+        "Whether to collect additional acid related metrics outside of the acid metrics service."),

Review comment:
       I think these are only enabled if `MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METRICS_ENABLED)==true`, so it would be good to mention that in the description.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/metrics/DeltaFilesMetricReporter.java
##########
@@ -115,41 +117,45 @@ public static DeltaFilesMetricReporter getInstance() {
     return InstanceHolder.instance;
   }
 
-  public static synchronized void init(HiveConf conf){
+  public static synchronized void init(HiveConf conf) {
     getInstance().configure(conf);
   }
 
   public void submit(TezCounters counters) {
-    updateMetrics(NUM_OBSOLETE_DELTAS,
-        obsoleteDeltaCache, obsoleteDeltaTopN, obsoleteDeltasThreshold, counters);
-    updateMetrics(NUM_DELTAS,
-        deltaCache, deltaTopN, deltasThreshold, counters);
-    updateMetrics(NUM_SMALL_DELTAS,
-        smallDeltaCache, smallDeltaTopN, deltasThreshold, counters);
+    if(acidMetricsExtEnabled) {
+      updateMetrics(NUM_OBSOLETE_DELTAS,
+          obsoleteDeltaCache, obsoleteDeltaTopN, obsoleteDeltasThreshold, counters);
+      updateMetrics(NUM_DELTAS,
+          deltaCache, deltaTopN, deltasThreshold, counters);
+      updateMetrics(NUM_SMALL_DELTAS,
+          smallDeltaCache, smallDeltaTopN, deltasThreshold, counters);
+    }
   }
 
-  public static void mergeDeltaFilesStats(AcidDirectory dir, long checkThresholdInSec,
-        float deltaPctThreshold, EnumMap<DeltaFilesMetricType, Map<String, Integer>> deltaFilesStats) throws IOException {
-    long baseSize = getBaseSize(dir);
-    int numObsoleteDeltas = getNumObsoleteDeltas(dir, checkThresholdInSec);
+  public static void mergeDeltaFilesStats(AcidDirectory dir, long checkThresholdInSec, float deltaPctThreshold,
+      EnumMap<DeltaFilesMetricType, Map<String, Integer>> deltaFilesStats, Configuration conf) throws IOException {
+    if (MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON)) {

Review comment:
       Instead of adding the check here, it makes a bit more sense to add it to these checks in org.apache.hadoop.hive.ql.io.orc.OrcInputFormat#generateSplitsInfo:
   ```
   if (metricsEnabled && directory instanceof AcidDirectory) {
             DeltaFilesMetricReporter.mergeDeltaFilesStats((AcidDirectory) directory, checkThresholdInSec,
                 deltaPctThreshold, deltaFilesStats);
           }
   ```
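
The suggestion above, gating at the call site rather than inside the static helper, might look roughly like the following. This is a sketch under stated assumptions: `CallSiteGateSketch`, `mergeStats`, and `generate` are hypothetical names, and plain booleans stand in for the configuration lookups and the `instanceof AcidDirectory` test.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of call-site gating; not the Hive code.
class CallSiteGateSketch {
  // Stand-in for a static stats-merging helper. It does not check
  // any flag itself; it just counts one entry per directory.
  static void mergeStats(String dir, Map<String, Integer> stats) {
    stats.merge(dir, 1, Integer::sum);
  }

  // The caller decides whether to collect, so all conditions sit together.
  static Map<String, Integer> generate(boolean metricsEnabled, boolean extEnabled,
      boolean isAcidDir, String dir) {
    Map<String, Integer> stats = new HashMap<>();
    if (metricsEnabled && extEnabled && isAcidDir) {
      mergeStats(dir, stats);
    }
    return stats;
  }
}
```

With this shape the helper needs no configuration parameter, at the cost of every caller having to remember the check.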
   

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##########
@@ -111,7 +111,7 @@ public void run() {
         // so wrap it in a big catch Throwable statement.
         try {
           handle = txnHandler.getMutexAPI().acquireLock(TxnStore.MUTEX_KEY.Cleaner.name());
-          if (metricsEnabled) {
+          if (metricsEnabled && MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON)) {

Review comment:
       I think this is the same logic as `metricsEnabled = MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METRICS_ENABLED) && MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON)`, right?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/metrics/DeltaFilesMetricReporter.java
##########
@@ -115,41 +117,45 @@ public static DeltaFilesMetricReporter getInstance() {
     return InstanceHolder.instance;
   }
 
-  public static synchronized void init(HiveConf conf){
+  public static synchronized void init(HiveConf conf) {
     getInstance().configure(conf);
   }
 
   public void submit(TezCounters counters) {
-    updateMetrics(NUM_OBSOLETE_DELTAS,
-        obsoleteDeltaCache, obsoleteDeltaTopN, obsoleteDeltasThreshold, counters);
-    updateMetrics(NUM_DELTAS,
-        deltaCache, deltaTopN, deltasThreshold, counters);
-    updateMetrics(NUM_SMALL_DELTAS,
-        smallDeltaCache, smallDeltaTopN, deltasThreshold, counters);
+    if(acidMetricsExtEnabled) {

Review comment:
       It makes more sense to add this check to org.apache.hadoop.hive.ql.exec.tez.TezTask#execute instead, so all the checks are in one place:
   ```
             if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_SERVER2_METRICS_ENABLED)) {
               DeltaFilesMetricReporter.getInstance().submit(dagCounters);
             }
   ```

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/metrics/DeltaFilesMetricReporter.java
##########
@@ -230,23 +240,26 @@ private static long getDirSize(AcidUtils.ParsedDirectory dir, FileSystem fs) thr
       .sum();
   }
 
-  private void configure(HiveConf conf){
-    deltasThreshold = HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_DELTA_NUM_THRESHOLD);
-    obsoleteDeltasThreshold = HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_OBSOLETE_DELTA_NUM_THRESHOLD);
-
-    initMetricsCache(conf);
-    long reportingInterval = HiveConf.getTimeVar(conf,
-        HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_REPORTING_INTERVAL, TimeUnit.SECONDS);
-
-    ThreadFactory threadFactory =
-      new ThreadFactoryBuilder()
-        .setDaemon(true)
-        .setNameFormat("DeltaFilesMetricReporter %d")
-        .build();
-    executorService = Executors.newSingleThreadScheduledExecutor(threadFactory);
-    executorService.scheduleAtFixedRate(
-        new ReportingTask(), 0, reportingInterval, TimeUnit.SECONDS);
-    LOG.info("Started DeltaFilesMetricReporter thread");
+  private void configure(HiveConf conf) {
+    acidMetricsExtEnabled = MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON);
+    if (acidMetricsExtEnabled) {

Review comment:
       It would be nicer to include this check in HiveServer2#init here:
   ```
         if (hiveConf.getBoolVar(ConfVars.HIVE_SERVER2_METRICS_ENABLED)) {
           MetricsFactory.init(hiveConf);
           DeltaFilesMetricReporter.init(hiveConf);
         }
   ```

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/metrics/DeltaFilesMetricReporter.java
##########
@@ -190,13 +197,16 @@ public static void createCountersForAcidMetrics(TezCounters tezCounters, JobConf
   }
 
   public static void addAcidMetricsToConfObj(EnumMap<DeltaFilesMetricType, Map<String, Integer>> deltaFilesStats, Configuration conf) {
-    deltaFilesStats.forEach((type, value) ->
-        conf.set(type.name(), Joiner.on(",").withKeyValueSeparator("->").join(value))
-    );
+    if (MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON)) {

Review comment:
       I guess this doesn't have a check for `HiveConf.getBoolVar(jobConf, HiveConf.ConfVars.HIVE_SERVER2_METRICS_ENABLED)` because the deltaFilesStats map is empty if metrics are off. I think you can remove this check.
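
To illustrate the point about the empty map: a minimal sketch, assuming the outer stats map really is left empty when metrics are off (the comment above only guesses this). `EmptyStatsSketch`, its `MetricType` enum and the plain `Map<String, String>` standing in for the configuration object are hypothetical names for this illustration, and `Collectors.joining` replaces the Guava `Joiner` used in the diff. With an empty map, `forEach` never invokes the lambda, so nothing is written and an extra flag check adds nothing.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.stream.Collectors;

final class EmptyStatsSketch {
  // Hypothetical metric types, named after the counters that appear in the diff.
  enum MetricType { NUM_OBSOLETE_DELTAS, NUM_DELTAS, NUM_SMALL_DELTAS }

  // Writes each per-type map into conf as "key->value,key->value".
  // If stats is empty, the lambda is never called and conf is left untouched.
  static void addToConf(EnumMap<MetricType, Map<String, Integer>> stats, Map<String, String> conf) {
    stats.forEach((type, value) ->
        conf.put(type.name(), value.entrySet().stream()
            .map(e -> e.getKey() + "->" + e.getValue())
            .collect(Collectors.joining(","))));
  }
}
```

If the real map is pre-populated with empty inner maps rather than being empty itself, each type would still be written with an empty string, so that assumption is worth confirming before dropping the check.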

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/metrics/DeltaFilesMetricReporter.java
##########
@@ -206,7 +216,7 @@ public static void backPropagateAcidMetrics(JobConf jobConf, Configuration conf)
   }
 
   public static void close() {
-    if (getInstance() != null) {
+    if (getInstance() != null && getInstance().acidMetricsExtEnabled) {

Review comment:
       I don't think this is necessary... if the instance exists it should be shut down, that's all...

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##########
@@ -111,7 +111,7 @@ public void run() {
         // so wrap it in a big catch Throwable statement.
         try {
           handle = txnHandler.getMutexAPI().acquireLock(TxnStore.MUTEX_KEY.Cleaner.name());
-          if (metricsEnabled) {
+          if (metricsEnabled && MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON)) {

Review comment:
       This is the same logic as 
   `metricsEnabled = MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON)
   && MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METRICS_ENABLED)` (which would be easier to read) right?
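
The single-boolean form suggested here could look roughly like the sketch below. It is not the Cleaner code: `FlagConfig` is a hypothetical stand-in for the configuration object, and the two key strings are invented for the illustration. The idea is only that both flags are read once and folded into one field, so the call site keeps its plain `if (metricsEnabled)` guard.

```java
import java.util.Map;

// Hypothetical stand-in for a configuration object; not the MetastoreConf API.
final class FlagConfig {
  private final Map<String, Boolean> values;

  FlagConfig(Map<String, Boolean> values) {
    this.values = values;
  }

  // Missing keys are treated as "off".
  boolean getBool(String key) {
    return values.getOrDefault(key, false);
  }
}

final class CleanerSketch {
  // Hypothetical key names, chosen only for this illustration.
  static final String METRICS_ENABLED = "metrics.enabled";
  static final String ACID_METRICS_EXT_ON = "acid.metrics.ext.on";

  private final boolean metricsEnabled;

  CleanerSketch(FlagConfig conf) {
    // Read both flags once; later code only consults the combined field.
    this.metricsEnabled = conf.getBool(METRICS_ENABLED) && conf.getBool(ACID_METRICS_EXT_ON);
  }

  boolean shouldCollectMetrics() {
    return metricsEnabled;
  }
}
```

One consequence of this design is that the flags are sampled when the object is built, so a later configuration change would not be picked up without re-initialising.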




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] klcopp commented on a change in pull request #2332: HIVE-25081: Put metrics collection behind a feature flag

Posted by GitBox <gi...@apache.org>.
klcopp commented on a change in pull request #2332:
URL: https://github.com/apache/hive/pull/2332#discussion_r648933458



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/metrics/DeltaFilesMetricReporter.java
##########
@@ -206,7 +216,7 @@ public static void backPropagateAcidMetrics(JobConf jobConf, Configuration conf)
   }
 
   public static void close() {
-    if (getInstance() != null) {
+    if (getInstance() != null && getInstance().acidMetricsExtEnabled) {

Review comment:
       Oh, then null-checking the executorService would probably be best here.
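
A minimal sketch of what null-checking the executor in `close()` might look like, assuming the executor field is only assigned when the reporter is actually started. `ReporterSketch`, `start()` and `isRunning()` are hypothetical names for this illustration, and the scheduled task is an empty placeholder rather than the real reporting task.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class ReporterSketch {
  // Stays null until start() is called, e.g. because the feature flag is off.
  private ScheduledExecutorService executorService;

  void start() {
    executorService = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "reporter-sketch");
      t.setDaemon(true);
      return t;
    });
    // Placeholder task; the interval is arbitrary for the sketch.
    executorService.scheduleAtFixedRate(() -> { }, 0, 60, TimeUnit.SECONDS);
  }

  void close() {
    // Guard on the resource itself rather than on the flag that created it.
    if (executorService != null) {
      executorService.shutdownNow();
    }
  }

  boolean isRunning() {
    return executorService != null && !executorService.isShutdown();
  }
}
```

Checking the field directly keeps `close()` safe to call whether or not the reporter was ever configured, without having to re-read the flag.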



