You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2022/02/14 21:40:09 UTC

[GitHub] [hive] zabetak opened a new pull request #3027: HIVE-25947: Compactor job queue cannot be set per table via compactor.mapred.job.queue.name

zabetak opened a new pull request #3027:
URL: https://github.com/apache/hive/pull/3027


   ### What changes were proposed in this pull request?
   Adapt the MR compactor to accept all the properties below:
   * compactor.mapred.job.queue.name
   * compactor.mapreduce.job.queuename
   * compactor.hive.compactor.job.queue
   
   for specifying the job queue per table and per compaction.
   
   Add unit tests defining and guarding the precedence among the aforementioned properties and the different granularity at which a queue can be defined.
   
   ### Why are the changes needed?
   The change restores backward compatibility and also enables the use of the non deprecated MR properties for specifying the job queue for every compaction.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, it restores the old functionality and defines the order on which properties should take effect.
   
   ### How was this patch tested?
   `mvn -pl ql test -Dtest=TestCompactorMRJobQueueConfiguration`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak commented on a change in pull request #3027: HIVE-25947: Compactor job queue cannot be set per table via compactor.mapred.job.queue.name

Posted by GitBox <gi...@apache.org>.
zabetak commented on a change in pull request #3027:
URL: https://github.com/apache/hive/pull/3027#discussion_r806677913



##########
File path: ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestCompactorMRJobQueueConfiguration.java
##########
@@ -0,0 +1,262 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.txn.compactor;
+
+import org.apache.hadoop.hive.common.StringableMap;
+import org.apache.hadoop.hive.common.ValidReaderWriteIdList;
+import org.apache.hadoop.hive.common.ValidWriteIdList;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.CompactionType;
+import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.txn.CompactionInfo;
+import org.apache.hadoop.mapred.JobConf;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.MethodSource;
+
+import java.time.LocalDate;
+import java.time.ZoneOffset;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Stream;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+/**
+ * Tests for {@link CompactorMR#createBaseJobConf(HiveConf, String, Table, StorageDescriptor, ValidWriteIdList, CompactionInfo)}.
+ */
+public class TestCompactorMRJobQueueConfiguration {
+
+  @ParameterizedTest
+  @MethodSource("generateBaseJobConfSetup")
+  void testCreateBaseJobConfHasCorrectJobQueue(ConfSetup input) {
+    Table tbl = createPersonTable();
+    tbl.setParameters(input.tableProperties);
+    CompactorMR compactor = new CompactorMR();
+    CompactionInfo ci = new CompactionInfo(tbl.getDbName(), tbl.getTableName(), null, CompactionType.MAJOR);
+    ci.properties = new StringableMap(input.compactionProperties).toString();
+    HiveConf conf = new HiveConf();
+    input.confProperties.forEach(conf::set);
+    JobConf c = compactor.createBaseJobConf(conf, "test-job", tbl, tbl.getSd(), new ValidReaderWriteIdList(), ci);
+    assertEquals(input.expectedQueue, c.getQueueName(), "Test failed for the following input:" + input);
+  }
+
+  private static Stream<ConfSetup> generateBaseJobConfSetup() {
+    List<ConfSetup> inputs = new ArrayList<>();
+    String mrProperty = "mapreduce.job.queuename";

Review comment:
       This mimics the way users pass properties to the compactor APIs. If the APIs change and don't accept these properties anymore ideally I would like these tests to break. In other words, I don't want these tests to continue working after a refactoring to avoid having backward compatibility problems left unnoticed. WDYT?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak commented on a change in pull request #3027: HIVE-25947: Compactor job queue cannot be set per table via compactor.mapred.job.queue.name

Posted by GitBox <gi...@apache.org>.
zabetak commented on a change in pull request #3027:
URL: https://github.com/apache/hive/pull/3027#discussion_r809266279



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorUtil.java
##########
@@ -62,31 +75,29 @@ public static ExecutorService createExecutorWithThreadFactory(int parallelism, S
    * @param conf global hive conf
    * @param ci compaction info object
    * @param table instance of table
-   * @return name of the queue, can be null
+   * @return name of the queue
    */
   static String getCompactorJobQueueName(HiveConf conf, CompactionInfo ci, Table table) {
     // Get queue name from the ci. This is passed through
     // ALTER TABLE table_name COMPACT 'major' WITH OVERWRITE TBLPROPERTIES('compactor.hive.compactor.job.queue'='some_queue')
+    List<Function<String, String>> propertyGetters = new ArrayList<>(2);
     if (ci.properties != null) {
       StringableMap ciProperties = new StringableMap(ci.properties);
-      String queueName = ciProperties.get(COMPACTOR_PREFIX + MAPRED_QUEUE_NAME);
-      if (queueName != null && queueName.length() > 0) {
-        return queueName;
-      }
+      propertyGetters.add(ciProperties::get);
     }
-
-    // Get queue name from the table properties
-    String queueName = table.getParameters().get(COMPACTOR_PREFIX + MAPRED_QUEUE_NAME);
-    if (queueName != null && queueName.length() > 0) {
-      return queueName;
+    if (table.getParameters() != null) {
+      propertyGetters.add(table.getParameters()::get);
     }
 
-    // Get queue name from global hive conf
-    queueName = conf.get(HiveConf.ConfVars.COMPACTOR_JOB_QUEUE.varname);
-    if (queueName != null && queueName.length() > 0) {
-      return queueName;
+    for (Function<String, String> getter : propertyGetters) {

Review comment:
       The order in this piece of code is important thus I think the imperative style fits better here. I guess it boils down to personal preferences so unless you feel strongly about this I am inclined to keep it as is.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] asolimando commented on a change in pull request #3027: HIVE-25947: Compactor job queue cannot be set per table via compactor.mapred.job.queue.name

Posted by GitBox <gi...@apache.org>.
asolimando commented on a change in pull request #3027:
URL: https://github.com/apache/hive/pull/3027#discussion_r806714706



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorUtil.java
##########
@@ -22,16 +22,29 @@
 import org.apache.hadoop.hive.metastore.api.Table;
 import org.apache.hadoop.hive.metastore.txn.CompactionInfo;
 
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.ForkJoinPool;
 import java.util.concurrent.ForkJoinWorkerThread;
+import java.util.function.Function;
 
 import static java.lang.String.format;
 
 public class CompactorUtil {
   public static final String COMPACTOR = "compactor";
-  static final String COMPACTOR_PREFIX = "compactor.";
-  static final String MAPRED_QUEUE_NAME = "mapred.job.queue.name";
+  /**
+   * List of accepted properties for defining the compactor's job queue.
+   * 
+   * The order is important and defines which property has precedence over the other if multiple properties are defined
+   * at the same time.
+   */
+  private static final List<String> QUEUE_PROPERTIES = Arrays.asList(

Review comment:
       You are right, in this case consistency can backfire, it's preferable to see test failures in case of a breaking change




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] deniskuzZ commented on a change in pull request #3027: HIVE-25947: Compactor job queue cannot be set per table via compactor.mapred.job.queue.name

Posted by GitBox <gi...@apache.org>.
deniskuzZ commented on a change in pull request #3027:
URL: https://github.com/apache/hive/pull/3027#discussion_r809249306



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorUtil.java
##########
@@ -62,31 +75,29 @@ public static ExecutorService createExecutorWithThreadFactory(int parallelism, S
    * @param conf global hive conf
    * @param ci compaction info object
    * @param table instance of table
-   * @return name of the queue, can be null
+   * @return name of the queue
    */
   static String getCompactorJobQueueName(HiveConf conf, CompactionInfo ci, Table table) {
     // Get queue name from the ci. This is passed through
     // ALTER TABLE table_name COMPACT 'major' WITH OVERWRITE TBLPROPERTIES('compactor.hive.compactor.job.queue'='some_queue')
+    List<Function<String, String>> propertyGetters = new ArrayList<>(2);
     if (ci.properties != null) {
       StringableMap ciProperties = new StringableMap(ci.properties);
-      String queueName = ciProperties.get(COMPACTOR_PREFIX + MAPRED_QUEUE_NAME);
-      if (queueName != null && queueName.length() > 0) {
-        return queueName;
-      }
+      propertyGetters.add(ciProperties::get);
     }
-
-    // Get queue name from the table properties
-    String queueName = table.getParameters().get(COMPACTOR_PREFIX + MAPRED_QUEUE_NAME);
-    if (queueName != null && queueName.length() > 0) {
-      return queueName;
+    if (table.getParameters() != null) {
+      propertyGetters.add(table.getParameters()::get);
     }
 
-    // Get queue name from global hive conf
-    queueName = conf.get(HiveConf.ConfVars.COMPACTOR_JOB_QUEUE.varname);
-    if (queueName != null && queueName.length() > 0) {
-      return queueName;
+    for (Function<String, String> getter : propertyGetters) {

Review comment:
       minor, however, it looks like a mix of imperative and declarative styles, I would prefer streaming API here:
   ````
       return propertyGetters.stream()
         .flatMap(getter -> QUEUE_PROPERTIES.stream()
           .filter(p -> !StringUtils.isEmpty(getter.apply(p))))
         .findFirst()
         .orElse(conf.getVar(HiveConf.ConfVars.COMPACTOR_JOB_QUEUE));
   ```` 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak closed pull request #3027: HIVE-25947: Compactor job queue cannot be set per table via compactor.mapred.job.queue.name

Posted by GitBox <gi...@apache.org>.
zabetak closed pull request #3027:
URL: https://github.com/apache/hive/pull/3027


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] deniskuzZ commented on a change in pull request #3027: HIVE-25947: Compactor job queue cannot be set per table via compactor.mapred.job.queue.name

Posted by GitBox <gi...@apache.org>.
deniskuzZ commented on a change in pull request #3027:
URL: https://github.com/apache/hive/pull/3027#discussion_r809424757



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorUtil.java
##########
@@ -62,31 +75,29 @@ public static ExecutorService createExecutorWithThreadFactory(int parallelism, S
    * @param conf global hive conf
    * @param ci compaction info object
    * @param table instance of table
-   * @return name of the queue, can be null
+   * @return name of the queue
    */
   static String getCompactorJobQueueName(HiveConf conf, CompactionInfo ci, Table table) {
     // Get queue name from the ci. This is passed through
     // ALTER TABLE table_name COMPACT 'major' WITH OVERWRITE TBLPROPERTIES('compactor.hive.compactor.job.queue'='some_queue')
+    List<Function<String, String>> propertyGetters = new ArrayList<>(2);
     if (ci.properties != null) {
       StringableMap ciProperties = new StringableMap(ci.properties);
-      String queueName = ciProperties.get(COMPACTOR_PREFIX + MAPRED_QUEUE_NAME);
-      if (queueName != null && queueName.length() > 0) {
-        return queueName;
-      }
+      propertyGetters.add(ciProperties::get);
     }
-
-    // Get queue name from the table properties
-    String queueName = table.getParameters().get(COMPACTOR_PREFIX + MAPRED_QUEUE_NAME);
-    if (queueName != null && queueName.length() > 0) {
-      return queueName;
+    if (table.getParameters() != null) {
+      propertyGetters.add(table.getParameters()::get);
     }
 
-    // Get queue name from global hive conf
-    queueName = conf.get(HiveConf.ConfVars.COMPACTOR_JOB_QUEUE.varname);
-    if (queueName != null && queueName.length() > 0) {
-      return queueName;
+    for (Function<String, String> getter : propertyGetters) {

Review comment:
       that's ok




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] asolimando commented on a change in pull request #3027: HIVE-25947: Compactor job queue cannot be set per table via compactor.mapred.job.queue.name

Posted by GitBox <gi...@apache.org>.
asolimando commented on a change in pull request #3027:
URL: https://github.com/apache/hive/pull/3027#discussion_r806613146



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorUtil.java
##########
@@ -22,16 +22,29 @@
 import org.apache.hadoop.hive.metastore.api.Table;
 import org.apache.hadoop.hive.metastore.txn.CompactionInfo;
 
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.ForkJoinPool;
 import java.util.concurrent.ForkJoinWorkerThread;
+import java.util.function.Function;
 
 import static java.lang.String.format;
 
 public class CompactorUtil {
   public static final String COMPACTOR = "compactor";
-  static final String COMPACTOR_PREFIX = "compactor.";
-  static final String MAPRED_QUEUE_NAME = "mapred.job.queue.name";
+  /**
+   * List of accepted properties for defining the compactor's job queue.
+   * 
+   * The order is important and defines which property has precedence over the other if multiple properties are defined
+   * at the same time.
+   */
+  private static final List<String> QUEUE_PROPERTIES = Arrays.asList(
+      "compactor."+HiveConf.ConfVars.COMPACTOR_JOB_QUEUE.varname,

Review comment:
       Nitpick: missing spaces around "+"




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] asolimando commented on a change in pull request #3027: HIVE-25947: Compactor job queue cannot be set per table via compactor.mapred.job.queue.name

Posted by GitBox <gi...@apache.org>.
asolimando commented on a change in pull request #3027:
URL: https://github.com/apache/hive/pull/3027#discussion_r806616862



##########
File path: ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestCompactorMRJobQueueConfiguration.java
##########
@@ -0,0 +1,262 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.txn.compactor;
+
+import org.apache.hadoop.hive.common.StringableMap;
+import org.apache.hadoop.hive.common.ValidReaderWriteIdList;
+import org.apache.hadoop.hive.common.ValidWriteIdList;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.CompactionType;
+import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.txn.CompactionInfo;
+import org.apache.hadoop.mapred.JobConf;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.MethodSource;
+
+import java.time.LocalDate;
+import java.time.ZoneOffset;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Stream;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+/**
+ * Tests for {@link CompactorMR#createBaseJobConf(HiveConf, String, Table, StorageDescriptor, ValidWriteIdList, CompactionInfo)}.
+ */
+public class TestCompactorMRJobQueueConfiguration {
+
+  @ParameterizedTest
+  @MethodSource("generateBaseJobConfSetup")
+  void testCreateBaseJobConfHasCorrectJobQueue(ConfSetup input) {
+    Table tbl = createPersonTable();
+    tbl.setParameters(input.tableProperties);
+    CompactorMR compactor = new CompactorMR();
+    CompactionInfo ci = new CompactionInfo(tbl.getDbName(), tbl.getTableName(), null, CompactionType.MAJOR);
+    ci.properties = new StringableMap(input.compactionProperties).toString();
+    HiveConf conf = new HiveConf();
+    input.confProperties.forEach(conf::set);
+    JobConf c = compactor.createBaseJobConf(conf, "test-job", tbl, tbl.getSd(), new ValidReaderWriteIdList(), ci);
+    assertEquals(input.expectedQueue, c.getQueueName(), "Test failed for the following input:" + input);
+  }
+
+  private static Stream<ConfSetup> generateBaseJobConfSetup() {
+    List<ConfSetup> inputs = new ArrayList<>();
+    String mrProperty = "mapreduce.job.queuename";

Review comment:
       Same comment as before, can't we link this to the string defined in HiveConf?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] asolimando commented on a change in pull request #3027: HIVE-25947: Compactor job queue cannot be set per table via compactor.mapred.job.queue.name

Posted by GitBox <gi...@apache.org>.
asolimando commented on a change in pull request #3027:
URL: https://github.com/apache/hive/pull/3027#discussion_r806613804



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorUtil.java
##########
@@ -22,16 +22,29 @@
 import org.apache.hadoop.hive.metastore.api.Table;
 import org.apache.hadoop.hive.metastore.txn.CompactionInfo;
 
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.ForkJoinPool;
 import java.util.concurrent.ForkJoinWorkerThread;
+import java.util.function.Function;
 
 import static java.lang.String.format;
 
 public class CompactorUtil {
   public static final String COMPACTOR = "compactor";
-  static final String COMPACTOR_PREFIX = "compactor.";
-  static final String MAPRED_QUEUE_NAME = "mapred.job.queue.name";
+  /**
+   * List of accepted properties for defining the compactor's job queue.
+   * 
+   * The order is important and defines which property has precedence over the other if multiple properties are defined
+   * at the same time.
+   */
+  private static final List<String> QUEUE_PROPERTIES = Arrays.asList(

Review comment:
       Can't we import them from `HiveConf` to be sure we keep in sync if they change?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] asolimando commented on a change in pull request #3027: HIVE-25947: Compactor job queue cannot be set per table via compactor.mapred.job.queue.name

Posted by GitBox <gi...@apache.org>.
asolimando commented on a change in pull request #3027:
URL: https://github.com/apache/hive/pull/3027#discussion_r806613146



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorUtil.java
##########
@@ -22,16 +22,29 @@
 import org.apache.hadoop.hive.metastore.api.Table;
 import org.apache.hadoop.hive.metastore.txn.CompactionInfo;
 
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.ForkJoinPool;
 import java.util.concurrent.ForkJoinWorkerThread;
+import java.util.function.Function;
 
 import static java.lang.String.format;
 
 public class CompactorUtil {
   public static final String COMPACTOR = "compactor";
-  static final String COMPACTOR_PREFIX = "compactor.";
-  static final String MAPRED_QUEUE_NAME = "mapred.job.queue.name";
+  /**
+   * List of accepted properties for defining the compactor's job queue.
+   * 
+   * The order is important and defines which property has precedence over the other if multiple properties are defined
+   * at the same time.
+   */
+  private static final List<String> QUEUE_PROPERTIES = Arrays.asList(
+      "compactor."+HiveConf.ConfVars.COMPACTOR_JOB_QUEUE.varname,

Review comment:
       Nitpick: missing spaces around "+"

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorUtil.java
##########
@@ -22,16 +22,29 @@
 import org.apache.hadoop.hive.metastore.api.Table;
 import org.apache.hadoop.hive.metastore.txn.CompactionInfo;
 
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.ForkJoinPool;
 import java.util.concurrent.ForkJoinWorkerThread;
+import java.util.function.Function;
 
 import static java.lang.String.format;
 
 public class CompactorUtil {
   public static final String COMPACTOR = "compactor";
-  static final String COMPACTOR_PREFIX = "compactor.";
-  static final String MAPRED_QUEUE_NAME = "mapred.job.queue.name";
+  /**
+   * List of accepted properties for defining the compactor's job queue.
+   * 
+   * The order is important and defines which property has precedence over the other if multiple properties are defined
+   * at the same time.
+   */
+  private static final List<String> QUEUE_PROPERTIES = Arrays.asList(

Review comment:
       Can't we import them from `HiveConf` to be sure we keep in sync if they change?

##########
File path: ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestCompactorMRJobQueueConfiguration.java
##########
@@ -0,0 +1,262 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.txn.compactor;
+
+import org.apache.hadoop.hive.common.StringableMap;
+import org.apache.hadoop.hive.common.ValidReaderWriteIdList;
+import org.apache.hadoop.hive.common.ValidWriteIdList;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.CompactionType;
+import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.txn.CompactionInfo;
+import org.apache.hadoop.mapred.JobConf;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.MethodSource;
+
+import java.time.LocalDate;
+import java.time.ZoneOffset;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Stream;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+/**
+ * Tests for {@link CompactorMR#createBaseJobConf(HiveConf, String, Table, StorageDescriptor, ValidWriteIdList, CompactionInfo)}.
+ */
+public class TestCompactorMRJobQueueConfiguration {
+
+  @ParameterizedTest
+  @MethodSource("generateBaseJobConfSetup")
+  void testCreateBaseJobConfHasCorrectJobQueue(ConfSetup input) {
+    Table tbl = createPersonTable();
+    tbl.setParameters(input.tableProperties);
+    CompactorMR compactor = new CompactorMR();
+    CompactionInfo ci = new CompactionInfo(tbl.getDbName(), tbl.getTableName(), null, CompactionType.MAJOR);
+    ci.properties = new StringableMap(input.compactionProperties).toString();
+    HiveConf conf = new HiveConf();
+    input.confProperties.forEach(conf::set);
+    JobConf c = compactor.createBaseJobConf(conf, "test-job", tbl, tbl.getSd(), new ValidReaderWriteIdList(), ci);
+    assertEquals(input.expectedQueue, c.getQueueName(), "Test failed for the following input:" + input);
+  }
+
+  private static Stream<ConfSetup> generateBaseJobConfSetup() {
+    List<ConfSetup> inputs = new ArrayList<>();
+    String mrProperty = "mapreduce.job.queuename";

Review comment:
       Same comment as before, can't we link this to the string defined in HiveConf?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorUtil.java
##########
@@ -22,16 +22,29 @@
 import org.apache.hadoop.hive.metastore.api.Table;
 import org.apache.hadoop.hive.metastore.txn.CompactionInfo;
 
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.ForkJoinPool;
 import java.util.concurrent.ForkJoinWorkerThread;
+import java.util.function.Function;
 
 import static java.lang.String.format;
 
 public class CompactorUtil {
   public static final String COMPACTOR = "compactor";
-  static final String COMPACTOR_PREFIX = "compactor.";
-  static final String MAPRED_QUEUE_NAME = "mapred.job.queue.name";
+  /**
+   * List of accepted properties for defining the compactor's job queue.
+   * 
+   * The order is important and defines which property has precedence over the other if multiple properties are defined
+   * at the same time.
+   */
+  private static final List<String> QUEUE_PROPERTIES = Arrays.asList(

Review comment:
       You are right, in this case consistency can backfire, it's preferable to see test failures in case of a breaking change




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak commented on a change in pull request #3027: HIVE-25947: Compactor job queue cannot be set per table via compactor.mapred.job.queue.name

Posted by GitBox <gi...@apache.org>.
zabetak commented on a change in pull request #3027:
URL: https://github.com/apache/hive/pull/3027#discussion_r806672446



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorUtil.java
##########
@@ -22,16 +22,29 @@
 import org.apache.hadoop.hive.metastore.api.Table;
 import org.apache.hadoop.hive.metastore.txn.CompactionInfo;
 
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.ForkJoinPool;
 import java.util.concurrent.ForkJoinWorkerThread;
+import java.util.function.Function;
 
 import static java.lang.String.format;
 
 public class CompactorUtil {
   public static final String COMPACTOR = "compactor";
-  static final String COMPACTOR_PREFIX = "compactor.";
-  static final String MAPRED_QUEUE_NAME = "mapred.job.queue.name";
+  /**
+   * List of accepted properties for defining the compactor's job queue.
+   * 
+   * The order is important and defines which property has precedence over the other if multiple properties are defined
+   * at the same time.
+   */
+  private static final List<String> QUEUE_PROPERTIES = Arrays.asList(

Review comment:
       They are MR specific properties so they are not in `HiveConf`. Moreover, I am not sure if we want these to change transparently during a refactoring cause that will probably create backward compatibility problems. What exactly do you have in mind?

##########
File path: ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestCompactorMRJobQueueConfiguration.java
##########
@@ -0,0 +1,262 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.txn.compactor;
+
+import org.apache.hadoop.hive.common.StringableMap;
+import org.apache.hadoop.hive.common.ValidReaderWriteIdList;
+import org.apache.hadoop.hive.common.ValidWriteIdList;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.CompactionType;
+import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.txn.CompactionInfo;
+import org.apache.hadoop.mapred.JobConf;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.MethodSource;
+
+import java.time.LocalDate;
+import java.time.ZoneOffset;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Stream;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+/**
+ * Tests for {@link CompactorMR#createBaseJobConf(HiveConf, String, Table, StorageDescriptor, ValidWriteIdList, CompactionInfo)}.
+ */
+public class TestCompactorMRJobQueueConfiguration {
+
+  @ParameterizedTest
+  @MethodSource("generateBaseJobConfSetup")
+  void testCreateBaseJobConfHasCorrectJobQueue(ConfSetup input) {
+    Table tbl = createPersonTable();
+    tbl.setParameters(input.tableProperties);
+    CompactorMR compactor = new CompactorMR();
+    CompactionInfo ci = new CompactionInfo(tbl.getDbName(), tbl.getTableName(), null, CompactionType.MAJOR);
+    ci.properties = new StringableMap(input.compactionProperties).toString();
+    HiveConf conf = new HiveConf();
+    input.confProperties.forEach(conf::set);
+    JobConf c = compactor.createBaseJobConf(conf, "test-job", tbl, tbl.getSd(), new ValidReaderWriteIdList(), ci);
+    assertEquals(input.expectedQueue, c.getQueueName(), "Test failed for the following input:" + input);
+  }
+
+  private static Stream<ConfSetup> generateBaseJobConfSetup() {
+    List<ConfSetup> inputs = new ArrayList<>();
+    String mrProperty = "mapreduce.job.queuename";

Review comment:
       This mimics the way users pass properties to the compactor APIs. If the APIs change and don't accept these properties anymore ideally I would like these tests to break. In other words, I don't want these tests to continue working after a refactoring to avoid having backward compatibility problems left unnoticed. WDYT?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak commented on a change in pull request #3027: HIVE-25947: Compactor job queue cannot be set per table via compactor.mapred.job.queue.name

Posted by GitBox <gi...@apache.org>.
zabetak commented on a change in pull request #3027:
URL: https://github.com/apache/hive/pull/3027#discussion_r809883008



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorUtil.java
##########
@@ -22,16 +22,29 @@
 import org.apache.hadoop.hive.metastore.api.Table;
 import org.apache.hadoop.hive.metastore.txn.CompactionInfo;
 
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.ForkJoinPool;
 import java.util.concurrent.ForkJoinWorkerThread;
+import java.util.function.Function;
 
 import static java.lang.String.format;
 
 public class CompactorUtil {
   public static final String COMPACTOR = "compactor";
-  static final String COMPACTOR_PREFIX = "compactor.";
-  static final String MAPRED_QUEUE_NAME = "mapred.job.queue.name";
+  /**
+   * List of accepted properties for defining the compactor's job queue.
+   * 
+   * The order is important and defines which property has precedence over the other if multiple properties are defined
+   * at the same time.
+   */
+  private static final List<String> QUEUE_PROPERTIES = Arrays.asList(
+      "compactor."+HiveConf.ConfVars.COMPACTOR_JOB_QUEUE.varname,

Review comment:
       Good catch, addressed during merge to master.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak commented on a change in pull request #3027: HIVE-25947: Compactor job queue cannot be set per table via compactor.mapred.job.queue.name

Posted by GitBox <gi...@apache.org>.
zabetak commented on a change in pull request #3027:
URL: https://github.com/apache/hive/pull/3027#discussion_r806672446



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorUtil.java
##########
@@ -22,16 +22,29 @@
 import org.apache.hadoop.hive.metastore.api.Table;
 import org.apache.hadoop.hive.metastore.txn.CompactionInfo;
 
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.ForkJoinPool;
 import java.util.concurrent.ForkJoinWorkerThread;
+import java.util.function.Function;
 
 import static java.lang.String.format;
 
 public class CompactorUtil {
   public static final String COMPACTOR = "compactor";
-  static final String COMPACTOR_PREFIX = "compactor.";
-  static final String MAPRED_QUEUE_NAME = "mapred.job.queue.name";
+  /**
+   * List of accepted properties for defining the compactor's job queue.
+   * 
+   * The order is important and defines which property has precedence over the other if multiple properties are defined
+   * at the same time.
+   */
+  private static final List<String> QUEUE_PROPERTIES = Arrays.asList(

Review comment:
       They are MR specific properties so they are not in `HiveConf`. Moreover, I am not sure if we want these to change transparently during a refactoring cause that will probably create backward compatibility problems. What exactly do you have in mind?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org