You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2020/07/31 17:25:45 UTC

[GitHub] [hive] zabetak opened a new pull request #1347: HIVE-23965: Improve plan regression tests using TPCDS30TB metastore dump and custom configs

zabetak opened a new pull request #1347:
URL: https://github.com/apache/hive/pull/1347


   ### What changes were proposed in this pull request and why?
   
   1. Use Dockerized postgres metastore with TPC-DS 30TB dump
   2. Use Hive config properties obtained and curated from real-life usages
   3. Allow AbstractCliConfig to override metastore DB type
   4. Rework CorePerfCliDriver to allow pre-initialized metastores
   
   Extract TPCDS specific code to subclass and document appropriately
   clarifying inaccuracies regarding the size of the tables in the javadoc.
   
   Remove system property settings in the initialization of the driver and
   leave in the configuration to set it up if needed. This is necessary to
   be able to use the driver with a preinitialised metastore.
   
   Remove redundant logs in System.err. Logging and throwing an exception
   is an anti-pattern.
   
   Replace assertions with exceptions and improve the messages.
   
   5. Give more meaningful names to TPCDS related CLI configurations
   6. Disable queries 30, 74, 84 with appropriate JIRA reference
   7. Add TPC-DS query plans for the new driver (TestTezTPCDS30TBCliDriver)
   8. Consider hive.current.database property when returning the session's
   current db to avoid prefixing every query with the name of the database
   9. Upgrade postgres JDBC driver to version 42.2.14 to be compatible
   with the docker image used.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   `mvn test -Dtest=TestTezTPCDS30TBCliDriver  -Dtest.output.overwrite`


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #1347: HIVE-23965: Improve plan regression tests using TPCDS30TB metastore dump and custom configs

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #1347:
URL: https://github.com/apache/hive/pull/1347#discussion_r465151500



##########
File path: itests/util/src/main/java/org/apache/hadoop/hive/cli/control/AbstractCliConfig.java
##########
@@ -405,4 +409,11 @@ private String getAbsolutePath(String dir) {
     return new File(new File(HIVE_ROOT), dir).getAbsolutePath();
   }
 
+  public String getMetastoreType() {
+    return metastoreType;
+  }
+
+  public void setMetastoreType(String metastoreType) {

Review comment:
       setters should be protected in this class

##########
File path: data/conf/tpcds30tb/tez/hive-site.xml
##########
@@ -0,0 +1,214 @@
+<?xml version="1.0"?>
+<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+<configuration>
+    <property>
+        <name>hive.in.test</name>
+        <value>true</value>
+        <description>Internal marker for test. Used for masking env-dependent values</description>
+    </property>
+    <property>
+        <name>hive.rpc.query.plan</name>
+        <value>true</value>
+    </property>
+    <property>
+        <name>hive.llap.execution.mode</name>
+        <value>only</value>
+    </property>
+    <property>
+        <name>hive.exec.post.hooks</name>
+        <value>
+            org.apache.hadoop.hive.ql.hooks.HiveProtoLoggingHook
+        </value>
+    </property>
+    <property>
+        <name>hive.zookeeper.kerberos.enabled</name>
+        <value>false</value>
+    </property>
+    <property>
+        <name>hive.privilege.synchronizer</name>
+        <value>false</value>
+    </property>
+    <property>
+        <name>javax.jdo.option.ConnectionDriverName</name>
+        <value>org.postgresql.Driver</value>
+    </property>
+    <property>
+        <name>hive.exec.scratchdir</name>
+        <value>${test.tmp.dir}/scratchdir</value>
+        <description>Scratch space for Hive jobs</description>
+    </property>
+    <property>
+        <name>hive.server2.tez.sessions.init.threads</name>
+        <value>1</value>
+    </property>
+    <property>
+        <name>hive.optimize.dynamic.partition.hashjoin</name>
+        <value>true</value>
+    </property>
+    <property>
+        <name>hive.exec.orc.split.strategy</name>
+        <value>BI</value>
+    </property>
+    <property>
+        <name>hive.mapjoin.bucket.cache.size</name>
+        <value>10000</value>
+    </property>
+    <property>
+        <name>hive.current.database</name>
+        <value>tpcds_bin_partitioned_orc_30000</value>
+    </property>
+    <property>
+        <name>hive.optimize.metadataonly</name>
+        <value>true</value>
+    </property>
+    <property>
+        <name>hive.vectorized.execution.mapjoin.native.fast.hashtable.enabled</name>
+        <value>true</value>
+    </property>
+    <property>
+        <name>hive.execution.mode</name>
+        <value>llap</value>
+    </property>
+    <property>
+        <name>hive.prewarm.numcontainers</name>
+        <value>3</value>
+    </property>
+    <property>
+        <name>hive.tez.dynamic.partition.pruning</name>
+        <value>true</value>
+    </property>
+    <property>
+        <name>hive.support.concurrency</name>
+        <value>true</value>
+    </property>
+    <property>
+        <name>hive.txn.manager</name>
+        <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
+    </property>
+    <property>
+        <name>hive.llap.kerberos.enabled</name>
+        <value>false</value>
+    </property>
+    <property>
+        <name>hive.load.data.owner</name>
+        <value>hive</value>
+    </property>
+    <property>
+        <name>hive.enforce.sortmergebucketmapjoin</name>
+        <value>true</value>
+    </property>
+    <property>
+        <name>hive.limit.optimize.enable</name>
+        <value>true</value>
+    </property>
+    <property>
+        <name>hive.auto.convert.sortmerge.join.to.mapjoin</name>
+        <value>true</value>
+    </property>
+    <property>
+        <name>hive.txn.acid.dir.cache.duration</name>
+        <value>0</value>
+    </property>
+    <property>
+        <name>hive.optimize.bucketmapjoin</name>
+        <value>true</value>
+    </property>
+    <property>
+        <name>hive.tez.container.size</name>
+        <value>4096</value>
+    </property>
+    <property>
+        <name>hive.stats.fetch.bitvector</name>
+        <value>true</value>

Review comment:
       I think we should probably change the defaults for some of these settings

##########
File path: itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestTezTPCDS30TBCliDriver.java
##########
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.cli;
+
+import org.apache.hadoop.hive.cli.control.CliAdapter;
+import org.apache.hadoop.hive.cli.control.CliConfigs;
+import org.apache.hadoop.hive.cli.control.SplitSupport;
+import org.junit.ClassRule;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TestRule;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+import org.junit.runners.Parameterized.Parameters;
+import org.junit.runners.model.Statement;
+
+import java.io.File;
+import java.util.Comparator;
+import java.util.List;
+
+@RunWith(Parameterized.class)
+public class TestTezTPCDS30TBCliDriver {
+
+  static CliAdapter adapter = new CliConfigs.TezTPCDS30TBCliConfig().getCliAdapter();
+
+  @Parameters(name = "{0}") public static List<Object[]> getParameters() throws Exception {

Review comment:
       nit: newline after annotation

##########
File path: standalone-metastore/pom.xml
##########
@@ -360,7 +360,7 @@
       <dependency>
         <groupId>org.postgresql</groupId>
         <artifactId>postgresql</artifactId>
-        <version>9.3-1102-jdbc41</version>

Review comment:
       please use a property to set the version

##########
File path: itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestTezPerfCliDriver.java
##########
@@ -34,7 +34,7 @@
 @RunWith(Parameterized.class)
 public class TestTezPerfCliDriver {
 
-  static CliAdapter adapter = new CliConfigs.TezPerfCliConfig(false).getCliAdapter();
+  static CliAdapter adapter = new CliConfigs.TezCustomTPCDSCliConfig(false).getCliAdapter();

Review comment:
       this is changing the naming contract between the configs and the perf
   
   I think because we will have a new set of `PerfCliDriver`s we could just remove the old ones - or retain the conf names as well..

##########
File path: itests/util/src/main/java/org/apache/hadoop/hive/ql/MetaStoreDumpUtility.java
##########
@@ -56,6 +56,8 @@
 
   static final Logger LOG = LoggerFactory.getLogger(MetaStoreDumpUtility.class);
 
+  // TODO: Change the name of the method or document appropriately

Review comment:
       I think we should remove the old stuff ....in that case explanation is also unneccessary :D

##########
File path: ql/src/test/queries/clientpositive/perf/cbo_query74.q
##########
@@ -1,3 +1,4 @@
+--! qt:disabled:HIVE-23963

Review comment:
       this will disable this test in all drivers - in case we will only use the new driver it will be clearer...

##########
File path: itests/qtest/pom.xml
##########
@@ -420,7 +420,7 @@
     <dependency>
       <groupId>org.postgresql</groupId>
       <artifactId>postgresql</artifactId>
-      <version>9.3-1102-jdbc41</version>
+      <version>42.2.14</version>

Review comment:
       could we use a property for this version?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
##########
@@ -1743,7 +1743,9 @@ public void setLocalMapRedErrors(Map<String, List<String>> localMapRedErrors) {
 
   public String getCurrentDatabase() {
     if (currentDatabase == null) {
-      currentDatabase = DEFAULT_DATABASE_NAME;
+      currentDatabase = sessionConf.getVar(ConfVars.HIVE_CURRENT_DATABASE);

Review comment:
       I don't think we should do this - why do we need this change?

##########
File path: itests/util/src/main/java/org/apache/hadoop/hive/cli/control/TPCDSMetastoreCliDriver.java
##########
@@ -0,0 +1,67 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.cli.control;
+
+import org.apache.hadoop.hive.ql.MetaStoreDumpUtility;
+import org.apache.hadoop.hive.ql.QTestSystemProperties;
+
+/**
+ * CliDriver initialising the metastore with basic table and column statistics from various TPC-DS datasets.
+ *
+ * The table statistics do not reflect cardinalities from a specific TPC-DS scale factor (SF). Below we outline the
+ * tables and the TPC-DS SF to which it corresponds.
+ *
+ * Tables in 30TB dataset (SF=30000):
+ * <ul>
+ *  <li>call_center</li>
+ *  <li>catalog_page</li>
+ *  <li>customer</li>
+ *  <li>customer_address</li>
+ *  <li>item</li>
+ *  <li>promotions</li>
+ *  <li>store</li>
+ *  <li>warehouse</li>
+ *  <li>web_page</li>
+ *  <li>web_site</li>
+ * </ul>
+ *
+ * Tables in 200GB dataset (SF=300):
+ * <ul>
+ *   <li>catalog_returns</li>
+ *   <li>catalog_sales</li>
+ *   <li>store_returns</li>
+ *   <li>store_sales</li>
+ *   <li>web_sales</li>
+ * </ul>
+ *
+ * Table in 3GB dataset (SF=3): inventory
+ * Table with zero rows: ship_mode
+ *
+ * The fact that tables are in datasets with different scale factors is most likely a bug but given that this driver
+ * has been in use for quite some time now it may make sense to keep around for a while.
+ */
+public class TPCDSMetastoreCliDriver extends CorePerfCliDriver {

Review comment:
       could we add `Core` to the name of the class?
   
   I think the important thing here is that it works with tpcds - so I would also recommend to remove the "Metastore" keyword from the name of the class

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
##########
@@ -1743,7 +1743,9 @@ public void setLocalMapRedErrors(Map<String, List<String>> localMapRedErrors) {
 
   public String getCurrentDatabase() {
     if (currentDatabase == null) {
-      currentDatabase = DEFAULT_DATABASE_NAME;
+      currentDatabase = sessionConf.getVar(ConfVars.HIVE_CURRENT_DATABASE);
+      if (StringUtils.isEmpty(currentDatabase))

Review comment:
       nit: braces




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak commented on a change in pull request #1347: HIVE-23965: Improve plan regression tests using TPCDS30TB metastore dump and custom configs

Posted by GitBox <gi...@apache.org>.
zabetak commented on a change in pull request #1347:
URL: https://github.com/apache/hive/pull/1347#discussion_r523414213



##########
File path: itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestTezTPCDS30TBPerfCliDriver.java
##########
@@ -56,12 +58,22 @@ public int compare(Object[] o1, Object[] o2) {
   public static TestRule cliClassRule = adapter.buildClassRule();
 
   @Rule
-  public TestRule cliTestRule = adapter.buildTestRule();
+  public TestRule cliTestRule = (statement, description) -> new Statement() {

Review comment:
       I think we need it to call `adapter#setUp` and `adapter#tearDown`, don't we? I will try to remove it to see what happens but I think I tried it already.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak commented on pull request #1347: HIVE-23965: Improve plan regression tests using TPCDS30TB metastore dump and custom configs

Posted by GitBox <gi...@apache.org>.
zabetak commented on pull request #1347:
URL: https://github.com/apache/hive/pull/1347#issuecomment-730233527


   Closed and reopened to trigger tests.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak commented on a change in pull request #1347: HIVE-23965: Improve plan regression tests using TPCDS30TB metastore dump and custom configs

Posted by GitBox <gi...@apache.org>.
zabetak commented on a change in pull request #1347:
URL: https://github.com/apache/hive/pull/1347#discussion_r467837707



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
##########
@@ -1743,7 +1743,9 @@ public void setLocalMapRedErrors(Map<String, List<String>> localMapRedErrors) {
 
   public String getCurrentDatabase() {
     if (currentDatabase == null) {
-      currentDatabase = DEFAULT_DATABASE_NAME;
+      currentDatabase = sessionConf.getVar(ConfVars.HIVE_CURRENT_DATABASE);

Review comment:
       The dockerized metastore dump introduced in this PR has all TPC-DS inside `tpcds_bin_partitioned_orc_30000` database (and not in default). The existing queries assume that the tables are in the `default` database so without further changes they cannot run. 
   
   There are various ways to overcome this but I thought that the most elegant was to to exploit the HIVE_CURRENT_DATABASE property since it is already there.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak commented on a change in pull request #1347: HIVE-23965: Improve plan regression tests using TPCDS30TB metastore dump and custom configs

Posted by GitBox <gi...@apache.org>.
zabetak commented on a change in pull request #1347:
URL: https://github.com/apache/hive/pull/1347#discussion_r525081835



##########
File path: ql/src/test/queries/clientpositive/perf/cbo_query74.q
##########
@@ -1,3 +1,4 @@
+--! qt:disabled:HIVE-23963

Review comment:
       The issue has been fixed but anyways we are using only the new driver now.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #1347: HIVE-23965: Improve plan regression tests using TPCDS30TB metastore dump and custom configs

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #1347:
URL: https://github.com/apache/hive/pull/1347#discussion_r469206883



##########
File path: itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestTezPerfCliDriver.java
##########
@@ -34,7 +34,7 @@
 @RunWith(Parameterized.class)
 public class TestTezPerfCliDriver {
 
-  static CliAdapter adapter = new CliConfigs.TezPerfCliConfig(false).getCliAdapter();
+  static CliAdapter adapter = new CliConfigs.TezCustomTPCDSCliConfig(false).getCliAdapter();

Review comment:
       you can do that - but I still don't see why do we need to add the keyword `custom` to the name of the test...it was around for a few years ...now we rename it and add "custom" to its name while there won't be a non-custom version anymore...




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak commented on a change in pull request #1347: HIVE-23965: Improve plan regression tests using TPCDS30TB metastore dump and custom configs

Posted by GitBox <gi...@apache.org>.
zabetak commented on a change in pull request #1347:
URL: https://github.com/apache/hive/pull/1347#discussion_r522850158



##########
File path: itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestTezTPCDS30TBPerfCliDriver.java
##########
@@ -56,12 +58,22 @@ public int compare(Object[] o1, Object[] o2) {
   public static TestRule cliClassRule = adapter.buildClassRule();
 
   @Rule
-  public TestRule cliTestRule = adapter.buildTestRule();
+  public TestRule cliTestRule = (statement, description) -> new Statement() {

Review comment:
       The purpose is not to do more but rather do less :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak commented on a change in pull request #1347: HIVE-23965: Improve plan regression tests using TPCDS30TB metastore dump and custom configs

Posted by GitBox <gi...@apache.org>.
zabetak commented on a change in pull request #1347:
URL: https://github.com/apache/hive/pull/1347#discussion_r522852179



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
##########
@@ -1743,7 +1743,9 @@ public void setLocalMapRedErrors(Map<String, List<String>> localMapRedErrors) {
 
   public String getCurrentDatabase() {
     if (currentDatabase == null) {
-      currentDatabase = DEFAULT_DATABASE_NAME;
+      currentDatabase = sessionConf.getVar(ConfVars.HIVE_CURRENT_DATABASE);

Review comment:
       Well I didn't thought of it as a hack since the property is there but for sure I can pick another approach; I will update the PR.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak commented on a change in pull request #1347: HIVE-23965: Improve plan regression tests using TPCDS30TB metastore dump and custom configs

Posted by GitBox <gi...@apache.org>.
zabetak commented on a change in pull request #1347:
URL: https://github.com/apache/hive/pull/1347#discussion_r525082001



##########
File path: itests/util/src/main/java/org/apache/hadoop/hive/ql/MetaStoreDumpUtility.java
##########
@@ -56,6 +56,8 @@
 
   static final Logger LOG = LoggerFactory.getLogger(MetaStoreDumpUtility.class);
 
+  // TODO: Change the name of the method or document appropriately

Review comment:
       Everything is removed!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #1347: HIVE-23965: Improve plan regression tests using TPCDS30TB metastore dump and custom configs

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #1347:
URL: https://github.com/apache/hive/pull/1347#discussion_r525343866



##########
File path: standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/dbinstall/rules/PostgresTPCDS.java
##########
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore.dbinstall.rules;
+
+/**
+ * JUnit TestRule for Postgres metastore with TPCDS schema and stat information.
+ */
+public class PostgresTPCDS extends Postgres {
+  @Override
+  public String getDockerImageName() {
+    return "zabetak/postgres-tpcds-metastore:1.1";
+  }
+
+  @Override
+  public String getJdbcUrl() {
+    return "jdbc:postgresql://localhost:5432/metastore";
+  }
+
+  @Override
+  public String getHiveUser() {
+    return "hive";
+  }
+
+  @Override
+  public String getHivePassword() {
+    return "hive";
+  }
+
+  @Override
+  public void install() {
+    // Do not do anything since the postgres container contains a fully initialized

Review comment:
       I think this needs to be updated during patch development to somehow pass the tests when the schema is changed (you cant run the tests against the previous schema if its incompatible)
   
   Lets see if we bump into any issues :)
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #1347: HIVE-23965: Improve plan regression tests using TPCDS30TB metastore dump and custom configs

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #1347:
URL: https://github.com/apache/hive/pull/1347#discussion_r525173227



##########
File path: standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/dbinstall/rules/PostgresTPCDS.java
##########
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore.dbinstall.rules;
+
+/**
+ * JUnit TestRule for Postgres metastore with TPCDS schema and stat information.
+ */
+public class PostgresTPCDS extends Postgres {
+  @Override
+  public String getDockerImageName() {
+    return "zabetak/postgres-tpcds-metastore:1.1";
+  }
+
+  @Override
+  public String getJdbcUrl() {
+    return "jdbc:postgresql://localhost:5432/metastore";
+  }
+
+  @Override
+  public String getHiveUser() {
+    return "hive";
+  }
+
+  @Override
+  public String getHivePassword() {
+    return "hive";
+  }
+
+  @Override
+  public void install() {
+    // Do not do anything since the postgres container contains a fully initialized

Review comment:
       I'm not sure about this..this will work for sure for now (and for a few day/weeks/months)
   
   ...but suppose we have a metastore schema change - the docker image will retain the old schema.
   for "master" we don't do incremental updates...so people doing MS changes will also need to rebuild tpcds image as well and upload a new image to dockerhub + change the imagename in this test?
   
   this might be cumbersome...but right now I'm still in favor of this patch if the above could "work"

##########
File path: standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/dbinstall/rules/Postgres.java
##########
@@ -63,7 +67,15 @@ public String getInitialJdbcUrl() {
 
   @Override
   public boolean isContainerReady(String logOutput) {
-    return logOutput.contains("database system is ready to accept connections");
+    if (logOutput.contains("PostgreSQL init process complete; ready for start up")) {
+      try (Socket socket = new Socket()) {
+        socket.connect(new InetSocketAddress("localhost", 5432), 1000);

Review comment:
       does this mean that host system may not run a local psql server in case it wants to run this test?

##########
File path: standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/dbinstall/rules/Postgres.java
##########
@@ -63,7 +67,15 @@ public String getInitialJdbcUrl() {
 
   @Override
   public boolean isContainerReady(String logOutput) {
-    return logOutput.contains("database system is ready to accept connections");
+    if (logOutput.contains("PostgreSQL init process complete; ready for start up")) {
+      try (Socket socket = new Socket()) {
+        socket.connect(new InetSocketAddress("localhost", 5432), 1000);

Review comment:
       I see now that this was "like this" before :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak commented on a change in pull request #1347: HIVE-23965: Improve plan regression tests using TPCDS30TB metastore dump and custom configs

Posted by GitBox <gi...@apache.org>.
zabetak commented on a change in pull request #1347:
URL: https://github.com/apache/hive/pull/1347#discussion_r525082746



##########
File path: itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestTezTPCDS30TBPerfCliDriver.java
##########
@@ -56,12 +58,22 @@ public int compare(Object[] o1, Object[] o2) {
   public static TestRule cliClassRule = adapter.buildClassRule();
 
   @Rule
-  public TestRule cliTestRule = adapter.buildTestRule();
+  public TestRule cliTestRule = (statement, description) -> new Statement() {

Review comment:
       It is necessary to keep the rule. I added some clarifications in https://github.com/apache/hive/pull/1347/commits/4aea9c4311c7c4c4b9b57decf4cb9db1137ff9d9.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #1347: HIVE-23965: Improve plan regression tests using TPCDS30TB metastore dump and custom configs

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #1347:
URL: https://github.com/apache/hive/pull/1347#discussion_r517278142



##########
File path: itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestTezTPCDS30TBPerfCliDriver.java
##########
@@ -17,30 +17,32 @@
  */
 package org.apache.hadoop.hive.cli;
 
-import java.io.File;
-import java.util.Comparator;
-import java.util.List;
-
 import org.apache.hadoop.hive.cli.control.CliAdapter;
 import org.apache.hadoop.hive.cli.control.CliConfigs;
+import org.apache.hadoop.hive.cli.control.SplitSupport;
 import org.junit.ClassRule;
 import org.junit.Rule;
 import org.junit.Test;
 import org.junit.rules.TestRule;
 import org.junit.runner.RunWith;
 import org.junit.runners.Parameterized;
 import org.junit.runners.Parameterized.Parameters;
+import org.junit.runners.model.Statement;
+
+import java.io.File;
+import java.util.Comparator;
+import java.util.List;
 
 @RunWith(Parameterized.class)
-public class TestTezPerfCliDriver {
+public class TestTezTPCDS30TBPerfCliDriver {
 
-  static CliAdapter adapter = new CliConfigs.TezPerfCliConfig(false).getCliAdapter();
+  static CliAdapter adapter = new CliConfigs.TezTPCDS30TBCliConfig().getCliAdapter();
 
   @Parameters(name = "{0}")
   public static List<Object[]> getParameters() throws Exception {
     List<Object[]> parameters = adapter.getParameters();
     parameters.sort(new C1());
-    return parameters;
+    return SplitSupport.process(parameters, TestTezTPCDS30TBPerfCliDriver.class, 10);

Review comment:
       I don't think this is really neccessary - does this testcase runs for more than 15 minutes?
   

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
##########
@@ -1743,7 +1743,9 @@ public void setLocalMapRedErrors(Map<String, List<String>> localMapRedErrors) {
 
   public String getCurrentDatabase() {
     if (currentDatabase == null) {
-      currentDatabase = DEFAULT_DATABASE_NAME;
+      currentDatabase = sessionConf.getVar(ConfVars.HIVE_CURRENT_DATABASE);

Review comment:
       instead of hacking the system - can't we just put the data inside the docker image under `default` ?
   ...or add a `use xxx` to the init sql - but please don't add something like this to  `SessionState`

##########
File path: itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestTezTPCDS30TBPerfCliDriver.java
##########
@@ -56,12 +58,22 @@ public int compare(Object[] o1, Object[] o2) {
   public static TestRule cliClassRule = adapter.buildClassRule();
 
   @Rule
-  public TestRule cliTestRule = adapter.buildTestRule();
+  public TestRule cliTestRule = (statement, description) -> new Statement() {

Review comment:
       I don't think this does anything more than `adapter.buildTestRule`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak commented on a change in pull request #1347: HIVE-23965: Improve plan regression tests using TPCDS30TB metastore dump and custom configs

Posted by GitBox <gi...@apache.org>.
zabetak commented on a change in pull request #1347:
URL: https://github.com/apache/hive/pull/1347#discussion_r525080128



##########
File path: itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestTezPerfCliDriver.java
##########
@@ -34,7 +34,7 @@
 @RunWith(Parameterized.class)
 public class TestTezPerfCliDriver {
 
-  static CliAdapter adapter = new CliConfigs.TezPerfCliConfig(false).getCliAdapter();
+  static CliAdapter adapter = new CliConfigs.TezCustomTPCDSCliConfig(false).getCliAdapter();

Review comment:
       Now we have `TestTezTPCDS30TBPerfCliDriver` and `TezTPCDS30TBCliConfig`. WDYT?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on pull request #1347: HIVE-23965: Improve plan regression tests using TPCDS30TB metastore dump and custom configs

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on pull request #1347:
URL: https://github.com/apache/hive/pull/1347#issuecomment-730323909


   this patch also fully removes TestTezPerfConstraints driver - is that because from now on we will run with constraints all the time?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak commented on a change in pull request #1347: HIVE-23965: Improve plan regression tests using TPCDS30TB metastore dump and custom configs

Posted by GitBox <gi...@apache.org>.
zabetak commented on a change in pull request #1347:
URL: https://github.com/apache/hive/pull/1347#discussion_r523414078



##########
File path: itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestTezTPCDS30TBPerfCliDriver.java
##########
@@ -17,30 +17,32 @@
  */
 package org.apache.hadoop.hive.cli;
 
-import java.io.File;
-import java.util.Comparator;
-import java.util.List;
-
 import org.apache.hadoop.hive.cli.control.CliAdapter;
 import org.apache.hadoop.hive.cli.control.CliConfigs;
+import org.apache.hadoop.hive.cli.control.SplitSupport;
 import org.junit.ClassRule;
 import org.junit.Rule;
 import org.junit.Test;
 import org.junit.rules.TestRule;
 import org.junit.runner.RunWith;
 import org.junit.runners.Parameterized;
 import org.junit.runners.Parameterized.Parameters;
+import org.junit.runners.model.Statement;
+
+import java.io.File;
+import java.util.Comparator;
+import java.util.List;
 
 @RunWith(Parameterized.class)
-public class TestTezPerfCliDriver {
+public class TestTezTPCDS30TBPerfCliDriver {
 
-  static CliAdapter adapter = new CliConfigs.TezPerfCliConfig(false).getCliAdapter();
+  static CliAdapter adapter = new CliConfigs.TezTPCDS30TBCliConfig().getCliAdapter();
 
   @Parameters(name = "{0}")
   public static List<Object[]> getParameters() throws Exception {
     List<Object[]> parameters = adapter.getParameters();
     parameters.sort(new C1());
-    return parameters;
+    return SplitSupport.process(parameters, TestTezTPCDS30TBPerfCliDriver.class, 10);

Review comment:
       Sounds good, I will remove it!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak commented on a change in pull request #1347: HIVE-23965: Improve plan regression tests using TPCDS30TB metastore dump and custom configs

Posted by GitBox <gi...@apache.org>.
zabetak commented on a change in pull request #1347:
URL: https://github.com/apache/hive/pull/1347#discussion_r467785069



##########
File path: data/conf/tpcds30tb/tez/hive-site.xml
##########
@@ -0,0 +1,214 @@
+<?xml version="1.0"?>
+<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+<configuration>
+    <property>
+        <name>hive.in.test</name>
+        <value>true</value>
+        <description>Internal marker for test. Used for masking env-dependent values</description>
+    </property>
+    <property>
+        <name>hive.rpc.query.plan</name>
+        <value>true</value>
+    </property>
+    <property>
+        <name>hive.llap.execution.mode</name>
+        <value>only</value>
+    </property>
+    <property>
+        <name>hive.exec.post.hooks</name>
+        <value>
+            org.apache.hadoop.hive.ql.hooks.HiveProtoLoggingHook
+        </value>
+    </property>
+    <property>
+        <name>hive.zookeeper.kerberos.enabled</name>
+        <value>false</value>
+    </property>
+    <property>
+        <name>hive.privilege.synchronizer</name>
+        <value>false</value>
+    </property>
+    <property>
+        <name>javax.jdo.option.ConnectionDriverName</name>
+        <value>org.postgresql.Driver</value>
+    </property>
+    <property>
+        <name>hive.exec.scratchdir</name>
+        <value>${test.tmp.dir}/scratchdir</value>
+        <description>Scratch space for Hive jobs</description>
+    </property>
+    <property>
+        <name>hive.server2.tez.sessions.init.threads</name>
+        <value>1</value>
+    </property>
+    <property>
+        <name>hive.optimize.dynamic.partition.hashjoin</name>
+        <value>true</value>
+    </property>
+    <property>
+        <name>hive.exec.orc.split.strategy</name>
+        <value>BI</value>
+    </property>
+    <property>
+        <name>hive.mapjoin.bucket.cache.size</name>
+        <value>10000</value>
+    </property>
+    <property>
+        <name>hive.current.database</name>
+        <value>tpcds_bin_partitioned_orc_30000</value>
+    </property>
+    <property>
+        <name>hive.optimize.metadataonly</name>
+        <value>true</value>
+    </property>
+    <property>
+        <name>hive.vectorized.execution.mapjoin.native.fast.hashtable.enabled</name>
+        <value>true</value>
+    </property>
+    <property>
+        <name>hive.execution.mode</name>
+        <value>llap</value>
+    </property>
+    <property>
+        <name>hive.prewarm.numcontainers</name>
+        <value>3</value>
+    </property>
+    <property>
+        <name>hive.tez.dynamic.partition.pruning</name>
+        <value>true</value>
+    </property>
+    <property>
+        <name>hive.support.concurrency</name>
+        <value>true</value>
+    </property>
+    <property>
+        <name>hive.txn.manager</name>
+        <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
+    </property>
+    <property>
+        <name>hive.llap.kerberos.enabled</name>
+        <value>false</value>
+    </property>
+    <property>
+        <name>hive.load.data.owner</name>
+        <value>hive</value>
+    </property>
+    <property>
+        <name>hive.enforce.sortmergebucketmapjoin</name>
+        <value>true</value>
+    </property>
+    <property>
+        <name>hive.limit.optimize.enable</name>
+        <value>true</value>
+    </property>
+    <property>
+        <name>hive.auto.convert.sortmerge.join.to.mapjoin</name>
+        <value>true</value>
+    </property>
+    <property>
+        <name>hive.txn.acid.dir.cache.duration</name>
+        <value>0</value>
+    </property>
+    <property>
+        <name>hive.optimize.bucketmapjoin</name>
+        <value>true</value>
+    </property>
+    <property>
+        <name>hive.tez.container.size</name>
+        <value>4096</value>
+    </property>
+    <property>
+        <name>hive.stats.fetch.bitvector</name>
+        <value>true</value>

Review comment:
       We should but maybe this requires more discussion and I guess it is outside the scope of this PR. We can open the discussion in the dev list or open up a follow-up JIRA. WDYT?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #1347: HIVE-23965: Improve plan regression tests using TPCDS30TB metastore dump and custom configs

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #1347:
URL: https://github.com/apache/hive/pull/1347#discussion_r522866016



##########
File path: itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestTezTPCDS30TBPerfCliDriver.java
##########
@@ -56,12 +58,22 @@ public int compare(Object[] o1, Object[] o2) {
   public static TestRule cliClassRule = adapter.buildClassRule();
 
   @Rule
-  public TestRule cliTestRule = adapter.buildTestRule();
+  public TestRule cliTestRule = (statement, description) -> new Statement() {

Review comment:
       okay...then its not needed? :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak commented on pull request #1347: HIVE-23965: Improve plan regression tests using TPCDS30TB metastore dump and custom configs

Posted by GitBox <gi...@apache.org>.
zabetak commented on pull request #1347:
URL: https://github.com/apache/hive/pull/1347#issuecomment-730345183


   > this patch also fully removes TestTezPerfConstraints driver - is that because from now on we will run with constraints all the time?
   
   That's right. If we want to run with and without constraints then I guess we can truncate the respective table ("KEY_CONSTRAITS") didn't try it though. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak commented on a change in pull request #1347: HIVE-23965: Improve plan regression tests using TPCDS30TB metastore dump and custom configs

Posted by GitBox <gi...@apache.org>.
zabetak commented on a change in pull request #1347:
URL: https://github.com/apache/hive/pull/1347#discussion_r524036093



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
##########
@@ -1743,7 +1743,9 @@ public void setLocalMapRedErrors(Map<String, List<String>> localMapRedErrors) {
 
   public String getCurrentDatabase() {
     if (currentDatabase == null) {
-      currentDatabase = DEFAULT_DATABASE_NAME;
+      currentDatabase = sessionConf.getVar(ConfVars.HIVE_CURRENT_DATABASE);

Review comment:
       Finally, I put the data in the docker image under the `default` database and reverted the other changes. Check commit https://github.com/apache/hive/pull/1347/commits/0e0edae65b00b7e670daa0628912f6be2857ba42.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak commented on a change in pull request #1347: HIVE-23965: Improve plan regression tests using TPCDS30TB metastore dump and custom configs

Posted by GitBox <gi...@apache.org>.
zabetak commented on a change in pull request #1347:
URL: https://github.com/apache/hive/pull/1347#discussion_r522848692



##########
File path: itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestTezTPCDS30TBPerfCliDriver.java
##########
@@ -17,30 +17,32 @@
  */
 package org.apache.hadoop.hive.cli;
 
-import java.io.File;
-import java.util.Comparator;
-import java.util.List;
-
 import org.apache.hadoop.hive.cli.control.CliAdapter;
 import org.apache.hadoop.hive.cli.control.CliConfigs;
+import org.apache.hadoop.hive.cli.control.SplitSupport;
 import org.junit.ClassRule;
 import org.junit.Rule;
 import org.junit.Test;
 import org.junit.rules.TestRule;
 import org.junit.runner.RunWith;
 import org.junit.runners.Parameterized;
 import org.junit.runners.Parameterized.Parameters;
+import org.junit.runners.model.Statement;
+
+import java.io.File;
+import java.util.Comparator;
+import java.util.List;
 
 @RunWith(Parameterized.class)
-public class TestTezPerfCliDriver {
+public class TestTezTPCDS30TBPerfCliDriver {
 
-  static CliAdapter adapter = new CliConfigs.TezPerfCliConfig(false).getCliAdapter();
+  static CliAdapter adapter = new CliConfigs.TezTPCDS30TBCliConfig().getCliAdapter();
 
   @Parameters(name = "{0}")
   public static List<Object[]> getParameters() throws Exception {
     List<Object[]> parameters = adapter.getParameters();
     parameters.sort(new C1());
-    return parameters;
+    return SplitSupport.process(parameters, TestTezTPCDS30TBPerfCliDriver.class, 10);

Review comment:
       If I remember well, it was around 20 minutes.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak commented on a change in pull request #1347: HIVE-23965: Improve plan regression tests using TPCDS30TB metastore dump and custom configs

Posted by GitBox <gi...@apache.org>.
zabetak commented on a change in pull request #1347:
URL: https://github.com/apache/hive/pull/1347#discussion_r525078835



##########
File path: itests/util/src/main/java/org/apache/hadoop/hive/cli/control/AbstractCliConfig.java
##########
@@ -405,4 +409,11 @@ private String getAbsolutePath(String dir) {
     return new File(new File(HIVE_ROOT), dir).getAbsolutePath();
   }
 
+  public String getMetastoreType() {
+    return metastoreType;
+  }
+
+  public void setMetastoreType(String metastoreType) {

Review comment:
       Fixed in https://github.com/apache/hive/pull/1347/commits/5a70022e61dc7ddaadda42bfa9d5fff224e94145!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #1347: HIVE-23965: Improve plan regression tests using TPCDS30TB metastore dump and custom configs

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #1347:
URL: https://github.com/apache/hive/pull/1347#discussion_r522870090



##########
File path: itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestTezTPCDS30TBPerfCliDriver.java
##########
@@ -17,30 +17,32 @@
  */
 package org.apache.hadoop.hive.cli;
 
-import java.io.File;
-import java.util.Comparator;
-import java.util.List;
-
 import org.apache.hadoop.hive.cli.control.CliAdapter;
 import org.apache.hadoop.hive.cli.control.CliConfigs;
+import org.apache.hadoop.hive.cli.control.SplitSupport;
 import org.junit.ClassRule;
 import org.junit.Rule;
 import org.junit.Test;
 import org.junit.rules.TestRule;
 import org.junit.runner.RunWith;
 import org.junit.runners.Parameterized;
 import org.junit.runners.Parameterized.Parameters;
+import org.junit.runners.model.Statement;
+
+import java.io.File;
+import java.util.Comparator;
+import java.util.List;
 
 @RunWith(Parameterized.class)
-public class TestTezPerfCliDriver {
+public class TestTezTPCDS30TBPerfCliDriver {
 
-  static CliAdapter adapter = new CliConfigs.TezPerfCliConfig(false).getCliAdapter();
+  static CliAdapter adapter = new CliConfigs.TezTPCDS30TBCliConfig().getCliAdapter();
 
   @Parameters(name = "{0}")
   public static List<Object[]> getParameters() throws Exception {
     List<Object[]> parameters = adapter.getParameters();
     parameters.sort(new C1());
-    return parameters;
+    return SplitSupport.process(parameters, TestTezTPCDS30TBPerfCliDriver.class, 10);

Review comment:
       20 minutes is also fine...we have quite a few [replication tests above 30 minutes right now](http://ci.hive.apache.org/job/hive-precommit/job/master/337/testReport/org.apache.hadoop.hive.ql.parse/)
   
   
   note that in its current form it would not work because you need to have a separated "N_SPLITS" integer in the class source(like in other clidriver classes) for the splitter to work.
   
   I think we can leave this out for now - 10 splits is a bit too many for a 20 minute test (2 or 3 would fit better)
   in case of this test there is a bit more overhead per execution; because it will probably need to download and launch the metastore.
   
   
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak closed pull request #1347: HIVE-23965: Improve plan regression tests using TPCDS30TB metastore dump and custom configs

Posted by GitBox <gi...@apache.org>.
zabetak closed pull request #1347:
URL: https://github.com/apache/hive/pull/1347


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak commented on a change in pull request #1347: HIVE-23965: Improve plan regression tests using TPCDS30TB metastore dump and custom configs

Posted by GitBox <gi...@apache.org>.
zabetak commented on a change in pull request #1347:
URL: https://github.com/apache/hive/pull/1347#discussion_r525082338



##########
File path: itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestTezTPCDS30TBPerfCliDriver.java
##########
@@ -17,30 +17,32 @@
  */
 package org.apache.hadoop.hive.cli;
 
-import java.io.File;
-import java.util.Comparator;
-import java.util.List;
-
 import org.apache.hadoop.hive.cli.control.CliAdapter;
 import org.apache.hadoop.hive.cli.control.CliConfigs;
+import org.apache.hadoop.hive.cli.control.SplitSupport;
 import org.junit.ClassRule;
 import org.junit.Rule;
 import org.junit.Test;
 import org.junit.rules.TestRule;
 import org.junit.runner.RunWith;
 import org.junit.runners.Parameterized;
 import org.junit.runners.Parameterized.Parameters;
+import org.junit.runners.model.Statement;
+
+import java.io.File;
+import java.util.Comparator;
+import java.util.List;
 
 @RunWith(Parameterized.class)
-public class TestTezPerfCliDriver {
+public class TestTezTPCDS30TBPerfCliDriver {
 
-  static CliAdapter adapter = new CliConfigs.TezPerfCliConfig(false).getCliAdapter();
+  static CliAdapter adapter = new CliConfigs.TezTPCDS30TBCliConfig().getCliAdapter();
 
   @Parameters(name = "{0}")
   public static List<Object[]> getParameters() throws Exception {
     List<Object[]> parameters = adapter.getParameters();
     parameters.sort(new C1());
-    return parameters;
+    return SplitSupport.process(parameters, TestTezTPCDS30TBPerfCliDriver.class, 10);

Review comment:
       Done https://github.com/apache/hive/pull/1347/commits/d50a2f6747fc89ba6a84a243c49b896437065fbc




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] asfgit closed pull request #1347: HIVE-23965: Improve plan regression tests using TPCDS30TB metastore dump and custom configs

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #1347:
URL: https://github.com/apache/hive/pull/1347


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak commented on a change in pull request #1347: HIVE-23965: Improve plan regression tests using TPCDS30TB metastore dump and custom configs

Posted by GitBox <gi...@apache.org>.
zabetak commented on a change in pull request #1347:
URL: https://github.com/apache/hive/pull/1347#discussion_r525194182



##########
File path: standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/dbinstall/rules/PostgresTPCDS.java
##########
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore.dbinstall.rules;
+
+/**
+ * JUnit TestRule for Postgres metastore with TPCDS schema and stat information.
+ */
+public class PostgresTPCDS extends Postgres {
+  @Override
+  public String getDockerImageName() {
+    return "zabetak/postgres-tpcds-metastore:1.1";
+  }
+
+  @Override
+  public String getJdbcUrl() {
+    return "jdbc:postgresql://localhost:5432/metastore";
+  }
+
+  @Override
+  public String getHiveUser() {
+    return "hive";
+  }
+
+  @Override
+  public String getHivePassword() {
+    return "hive";
+  }
+
+  @Override
+  public void install() {
+    // Do not do anything since the postgres container contains a fully initialized

Review comment:
       I agree that is not the most convenient thing but doable; this is actually what I did to put TPCDS tables under the `default` Hive database. 
   
   DockerHub is configured to automatically build master and tags in https://github.com/zabetak/hive-postgres-metastore so it would be sufficient to commit the change there and change image name in the rule. In most cases I think that it will be sufficient to update the following file: https://github.com/zabetak/hive-postgres-metastore/blob/master/upgrades/3.1.3000-to-4.0.0.sql
   
   Apart from that, we could also explore other options like bringing in the Hive code base the upgrade script that I mentioned above and that way we don't need to rebuild the image for every change in the metastore. On a second though this might be the best option, WDYT?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak commented on a change in pull request #1347: HIVE-23965: Improve plan regression tests using TPCDS30TB metastore dump and custom configs

Posted by GitBox <gi...@apache.org>.
zabetak commented on a change in pull request #1347:
URL: https://github.com/apache/hive/pull/1347#discussion_r525081324



##########
File path: itests/util/src/main/java/org/apache/hadoop/hive/cli/control/TPCDSMetastoreCliDriver.java
##########
@@ -0,0 +1,67 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.cli.control;
+
+import org.apache.hadoop.hive.ql.MetaStoreDumpUtility;
+import org.apache.hadoop.hive.ql.QTestSystemProperties;
+
+/**
+ * CliDriver initialising the metastore with basic table and column statistics from various TPC-DS datasets.
+ *
+ * The table statistics do not reflect cardinalities from a specific TPC-DS scale factor (SF). Below we outline the
+ * tables and the TPC-DS SF to which it corresponds.
+ *
+ * Tables in 30TB dataset (SF=30000):
+ * <ul>
+ *  <li>call_center</li>
+ *  <li>catalog_page</li>
+ *  <li>customer</li>
+ *  <li>customer_address</li>
+ *  <li>item</li>
+ *  <li>promotions</li>
+ *  <li>store</li>
+ *  <li>warehouse</li>
+ *  <li>web_page</li>
+ *  <li>web_site</li>
+ * </ul>
+ *
+ * Tables in 200GB dataset (SF=300):
+ * <ul>
+ *   <li>catalog_returns</li>
+ *   <li>catalog_sales</li>
+ *   <li>store_returns</li>
+ *   <li>store_sales</li>
+ *   <li>web_sales</li>
+ * </ul>
+ *
+ * Table in 3GB dataset (SF=3): inventory
+ * Table with zero rows: ship_mode
+ *
+ * The fact that tables are in datasets with different scale factors is most likely a bug but given that this driver
+ * has been in use for quite some time now it may make sense to keep around for a while.
+ */
+public class TPCDSMetastoreCliDriver extends CorePerfCliDriver {

Review comment:
       The class no longer exists. We are using `CorePerfCliDriver` as it is. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak commented on a change in pull request #1347: HIVE-23965: Improve plan regression tests using TPCDS30TB metastore dump and custom configs

Posted by GitBox <gi...@apache.org>.
zabetak commented on a change in pull request #1347:
URL: https://github.com/apache/hive/pull/1347#discussion_r468035881



##########
File path: itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestTezPerfCliDriver.java
##########
@@ -34,7 +34,7 @@
 @RunWith(Parameterized.class)
 public class TestTezPerfCliDriver {
 
-  static CliAdapter adapter = new CliConfigs.TezPerfCliConfig(false).getCliAdapter();
+  static CliAdapter adapter = new CliConfigs.TezCustomTPCDSCliConfig(false).getCliAdapter();

Review comment:
       Well in that case and if we keep the existing perf drivers (which it might make sense in the short term) it might be better to rename also the test class (e.g., `TestTezCustomTPCDSCliDriver`) to better reflect what we are really testing. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak commented on a change in pull request #1347: HIVE-23965: Improve plan regression tests using TPCDS30TB metastore dump and custom configs

Posted by GitBox <gi...@apache.org>.
zabetak commented on a change in pull request #1347:
URL: https://github.com/apache/hive/pull/1347#discussion_r467782998



##########
File path: itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestTezTPCDS30TBCliDriver.java
##########
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.cli;
+
+import org.apache.hadoop.hive.cli.control.CliAdapter;
+import org.apache.hadoop.hive.cli.control.CliConfigs;
+import org.apache.hadoop.hive.cli.control.SplitSupport;
+import org.junit.ClassRule;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TestRule;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+import org.junit.runners.Parameterized.Parameters;
+import org.junit.runners.model.Statement;
+
+import java.io.File;
+import java.util.Comparator;
+import java.util.List;
+
+@RunWith(Parameterized.class)
+public class TestTezTPCDS30TBCliDriver {
+
+  static CliAdapter adapter = new CliConfigs.TezTPCDS30TBCliConfig().getCliAdapter();
+
+  @Parameters(name = "{0}") public static List<Object[]> getParameters() throws Exception {

Review comment:
       Thanks! Do we always wrap after annotations (e.g., class and field annotations) ? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org