You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by gatorsmile <gi...@git.apache.org> on 2017/12/14 19:54:42 UTC

[GitHub] spark pull request #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suit...

GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/19982

    [SPARK-22787] [TEST] [SQL] Add a TPC-H query suite

    ## What changes were proposed in this pull request?
    Add a test suite to ensure all the TPC-H queries can be successfully analyzed, optimized and compiled without hitting the max iteration threshold.
    
    ## How was this patch tested?
    N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark testTPCH

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19982.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19982
    
----
commit 9d52ff0037f0c97390b6bd5e917ab8caa20799b3
Author: gatorsmile <ga...@gmail.com>
Date:   2017-12-14T19:49:20Z

    fix.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suite

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19982
  
    cc @cloud-fan @maropu @kiszk  


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suite

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19982
  
    Thanks! Merged to master.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suite

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19982
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84927/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suit...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19982#discussion_r157050980
  
    --- Diff: sql/core/src/test/resources/tpch/q1.sql ---
    @@ -0,0 +1,23 @@
    +-- using default substitutions
    +
    +select
    +	l_returnflag,
    --- End diff --
    
    Can we use space instead of `tab`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suite

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19982
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suite

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19982
  
    **[Test build #84928 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84928/testReport)** for PR 19982 at commit [`fac5fb2`](https://github.com/apache/spark/commit/fac5fb2325a6225820b0f04df85ee2ffd9266b44).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suit...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19982#discussion_r157119941
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/BenchmarkQueryTest.scala ---
    @@ -0,0 +1,78 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql
    +
    +import org.scalatest.BeforeAndAfterAll
    +
    +import org.apache.spark.sql.catalyst.expressions.codegen.{CodeFormatter, CodeGenerator}
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.execution.{SparkPlan, WholeStageCodegenExec}
    +import org.apache.spark.sql.test.SharedSQLContext
    +import org.apache.spark.util.Utils
    +
    +abstract class BenchmarkQueryTest extends QueryTest with SharedSQLContext with BeforeAndAfterAll {
    +
    +  // When Utils.isTesting is true, the RuleExecutor will issue an exception when hitting
    +  // the max iteration of analyzer/optimizer batches.
    +  assert(Utils.isTesting, "spark.testing is not set to true")
    +
    +  /**
    +   * Drop all the tables
    +   */
    +  protected override def afterAll(): Unit = {
    +    try {
    +      // For debugging dump some statistics about how much time was spent in various optimizer rules
    +      logWarning(RuleExecutor.dumpTimeSpent())
    --- End diff --
    
    This is just to give the overall picture how long each rule takes. 
    
    I plan to submit another PR to track which rule takes an effect for a specific query and also record the time cost. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suite

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19982
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84940/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suite

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/19982
  
    LGTM for one minor comment.
    BTW, how about also adding star schema benchmark (many join cases) to check compilation?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suite

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19982
  
    **[Test build #84936 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84936/testReport)** for PR 19982 at commit [`9dc4457`](https://github.com/apache/spark/commit/9dc445739016ce4168523840b65c439ddb54a99b).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suite

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19982
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84936/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suite

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19982
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84928/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suit...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19982#discussion_r157062400
  
    --- Diff: sql/core/src/test/resources/tpch/q1.sql ---
    @@ -0,0 +1,23 @@
    +-- using default substitutions
    +
    +select
    +	l_returnflag,
    --- End diff --
    
    This is the generated SQL. I plan to keep them unchanged.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suite

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19982
  
    **[Test build #84928 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84928/testReport)** for PR 19982 at commit [`fac5fb2`](https://github.com/apache/spark/commit/fac5fb2325a6225820b0f04df85ee2ffd9266b44).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suite

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/19982
  
    LGTM with one minor comment.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suite

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19982
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suite

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19982
  
    **[Test build #84927 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84927/testReport)** for PR 19982 at commit [`9d52ff0`](https://github.com/apache/spark/commit/9d52ff0037f0c97390b6bd5e917ab8caa20799b3).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suite

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19982
  
    **[Test build #84940 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84940/testReport)** for PR 19982 at commit [`2671416`](https://github.com/apache/spark/commit/2671416688ca6275556602b2f1990cd4361b95e6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suite

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19982
  
    LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suit...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19982#discussion_r157113164
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/TPCHQuerySuite.scala ---
    @@ -0,0 +1,122 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql
    +
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.catalyst.util.resourceToString
    +import org.apache.spark.util.Utils
    +
    +/**
    + * This test suite ensures all the TPC-H queries can be successfully analyzed, optimized
    + * and compiled without hitting the max iteration threshold.
    + */
    +class TPCHQuerySuite extends BenchmarkQueryTest {
    +
    +  /**
    +   * Drop all the tables
    +   */
    +  protected override def afterAll(): Unit = {
    --- End diff --
    
    yea we don't need to overwrite it here.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suit...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19982#discussion_r157105834
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/TPCHQuerySuite.scala ---
    @@ -0,0 +1,122 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql
    +
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.catalyst.util.resourceToString
    +import org.apache.spark.util.Utils
    +
    +/**
    + * This test suite ensures all the TPC-H queries can be successfully analyzed, optimized
    + * and compiled without hitting the max iteration threshold.
    + */
    +class TPCHQuerySuite extends BenchmarkQueryTest {
    +
    +  /**
    +   * Drop all the tables
    +   */
    +  protected override def afterAll(): Unit = {
    --- End diff --
    
    As same as `TPCDSQuerySuite`, we can use `BenchmarkQueryTest.afterAll`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suit...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19982#discussion_r157108825
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/BenchmarkQueryTest.scala ---
    @@ -0,0 +1,78 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql
    +
    +import org.scalatest.BeforeAndAfterAll
    +
    +import org.apache.spark.sql.catalyst.expressions.codegen.{CodeFormatter, CodeGenerator}
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.execution.{SparkPlan, WholeStageCodegenExec}
    +import org.apache.spark.sql.test.SharedSQLContext
    +import org.apache.spark.util.Utils
    +
    +abstract class BenchmarkQueryTest extends QueryTest with SharedSQLContext with BeforeAndAfterAll {
    +
    +  // When Utils.isTesting is true, the RuleExecutor will issue an exception when hitting
    +  // the max iteration of analyzer/optimizer batches.
    +  assert(Utils.isTesting, "spark.testing is not set to true")
    +
    +  /**
    +   * Drop all the tables
    +   */
    +  protected override def afterAll(): Unit = {
    +    try {
    +      // For debugging dump some statistics about how much time was spent in various optimizer rules
    +      logWarning(RuleExecutor.dumpTimeSpent())
    --- End diff --
    
    Also, is that a bad idea to dump the time for each query?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suite

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19982
  
    **[Test build #84940 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84940/testReport)** for PR 19982 at commit [`2671416`](https://github.com/apache/spark/commit/2671416688ca6275556602b2f1990cd4361b95e6).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suit...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19982#discussion_r157119974
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/BenchmarkQueryTest.scala ---
    @@ -0,0 +1,78 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql
    +
    +import org.scalatest.BeforeAndAfterAll
    +
    +import org.apache.spark.sql.catalyst.expressions.codegen.{CodeFormatter, CodeGenerator}
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.execution.{SparkPlan, WholeStageCodegenExec}
    +import org.apache.spark.sql.test.SharedSQLContext
    +import org.apache.spark.util.Utils
    +
    +abstract class BenchmarkQueryTest extends QueryTest with SharedSQLContext with BeforeAndAfterAll {
    +
    +  // When Utils.isTesting is true, the RuleExecutor will issue an exception when hitting
    +  // the max iteration of analyzer/optimizer batches.
    +  assert(Utils.isTesting, "spark.testing is not set to true")
    +
    +  /**
    +   * Drop all the tables
    +   */
    +  protected override def afterAll(): Unit = {
    +    try {
    +      // For debugging dump some statistics about how much time was spent in various optimizer rules
    +      logWarning(RuleExecutor.dumpTimeSpent())
    --- End diff --
    
    oh, ok


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suite

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/19982
  
    yea.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suit...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19982#discussion_r157108036
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/BenchmarkQueryTest.scala ---
    @@ -0,0 +1,78 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql
    +
    +import org.scalatest.BeforeAndAfterAll
    +
    +import org.apache.spark.sql.catalyst.expressions.codegen.{CodeFormatter, CodeGenerator}
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.execution.{SparkPlan, WholeStageCodegenExec}
    +import org.apache.spark.sql.test.SharedSQLContext
    +import org.apache.spark.util.Utils
    +
    +abstract class BenchmarkQueryTest extends QueryTest with SharedSQLContext with BeforeAndAfterAll {
    +
    +  // When Utils.isTesting is true, the RuleExecutor will issue an exception when hitting
    +  // the max iteration of analyzer/optimizer batches.
    +  assert(Utils.isTesting, "spark.testing is not set to true")
    +
    +  /**
    +   * Drop all the tables
    +   */
    +  protected override def afterAll(): Unit = {
    +    try {
    +      // For debugging dump some statistics about how much time was spent in various optimizer rules
    +      logWarning(RuleExecutor.dumpTimeSpent())
    --- End diff --
    
    Why warning for debug uses?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suite

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/19982
  
    https://github.com/apache/spark/pull/19990


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suit...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19982#discussion_r157119768
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/BenchmarkQueryTest.scala ---
    @@ -0,0 +1,78 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql
    +
    +import org.scalatest.BeforeAndAfterAll
    +
    +import org.apache.spark.sql.catalyst.expressions.codegen.{CodeFormatter, CodeGenerator}
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.execution.{SparkPlan, WholeStageCodegenExec}
    +import org.apache.spark.sql.test.SharedSQLContext
    +import org.apache.spark.util.Utils
    +
    +abstract class BenchmarkQueryTest extends QueryTest with SharedSQLContext with BeforeAndAfterAll {
    +
    +  // When Utils.isTesting is true, the RuleExecutor will issue an exception when hitting
    +  // the max iteration of analyzer/optimizer batches.
    +  assert(Utils.isTesting, "spark.testing is not set to true")
    +
    +  /**
    +   * Drop all the tables
    +   */
    +  protected override def afterAll(): Unit = {
    +    try {
    +      // For debugging dump some statistics about how much time was spent in various optimizer rules
    +      logWarning(RuleExecutor.dumpTimeSpent())
    --- End diff --
    
    If we do use logWarning, the messages will not be shown in the test log.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suite

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19982
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suite

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/19982
  
    ok, I just look forward to the proposal.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suite

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/19982
  
    yea, sure. I have much bandwidth now :)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suit...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19982#discussion_r157091737
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/TPCHQuerySuite.scala ---
    @@ -0,0 +1,160 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql
    +
    +import org.scalatest.BeforeAndAfterAll
    +
    +import org.apache.spark.sql.catalyst.expressions.codegen.{CodeFormatter, CodeGenerator}
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.catalyst.util.resourceToString
    +import org.apache.spark.sql.execution.{SparkPlan, WholeStageCodegenExec}
    +import org.apache.spark.sql.test.SharedSQLContext
    +import org.apache.spark.util.Utils
    +
    +/**
    + * This test suite ensures all the TPC-H queries can be successfully analyzed, optimized
    + * and compiled without hitting the max iteration threshold.
    + */
    +class TPCHQuerySuite extends QueryTest with SharedSQLContext with BeforeAndAfterAll {
    +
    +  // When Utils.isTesting is true, the RuleExecutor will issue an exception when hitting
    +  // the max iteration of analyzer/optimizer batches.
    +  assert(Utils.isTesting, "spark.testing is not set to true")
    +
    +  /**
    +   * Drop all the tables
    +   */
    +  protected override def afterAll(): Unit = {
    +    try {
    +      // For debugging dump some statistics about how much time was spent in various optimizer rules
    +      logWarning(RuleExecutor.dumpTimeSpent())
    +      spark.sessionState.catalog.reset()
    +    } finally {
    +      super.afterAll()
    +    }
    +  }
    +
    +  override def beforeAll() {
    +    super.beforeAll()
    +    RuleExecutor.resetTime()
    +
    +    sql(
    +      """
    +        |CREATE TABLE `orders` (
    +        |`o_orderkey` BIGINT, `o_custkey` BIGINT, `o_orderstatus` STRING,
    +        |`o_totalprice` DECIMAL(10,0), `o_orderdate` DATE, `o_orderpriority` STRING,
    +        |`o_clerk` STRING, `o_shippriority` INT, `o_comment` STRING)
    +        |USING parquet
    +      """.stripMargin)
    +
    +    sql(
    +      """
    +        |CREATE TABLE `nation` (
    +        |`n_nationkey` BIGINT, `n_name` STRING, `n_regionkey` BIGINT, `n_comment` STRING)
    +        |USING parquet
    +      """.stripMargin)
    +
    +    sql(
    +      """
    +        |CREATE TABLE `region` (
    +        |`r_regionkey` BIGINT, `r_name` STRING, `r_comment` STRING)
    +        |USING parquet
    +      """.stripMargin)
    +
    +    sql(
    +      """
    +        |CREATE TABLE `part` (`p_partkey` BIGINT, `p_name` STRING, `p_mfgr` STRING,
    +        |`p_brand` STRING, `p_type` STRING, `p_size` INT, `p_container` STRING,
    +        |`p_retailprice` DECIMAL(10,0), `p_comment` STRING)
    +        |USING parquet
    +      """.stripMargin)
    +
    +    sql(
    +      """
    +        |CREATE TABLE `partsupp` (`ps_partkey` BIGINT, `ps_suppkey` BIGINT,
    +        |`ps_availqty` INT, `ps_supplycost` DECIMAL(10,0), `ps_comment` STRING)
    +        |USING parquet
    +      """.stripMargin)
    +
    +    sql(
    +      """
    +        |CREATE TABLE `customer` (`c_custkey` BIGINT, `c_name` STRING, `c_address` STRING,
    +        |`c_nationkey` STRING, `c_phone` STRING, `c_acctbal` DECIMAL(10,0),
    +        |`c_mktsegment` STRING, `c_comment` STRING)
    +        |USING parquet
    +      """.stripMargin)
    +
    +    sql(
    +      """
    +        |CREATE TABLE `supplier` (`s_suppkey` BIGINT, `s_name` STRING, `s_address` STRING,
    +        |`s_nationkey` BIGINT, `s_phone` STRING, `s_acctbal` DECIMAL(10,0), `s_comment` STRING)
    +        |USING parquet
    +      """.stripMargin)
    +
    +    sql(
    +      """
    +        |CREATE TABLE `lineitem` (`l_orderkey` BIGINT, `l_partkey` BIGINT, `l_suppkey` BIGINT,
    +        |`l_linenumber` INT, `l_quantity` DECIMAL(10,0), `l_extendedprice` DECIMAL(10,0),
    +        |`l_discount` DECIMAL(10,0), `l_tax` DECIMAL(10,0), `l_returnflag` STRING,
    +        |`l_linestatus` STRING, `l_shipdate` DATE, `l_commitdate` DATE, `l_receiptdate` DATE,
    +        |`l_shipinstruct` STRING, `l_shipmode` STRING, `l_comment` STRING)
    +        |USING parquet
    +      """.stripMargin)
    +  }
    +
    +  val tpchQueries = Seq(
    +    "q1", "q2", "q3", "q4", "q5", "q6", "q7", "q8", "q9", "q10", "q11",
    +    "q12", "q13", "q14", "q15", "q16", "q17", "q18", "q19", "q20", "q21", "q22")
    +
    +  private def checkGeneratedCode(plan: SparkPlan): Unit = {
    --- End diff --
    
    How about making a base trait to remove duplicate code between this and `TPCDSQuerySuite`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suite

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/19982
  
    @gatorsmile Any progress on this? https://github.com/apache/spark/pull/19982#discussion_r157119941
    After I thought your comment, I came up with collecting metrics for each rule like;
    https://github.com/apache/spark/compare/master...maropu:MetricSpike
    This conflicts with your activity, or this is not acceptable? welcome any comment.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suite

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19982
  
    @maropu Thanks for your contribution. It looks over engineering. We do not need such complicated solutions for this simple use case. We just need to record them in the log. We are also proposing new APIs for our logs. @jiangxb1987 is working on the design. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suite

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19982
  
    cc @maropu Feel free to submit a PR for adding SSB


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suit...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19982#discussion_r157045722
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextFileFormat.scala ---
    @@ -116,9 +116,10 @@ class TextFileFormat extends TextBasedFileFormat with DataSourceRegister {
         readToUnsafeMem(broadcastedHadoopConf, requiredSchema, textOptions.wholeText)
       }
     
    -  private def readToUnsafeMem(conf: Broadcast[SerializableConfiguration],
    -      requiredSchema: StructType, wholeTextMode: Boolean):
    -  (PartitionedFile) => Iterator[UnsafeRow] = {
    +  private def readToUnsafeMem(
    +      conf: Broadcast[SerializableConfiguration],
    +      requiredSchema: StructType,
    +      wholeTextMode: Boolean): (PartitionedFile) => Iterator[UnsafeRow] = {
    --- End diff --
    
    This is just a style fix of a PR I merged today.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suit...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19982


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suite

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/19982
  
    Probably, we could get sql queries from here https://github.com/electrum/ssb-dbgen (I don't look into yet though...)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suite

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19982
  
    https://www.cs.umb.edu/~poneil/StarSchemaB.PDF This one?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suite

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19982
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suite

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19982
  
    **[Test build #84936 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84936/testReport)** for PR 19982 at commit [`9dc4457`](https://github.com/apache/spark/commit/9dc445739016ce4168523840b65c439ddb54a99b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `abstract class BenchmarkQueryTest extends QueryTest with SharedSQLContext with BeforeAndAfterAll `
      * `class TPCDSQuerySuite extends BenchmarkQueryTest `
      * `class TPCHQuerySuite extends BenchmarkQueryTest `


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19982: [SPARK-22787] [TEST] [SQL] Add a TPC-H query suite

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19982
  
    **[Test build #84927 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84927/testReport)** for PR 19982 at commit [`9d52ff0`](https://github.com/apache/spark/commit/9d52ff0037f0c97390b6bd5e917ab8caa20799b3).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org