You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by salilsurendran <gi...@git.apache.org> on 2017/01/20 20:34:55 UTC

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

GitHub user salilsurendran opened a pull request:

    https://github.com/apache/spark/pull/16664

    [SPARK-18120 ][SQL] Call QueryExecutionListener callback methods for \u2026

    \u2026DataFrameWriter methods
    
    ## What changes were proposed in this pull request?
    QueryExecutionListener has two methods onSuccess() and onFailure() that takes a QueryExecution object as a parameter that gets called when a query is executed. It gets called for several of the DataSet methods like take, head, first, collect etc. but doesn't get called for any of the DataFrameWriter methods like saveAsTable, save etc. This commit fixes this issue and makes calls to these two methods from DataFrameWriter output methods.
    Also, added a new property "spark.sql.queryExecutionListeners" that can be used to specify instances of QueryExecutionListeners that should be attached to the SparkSession when the spark application starts up.
    
    ## How was this patch tested?
     Testing was done using unit tests contained in two suites. The unit tests can be executed by :
    test-only *SparkSQLQueryExecutionListenerSuite
    test-only *DataFrameCallbackSuite
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/salilsurendran/spark SPARK-18120

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16664.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16664
    
----
commit 751ded07c7f4e8ab888b3174e3310f603b79ae68
Author: Salil Surendran <sa...@cloudera.com>
Date:   2017-01-20T20:08:12Z

    [SPARK-18120 ][SQL] Call QueryExecutionListener callback methods for DataFrameWriter methods
    
    QueryExecutionListener has two methods onSuccess() and onFailure() that takes a QueryExecution object as a parameter that gets called when a query is executed. It gets called for several of the DataSet methods like take, head, first, collect etc. but doesn't get called for any of the DataFrameWriter methods like saveAsTable, save etc. This commit fixes this issue and makes calls to these two methods from DataFrameWriter output methods.
    Also, added a new property "spark.sql.queryExecutionListeners" that can be used to specify instances of QueryExecutionListeners that should be attached to the SparkSession when the spark application starts up. Testing was done using unit tests.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99060523
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -660,12 +660,21 @@ object SQLConf {
           .booleanConf
           .createWithDefault(false)
     
    +
    +  val QUERY_EXECUTION_LISTENERS =
    +    ConfigBuilder("spark.sql.queryExecutionListeners")
    +      .doc("QueryExecutionListeners to be attached to the SparkSession")
    +      .stringConf
    +      .toSequence
    +      .createWithDefault(Nil)
    +
       val SESSION_LOCAL_TIMEZONE =
         SQLConfigBuilder("spark.sql.session.timeZone")
           .doc("""The ID of session local timezone, e.g. "GMT", "America/Los_Angeles", etc.""")
           .stringConf
           .createWithDefault(TimeZone.getDefault().getID())
     
    +
    --- End diff --
    
    Nit: Please remove this empty line


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99521620
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SparkSQLQueryExecutionListenerSuite.scala ---
    @@ -0,0 +1,149 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.sql
    +
    +import org.scalatest.BeforeAndAfterEach
    +import org.scalatest.mock.MockitoSugar
    +
    +import org.apache.spark.{SparkContext, SparkFunSuite}
    +import org.apache.spark.sql.execution.QueryExecution
    +import org.apache.spark.sql.util.{OutputParams, QueryExecutionListener}
    +
    +/**
    + * Test cases for the property 'spark.sql.queryExecutionListeners' that adds the
    + * @see `QueryExecutionListener` to a @see `SparkSession`
    + */
    +class SparkSQLQueryExecutionListenerSuite
    +    extends SparkFunSuite
    +    with MockitoSugar
    +    with BeforeAndAfterEach {
    +
    +  override def afterEach(): Unit = {
    +    SparkSession.clearActiveSession()
    +    SparkSession.clearDefaultSession()
    +    SparkContext.clearActiveContext()
    +  }
    +
    +  test("Creation of SparkContext with non-existent QueryExecutionListener class fails fast") {
    +    intercept[ClassNotFoundException] {
    +      SparkSession
    +        .builder()
    +        .master("local")
    +        .config("spark.sql.queryExecutionListeners", "non.existent.QueryExecutionListener")
    +        .getOrCreate()
    +    }
    +    assert(!SparkSession.getDefaultSession.isDefined)
    +  }
    +
    +  test("QueryExecutionListener that doesn't have a default constructor fails fast") {
    +    intercept[InstantiationException] {
    +      SparkSession
    +        .builder()
    +        .master("local")
    +        .config("spark.sql.queryExecutionListeners", classOf[NoZeroArgConstructorListener].getName)
    +        .getOrCreate()
    +    }
    +    assert(!SparkSession.getDefaultSession.isDefined)
    +  }
    +
    +  test("Normal QueryExecutionListeners gets added as listeners") {
    +    val sparkSession = SparkSession
    +      .builder()
    +      .master("local")
    +      .config("mykey", "myvalue")
    +      .config("spark.sql.queryExecutionListeners",
    +        classOf[NormalQueryExecutionListener].getName + " ,"
    +          + classOf[AnotherQueryExecutionListener].getName)
    +      .getOrCreate()
    +    assert(SparkSession.getDefaultSession.isDefined)
    +    assert(NormalQueryExecutionListener.successCount === 0)
    +    assert(NormalQueryExecutionListener.failureCount === 0)
    +    assert(AnotherQueryExecutionListener.successCount === 0)
    +    assert(AnotherQueryExecutionListener.failureCount === 0)
    +    sparkSession.listenerManager.onSuccess("test1", mock[QueryExecution], 0)
    +    assert(NormalQueryExecutionListener.successCount === 1)
    +    assert(NormalQueryExecutionListener.failureCount === 0)
    +    assert(AnotherQueryExecutionListener.successCount === 1)
    +    assert(AnotherQueryExecutionListener.failureCount === 0)
    +    sparkSession.listenerManager.onFailure("test2", mock[QueryExecution], new Exception)
    +    assert(NormalQueryExecutionListener.successCount === 1)
    +    assert(NormalQueryExecutionListener.failureCount === 1)
    +    assert(AnotherQueryExecutionListener.successCount === 1)
    +    assert(AnotherQueryExecutionListener.failureCount === 1)
    +  }
    +}
    +
    +class NoZeroArgConstructorListener(myString: String) extends QueryExecutionListener {
    +
    +  override def onSuccess(
    +      funcName: String,
    +      qe: QueryExecution,
    +      durationNs: Long,
    +      options: Option[OutputParams]
    +  ): Unit = {}
    +
    +  override def onFailure(
    +      funcName: String,
    +      qe: QueryExecution,
    +      exception: Exception,
    +      options: Option[OutputParams]
    +  ): Unit = {}
    +}
    +
    +class NormalQueryExecutionListener extends QueryExecutionListener {
    +
    +  override def onSuccess(
    +      funcName: String,
    +      qe: QueryExecution,
    +      durationNs: Long,
    +      options: Option[OutputParams]
    +  ): Unit = { NormalQueryExecutionListener.successCount += 1 }
    +
    +  override def onFailure(
    +      funcName: String,
    +      qe: QueryExecution,
    +      exception: Exception,
    +      options: Option[OutputParams]
    +  ): Unit = { NormalQueryExecutionListener.failureCount += 1 }
    +}
    +
    +object NormalQueryExecutionListener {
    +  var successCount = 0;
    +  var failureCount = 0;
    +}
    +
    +class AnotherQueryExecutionListener extends QueryExecutionListener {
    +
    +  override def onSuccess(
    +      funcName: String,
    +      qe: QueryExecution,
    +      durationNs: Long,
    +      options: Option[OutputParams]
    +  ): Unit = { AnotherQueryExecutionListener.successCount += 1 }
    +
    +  override def onFailure(
    +      funcName: String,
    +      qe: QueryExecution,
    +      exception: Exception,
    +      options: Option[OutputParams]
    +  ): Unit = { AnotherQueryExecutionListener.failureCount += 1 }
    +}
    +
    +object AnotherQueryExecutionListener {
    +  var successCount = 0;
    +  var failureCount = 0;
    --- End diff --
    
    Nit: no need to use `;` here


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    Could you update the PR title to `[SPARK-18120][SQL] Call QueryExecutionListener callback methods for DataFrameWriter methods`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    /cc @liancheng 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by salilsurendran <gi...@git.apache.org>.

Github user salilsurendran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99386276
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -190,6 +192,32 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
       }
     
       /**
    +   * Executes the query and calls the {@link org.apache.spark.sql.util.QueryExecutionListener}
    +   * methods.
    +   *
    +   * @param funcName A identifier for the method executing the query
    +   * @param qe the @see [[QueryExecution]] object associated with the query
    +   * @param outputParams The output parameters useful for query analysis
    +   * @param action the function that executes the query after which the listener methods gets
    +   *               called.
    +   */
    +  private def executeAndCallQEListener(
    --- End diff --
    
    Are you saying renamed the parameter action to withAction?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r100565585
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1300,10 +1300,28 @@ Configuration of in-memory caching can be done using the `setConf` method on `Sp
     
     </table>
     
    +## QueryExecutionListener Options
    --- End diff --
    
    this seems like a completely unrelated change to the bug fix.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    **[Test build #71741 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71741/testReport)** for PR 16664 at commit [`751ded0`](https://github.com/apache/spark/commit/751ded07c7f4e8ab888b3174e3310f603b79ae68).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by salilsurendran <gi...@git.apache.org>.

Github user salilsurendran commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    @yhuai @marmbrus  @liancheng Can someone review my PR please. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    Yea we should fix that.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99061828
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -190,6 +192,32 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
       }
     
       /**
    +   * Executes the query and calls the {@link org.apache.spark.sql.util.QueryExecutionListener}
    +   * methods.
    +   *
    +   * @param funcName A identifier for the method executing the query
    +   * @param qe the @see [[QueryExecution]] object associated with the query
    +   * @param outputParams The output parameters useful for query analysis
    +   * @param action the function that executes the query after which the listener methods gets
    +   *               called.
    +   */
    +  private def executeAndCallQEListener(
    --- End diff --
    
    How about renaming it `withAction`? It is more consistent.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99521699
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SparkSQLQueryExecutionListenerSuite.scala ---
    @@ -0,0 +1,149 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.sql
    +
    +import org.scalatest.BeforeAndAfterEach
    +import org.scalatest.mock.MockitoSugar
    +
    +import org.apache.spark.{SparkContext, SparkFunSuite}
    +import org.apache.spark.sql.execution.QueryExecution
    +import org.apache.spark.sql.util.{OutputParams, QueryExecutionListener}
    +
    +/**
    + * Test cases for the property 'spark.sql.queryExecutionListeners' that adds the
    + * @see `QueryExecutionListener` to a @see `SparkSession`
    + */
    +class SparkSQLQueryExecutionListenerSuite
    +    extends SparkFunSuite
    +    with MockitoSugar
    +    with BeforeAndAfterEach {
    +
    +  override def afterEach(): Unit = {
    +    SparkSession.clearActiveSession()
    +    SparkSession.clearDefaultSession()
    +    SparkContext.clearActiveContext()
    +  }
    +
    +  test("Creation of SparkContext with non-existent QueryExecutionListener class fails fast") {
    +    intercept[ClassNotFoundException] {
    +      SparkSession
    +        .builder()
    +        .master("local")
    +        .config("spark.sql.queryExecutionListeners", "non.existent.QueryExecutionListener")
    +        .getOrCreate()
    +    }
    +    assert(!SparkSession.getDefaultSession.isDefined)
    --- End diff --
    
    The same here. `isEmpty`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by salilsurendran <gi...@git.apache.org>.

Github user salilsurendran commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    @cloud-fan From what I understand we need to modify InsertXXX command to carry all the write options instead of the change suggested in this PR. Right now the QueryExecution object doesn't carry any of the output options. Am I correct? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    @marmbrus `DataStreamWriter` has similar issues, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99521628
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SparkSQLQueryExecutionListenerSuite.scala ---
    @@ -0,0 +1,149 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.sql
    +
    +import org.scalatest.BeforeAndAfterEach
    +import org.scalatest.mock.MockitoSugar
    +
    +import org.apache.spark.{SparkContext, SparkFunSuite}
    +import org.apache.spark.sql.execution.QueryExecution
    +import org.apache.spark.sql.util.{OutputParams, QueryExecutionListener}
    +
    +/**
    + * Test cases for the property 'spark.sql.queryExecutionListeners' that adds the
    + * @see `QueryExecutionListener` to a @see `SparkSession`
    + */
    +class SparkSQLQueryExecutionListenerSuite
    +    extends SparkFunSuite
    +    with MockitoSugar
    +    with BeforeAndAfterEach {
    +
    +  override def afterEach(): Unit = {
    +    SparkSession.clearActiveSession()
    +    SparkSession.clearDefaultSession()
    +    SparkContext.clearActiveContext()
    +  }
    +
    +  test("Creation of SparkContext with non-existent QueryExecutionListener class fails fast") {
    +    intercept[ClassNotFoundException] {
    +      SparkSession
    +        .builder()
    +        .master("local")
    +        .config("spark.sql.queryExecutionListeners", "non.existent.QueryExecutionListener")
    +        .getOrCreate()
    +    }
    +    assert(!SparkSession.getDefaultSession.isDefined)
    +  }
    +
    +  test("QueryExecutionListener that doesn't have a default constructor fails fast") {
    +    intercept[InstantiationException] {
    +      SparkSession
    +        .builder()
    +        .master("local")
    +        .config("spark.sql.queryExecutionListeners", classOf[NoZeroArgConstructorListener].getName)
    +        .getOrCreate()
    +    }
    +    assert(!SparkSession.getDefaultSession.isDefined)
    +  }
    +
    +  test("Normal QueryExecutionListeners gets added as listeners") {
    +    val sparkSession = SparkSession
    +      .builder()
    +      .master("local")
    +      .config("mykey", "myvalue")
    +      .config("spark.sql.queryExecutionListeners",
    +        classOf[NormalQueryExecutionListener].getName + " ,"
    +          + classOf[AnotherQueryExecutionListener].getName)
    +      .getOrCreate()
    +    assert(SparkSession.getDefaultSession.isDefined)
    +    assert(NormalQueryExecutionListener.successCount === 0)
    +    assert(NormalQueryExecutionListener.failureCount === 0)
    +    assert(AnotherQueryExecutionListener.successCount === 0)
    +    assert(AnotherQueryExecutionListener.failureCount === 0)
    +    sparkSession.listenerManager.onSuccess("test1", mock[QueryExecution], 0)
    +    assert(NormalQueryExecutionListener.successCount === 1)
    +    assert(NormalQueryExecutionListener.failureCount === 0)
    +    assert(AnotherQueryExecutionListener.successCount === 1)
    +    assert(AnotherQueryExecutionListener.failureCount === 0)
    +    sparkSession.listenerManager.onFailure("test2", mock[QueryExecution], new Exception)
    +    assert(NormalQueryExecutionListener.successCount === 1)
    +    assert(NormalQueryExecutionListener.failureCount === 1)
    +    assert(AnotherQueryExecutionListener.successCount === 1)
    +    assert(AnotherQueryExecutionListener.failureCount === 1)
    +  }
    +}
    +
    +class NoZeroArgConstructorListener(myString: String) extends QueryExecutionListener {
    +
    +  override def onSuccess(
    +      funcName: String,
    +      qe: QueryExecution,
    +      durationNs: Long,
    +      options: Option[OutputParams]
    +  ): Unit = {}
    +
    +  override def onFailure(
    +      funcName: String,
    +      qe: QueryExecution,
    +      exception: Exception,
    +      options: Option[OutputParams]
    +  ): Unit = {}
    +}
    +
    +class NormalQueryExecutionListener extends QueryExecutionListener {
    +
    +  override def onSuccess(
    +      funcName: String,
    +      qe: QueryExecution,
    +      durationNs: Long,
    +      options: Option[OutputParams]
    +  ): Unit = { NormalQueryExecutionListener.successCount += 1 }
    +
    +  override def onFailure(
    +      funcName: String,
    +      qe: QueryExecution,
    +      exception: Exception,
    +      options: Option[OutputParams]
    +  ): Unit = { NormalQueryExecutionListener.failureCount += 1 }
    +}
    +
    +object NormalQueryExecutionListener {
    +  var successCount = 0;
    +  var failureCount = 0;
    --- End diff --
    
    Nit: no need to use `;` here


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    @yhuai @marmbrus @liancheng if none of you are going to take look, I'll give the code another pass and not wait for your feedback before pushing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r100236345
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
           bucketSpec = getBucketSpec,
           options = extraOptions.toMap)
     
    -    dataSource.write(mode, df)
    +    val destination = source match {
    +      case "jdbc" => extraOptions.get(JDBCOptions.JDBC_TABLE_NAME)
    +      case _ => extraOptions.get("path")
    --- End diff --
    
    > e.g. calling the save method adds a "path" key to the option map, but is that key name a public API?
    
    yes, it is. e.g. `df.write.format("parquet").option("path", some_path).save()`, the `path` is a "magic key" and we've exposed it to users, so `path` is a public API and if we change it, we will break existing applications.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    That's probably because you are not familiar with the SQL component. The existing API already has references to the QueryExecution object, which actually includes all of the information your compatibility-breaking API is currently exposing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99518378
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -1047,4 +1047,14 @@ object StaticSQLConf {
           "SQL configuration and the current database.")
         .booleanConf
         .createWithDefault(false)
    +
    +  val QUERY_EXECUTION_LISTENERS = buildConf("spark.sql.queryExecutionListeners")
    +    .doc("A comma-separated list of classes that implement QueryExecutionListener. When creating " +
    +      "a SparkSession, instances of these listeners will be added to it. These classes " +
    +      "needs to have a zero-argument constructor. If the specified class can't be found or" +
    +      " the class specified doesn't have a valid constructor the SparkSession creation " +
    --- End diff --
    
    Nit: please move the starting space to the end of the last line.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    @vanzin yes, InsertXXX command will carry all the write options.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r100564925
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
           bucketSpec = getBucketSpec,
           options = extraOptions.toMap)
     
    -    dataSource.write(mode, df)
    +    val destination = source match {
    +      case "jdbc" => extraOptions.get(JDBCOptions.JDBC_TABLE_NAME)
    +      case _ => extraOptions.get("path")
    --- End diff --
    
    Yes those are public APIs.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    **[Test build #72177 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72177/testReport)** for PR 16664 at commit [`b0392ed`](https://github.com/apache/spark/commit/b0392ed5a8ebda9ace514b10bc383b9ffa47ac4c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99520106
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -1047,4 +1047,14 @@ object StaticSQLConf {
           "SQL configuration and the current database.")
         .booleanConf
         .createWithDefault(false)
    +
    +  val QUERY_EXECUTION_LISTENERS = buildConf("spark.sql.queryExecutionListeners")
    +    .doc("A comma-separated list of classes that implement QueryExecutionListener. When creating " +
    +      "a SparkSession, instances of these listeners will be added to it. These classes " +
    +      "needs to have a zero-argument constructor. If the specified class can't be found or" +
    +      " the class specified doesn't have a valid constructor the SparkSession creation " +
    +      "will fail with an exception.")
    +    .stringConf
    +    .toSequence
    +    .createWithDefault(Nil)
    --- End diff --
    
    Not sure whether we should make it internal or external. Let the others decide it. Either is fine to me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99062185
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -514,6 +576,9 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
        * shorten names(none, `snappy`, `gzip`, and `lzo`). This will override
        * `spark.sql.parquet.compression.codec`.</li>
        * </ul>
    +   * Calls the callback methods in @see[[QueryExecutionListener]] methods after query execution with
    +   * @see[[OutputParams]] having datasourceType set as string constant "parquet" and
    +   * destination set as the path to which the data is written
    --- End diff --
    
    I think we do not need to add these comments to all the functions.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99060951
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -190,6 +192,32 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
       }
     
       /**
    +   * Executes the query and calls the {@link org.apache.spark.sql.util.QueryExecutionListener}
    +   * methods.
    +   *
    +   * @param funcName A identifier for the method executing the query
    +   * @param qe the @see [[QueryExecution]] object associated with the query
    --- End diff --
    
    Could you please fix the doc by following what https://github.com/apache/spark/pull/16013 did? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    **[Test build #72178 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72178/testReport)** for PR 16664 at commit [`752125a`](https://github.com/apache/spark/commit/752125a10253ca15e260f317868ef7aacd3c510e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72329/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99062729
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -428,8 +481,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
           partitionColumnNames = partitioningColumns.getOrElse(Nil),
           bucketSpec = getBucketSpec
         )
    -    df.sparkSession.sessionState.executePlan(
    -      CreateTable(tableDesc, mode, Some(df.logicalPlan))).toRdd
    +    val qe = df.sparkSession.sessionState.executePlan(
    +      CreateTable(tableDesc, mode, Some(df.logicalPlan)))
    +    executeAndCallQEListener(
    +      "saveAsTable",
    +      qe,
    +      new OutputParams(source, Some(tableIdent.unquotedString), extraOptions.toMap)) {
    +      qe.toRdd
    --- End diff --
    
    No need to call `new` here. Please follow the above example. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99521880
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SparkSQLQueryExecutionListenerSuite.scala ---
    @@ -0,0 +1,149 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.sql
    +
    +import org.scalatest.BeforeAndAfterEach
    +import org.scalatest.mock.MockitoSugar
    +
    +import org.apache.spark.{SparkContext, SparkFunSuite}
    +import org.apache.spark.sql.execution.QueryExecution
    +import org.apache.spark.sql.util.{OutputParams, QueryExecutionListener}
    +
    +/**
    + * Test cases for the property 'spark.sql.queryExecutionListeners' that adds the
    + * @see `QueryExecutionListener` to a @see `SparkSession`
    + */
    +class SparkSQLQueryExecutionListenerSuite
    +    extends SparkFunSuite
    +    with MockitoSugar
    +    with BeforeAndAfterEach {
    +
    +  override def afterEach(): Unit = {
    +    SparkSession.clearActiveSession()
    +    SparkSession.clearDefaultSession()
    +    SparkContext.clearActiveContext()
    +  }
    +
    +  test("Creation of SparkContext with non-existent QueryExecutionListener class fails fast") {
    +    intercept[ClassNotFoundException] {
    +      SparkSession
    +        .builder()
    +        .master("local")
    +        .config("spark.sql.queryExecutionListeners", "non.existent.QueryExecutionListener")
    +        .getOrCreate()
    +    }
    +    assert(!SparkSession.getDefaultSession.isDefined)
    +  }
    +
    +  test("QueryExecutionListener that doesn't have a default constructor fails fast") {
    +    intercept[InstantiationException] {
    +      SparkSession
    +        .builder()
    +        .master("local")
    +        .config("spark.sql.queryExecutionListeners", classOf[NoZeroArgConstructorListener].getName)
    +        .getOrCreate()
    +    }
    +    assert(!SparkSession.getDefaultSession.isDefined)
    +  }
    +
    +  test("Normal QueryExecutionListeners gets added as listeners") {
    +    val sparkSession = SparkSession
    +      .builder()
    +      .master("local")
    +      .config("mykey", "myvalue")
    +      .config("spark.sql.queryExecutionListeners",
    +        classOf[NormalQueryExecutionListener].getName + " ,"
    +          + classOf[AnotherQueryExecutionListener].getName)
    +      .getOrCreate()
    +    assert(SparkSession.getDefaultSession.isDefined)
    +    assert(NormalQueryExecutionListener.successCount === 0)
    +    assert(NormalQueryExecutionListener.failureCount === 0)
    +    assert(AnotherQueryExecutionListener.successCount === 0)
    +    assert(AnotherQueryExecutionListener.failureCount === 0)
    +    sparkSession.listenerManager.onSuccess("test1", mock[QueryExecution], 0)
    +    assert(NormalQueryExecutionListener.successCount === 1)
    +    assert(NormalQueryExecutionListener.failureCount === 0)
    +    assert(AnotherQueryExecutionListener.successCount === 1)
    +    assert(AnotherQueryExecutionListener.failureCount === 0)
    +    sparkSession.listenerManager.onFailure("test2", mock[QueryExecution], new Exception)
    +    assert(NormalQueryExecutionListener.successCount === 1)
    +    assert(NormalQueryExecutionListener.failureCount === 1)
    +    assert(AnotherQueryExecutionListener.successCount === 1)
    +    assert(AnotherQueryExecutionListener.failureCount === 1)
    +  }
    +}
    +
    +class NoZeroArgConstructorListener(myString: String) extends QueryExecutionListener {
    +
    +  override def onSuccess(
    +      funcName: String,
    +      qe: QueryExecution,
    +      durationNs: Long,
    +      options: Option[OutputParams]
    +  ): Unit = {}
    --- End diff --
    
    Nit: -> `options: Option[OutputParams]): Unit = {}`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    Sorry I'm really confused, probably because I haven't kept track with this pr. But the diff doesn't match the pr description. Are we fixing a bug here or introducing a bunch of new APIs?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r100182956
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
           bucketSpec = getBucketSpec,
           options = extraOptions.toMap)
     
    -    dataSource.write(mode, df)
    +    val destination = source match {
    +      case "jdbc" => extraOptions.get(JDBCOptions.JDBC_TABLE_NAME)
    +      case _ => extraOptions.get("path")
    --- End diff --
    
    I think we need to make it more general instead of introducing a class for the write path only. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r100565522
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala ---
    @@ -44,27 +44,50 @@ trait QueryExecutionListener {
        * @param qe the QueryExecution object that carries detail information like logical plan,
        *           physical plan, etc.
        * @param durationNs the execution time for this query in nanoseconds.
    -   *
    -   * @note This can be invoked by multiple different threads.
    +   * @param outputParams The output parameters in case the method is invoked as a result of a
    +   *                     write operation. In case of a read will be @see `None`
        */
       @DeveloperApi
    -  def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit
    -
    +  def onSuccess(
    +      funcName: String,
    +      qe: QueryExecution,
    +      durationNs: Long,
    +      outputParams: Option[OutputParams]): Unit
       /**
        * A callback function that will be called when a query execution failed.
        *
        * @param funcName the name of the action that triggered this query.
        * @param qe the QueryExecution object that carries detail information like logical plan,
        *           physical plan, etc.
        * @param exception the exception that failed this query.
    +   * @param outputParams The output parameters in case the method is invoked as a result of a
    +   *                     write operation. In case of a read will be @see `None`
        *
        * @note This can be invoked by multiple different threads.
        */
       @DeveloperApi
    -  def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit
    +  def onFailure(
    +      funcName: String,
    +      qe: QueryExecution,
    +      exception: Exception,
    +      outputParams: Option[OutputParams]): Unit
     }
     
    -
    +/**
    + * Contains extra information useful for query analysis passed on from the methods in
    + * @see `org.apache.spark.sql.DataFrameWriter` while writing to a datasource
    + * @param datasourceType type of data source written to like csv, parquet, json, hive, jdbc etc.
    + * @param destination path or table name written to
    + * @param options the map containing the output options for the underlying datasource
    + *                specified by using the @see `org.apache.spark.sql.DataFrameWriter#option` method
    + * @param writeParams will contain any extra information that the write method wants to provide
    + */
    +@DeveloperApi
    +case class OutputParams(
    --- End diff --
    
    Sorry arguments to this class seem to have been picked pretty randomly. Can you explain more why these parameters are picked?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99064088
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -428,8 +481,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
           partitionColumnNames = partitioningColumns.getOrElse(Nil),
           bucketSpec = getBucketSpec
         )
    -    df.sparkSession.sessionState.executePlan(
    -      CreateTable(tableDesc, mode, Some(df.logicalPlan))).toRdd
    +    val qe = df.sparkSession.sessionState.executePlan(
    +      CreateTable(tableDesc, mode, Some(df.logicalPlan)))
    +    executeAndCallQEListener(
    +      "saveAsTable",
    +      qe,
    +      new OutputParams(source, Some(tableIdent.unquotedString), extraOptions.toMap)) {
    --- End diff --
    
    `source`? Why not using a qualified table name?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16664


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99062037
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -660,12 +660,21 @@ object SQLConf {
           .booleanConf
           .createWithDefault(false)
     
    +
    +  val QUERY_EXECUTION_LISTENERS =
    +    ConfigBuilder("spark.sql.queryExecutionListeners")
    +      .doc("QueryExecutionListeners to be attached to the SparkSession")
    --- End diff --
    
    Can you improve this line? Add what you wrote in the `sql-programming-guide.md`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r100366124
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
           bucketSpec = getBucketSpec,
           options = extraOptions.toMap)
     
    -    dataSource.write(mode, df)
    +    val destination = source match {
    +      case "jdbc" => extraOptions.get(JDBCOptions.JDBC_TABLE_NAME)
    +      case _ => extraOptions.get("path")
    --- End diff --
    
    > Actually all the "magic keys" in the options used by DataFrameWriter are public APIs
    
    That's good to know, but they only seem to be, at best, indirectly documented. The `DataFrameWriter` API doesn't say anything about the keys used by any of the methods, and `sql-programming-guide.md` only touches on a handful of them; for example, none of the JDBC keys are documented.
    
    > If you want to introduce an external public interface, we need a careful design. This should be done in a separate PR.
    
    I agree that it needs a careful design and the current one doesn't cover all the options. But this PR is of very marginal value without this information being exposed in some way. If you guys feel strongly that it should be a map and that's it, I guess it will be hard to argue. Then we'll have to do that and document all the keys used internally by Spark and make them public, and promise ourselves that we'll never break them.
    
    My belief is that a more structured type would help here. Since the current code is obviously not enough, we could have something that's more future-proof, like:
    
    ```
    // Generic, just exposes the raw options, no stability guarantee past what SQL API provides.
    class QueryExecutionParams(options: Map[])
    
    // For FS-based sources
    class FsOutputParams(dataSourceType: String, path: String, options: Map[]) extends QueryExecutionParams
    
    // For JDBC
    class JdbcOutputParams(table: String, url: String, options: Map[]) extends QueryExecutionParams
    
    // Add others that are interesting.
    ```
    
    Then listeners can easily handle future params by matching and handling the generic params.
    
    Anyway, my opinion is that a raw map is not a very good API, regardless of API stability; it's hard to use and easy to break. But I'll defer to you guys if you really don't like my suggestions.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99063701
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -660,12 +660,21 @@ object SQLConf {
           .booleanConf
           .createWithDefault(false)
     
    +
    +  val QUERY_EXECUTION_LISTENERS =
    --- End diff --
    
    I think we can put it into StaticSQLConf



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99521679
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SparkSQLQueryExecutionListenerSuite.scala ---
    @@ -0,0 +1,149 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.sql
    +
    +import org.scalatest.BeforeAndAfterEach
    +import org.scalatest.mock.MockitoSugar
    +
    +import org.apache.spark.{SparkContext, SparkFunSuite}
    +import org.apache.spark.sql.execution.QueryExecution
    +import org.apache.spark.sql.util.{OutputParams, QueryExecutionListener}
    +
    +/**
    + * Test cases for the property 'spark.sql.queryExecutionListeners' that adds the
    + * @see `QueryExecutionListener` to a @see `SparkSession`
    + */
    +class SparkSQLQueryExecutionListenerSuite
    +    extends SparkFunSuite
    +    with MockitoSugar
    +    with BeforeAndAfterEach {
    +
    +  override def afterEach(): Unit = {
    +    SparkSession.clearActiveSession()
    +    SparkSession.clearDefaultSession()
    +    SparkContext.clearActiveContext()
    +  }
    +
    +  test("Creation of SparkContext with non-existent QueryExecutionListener class fails fast") {
    +    intercept[ClassNotFoundException] {
    +      SparkSession
    +        .builder()
    +        .master("local")
    +        .config("spark.sql.queryExecutionListeners", "non.existent.QueryExecutionListener")
    +        .getOrCreate()
    +    }
    +    assert(!SparkSession.getDefaultSession.isDefined)
    +  }
    +
    +  test("QueryExecutionListener that doesn't have a default constructor fails fast") {
    +    intercept[InstantiationException] {
    +      SparkSession
    +        .builder()
    +        .master("local")
    +        .config("spark.sql.queryExecutionListeners", classOf[NoZeroArgConstructorListener].getName)
    +        .getOrCreate()
    +    }
    +    assert(!SparkSession.getDefaultSession.isDefined)
    --- End diff --
    
    `assert(SparkSession.getDefaultSession.isEmpty)`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    I think that's a separate "bug" we should fix, i.e. DataFrameWriter should use InsertIntoDataSourceCommand so we can consolidate the two paths.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99453218
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -428,8 +465,10 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
           partitionColumnNames = partitioningColumns.getOrElse(Nil),
           bucketSpec = getBucketSpec
         )
    -    df.sparkSession.sessionState.executePlan(
    -      CreateTable(tableDesc, mode, Some(df.logicalPlan))).toRdd
    +    val qe = df.sparkSession.sessionState.executePlan(
    +      CreateTable(tableDesc, mode, Some(df.logicalPlan)))
    +    val outputParams = new OutputParams(source, Some(tableIdent.unquotedString), extraOptions.toMap)
    --- End diff --
    
    Nit: no need to call `new`, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99991981
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
           bucketSpec = getBucketSpec,
           options = extraOptions.toMap)
     
    -    dataSource.write(mode, df)
    +    val destination = source match {
    +      case "jdbc" => extraOptions.get(JDBCOptions.JDBC_TABLE_NAME)
    +      case _ => extraOptions.get("path")
    --- End diff --
    
    > `outputParams` The output parameters in case the method is invoked as a result of a write operation.
    
    It sounds like `OutputParams` is designed for the write path. It is being used for description? Could we make it more general? For example, using a Map[String, String] like data structure? In the future, we can just add the tag we need?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99032305
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1302,8 +1302,9 @@ Configuration of in-memory caching can be done using the `setConf` method on `Sp
     
     ## Other Configuration Options
     
    -The following options can also be used to tune the performance of query execution. It is possible
    -that these options will be deprecated in future release as more optimizations are performed automatically.
    +The following options can also be used to tune the performance of query execution and attaching
    --- End diff --
    
    I don't think this new option belongs in this section. It has nothing to do with performance and this description now sounds weird. A separate section for it would be better, even if it's the only option there.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r100051268
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala ---
    @@ -44,27 +44,50 @@ trait QueryExecutionListener {
        * @param qe the QueryExecution object that carries detail information like logical plan,
        *           physical plan, etc.
        * @param durationNs the execution time for this query in nanoseconds.
    -   *
    -   * @note This can be invoked by multiple different threads.
    +   * @param outputParams The output parameters in case the method is invoked as a result of a
    +   *                     write operation. In case of a read will be @see `None`
        */
       @DeveloperApi
    -  def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit
    -
    +  def onSuccess(
    +      funcName: String,
    +      qe: QueryExecution,
    +      durationNs: Long,
    +      outputParams: Option[OutputParams]): Unit
       /**
        * A callback function that will be called when a query execution failed.
        *
        * @param funcName the name of the action that triggered this query.
        * @param qe the QueryExecution object that carries detail information like logical plan,
        *           physical plan, etc.
        * @param exception the exception that failed this query.
    +   * @param outputParams The output parameters in case the method is invoked as a result of a
    +   *                     write operation. In case of a read will be @see `None`
        *
        * @note This can be invoked by multiple different threads.
        */
       @DeveloperApi
    -  def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit
    +  def onFailure(
    +      funcName: String,
    +      qe: QueryExecution,
    +      exception: Exception,
    +      outputParams: Option[OutputParams]): Unit
     }
     
    -
    +/**
    + * Contains extra information useful for query analysis passed on from the methods in
    + * @see `org.apache.spark.sql.DataFrameWriter` while writing to a datasource
    + * @param datasourceType type of data source written to like csv, parquet, json, hive, jdbc etc.
    + * @param destination path or table name written to
    + * @param options the map containing the output options for the underlying datasource
    + *                specified by using the @see `org.apache.spark.sql.DataFrameWriter#option` method
    + * @param writeParams will contain any extra information that the write method wants to provide
    + */
    +@DeveloperApi
    +case class OutputParams(
    --- End diff --
    
    It looks reasonable to provide more information to the listeners for write operations. However, this will be public, I think we should think about it more carefully to get a better design, can we do it later?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    Basically I see no reason to add some specific parameter to a listener API that is meant to be generic which already contains reference to QueryExecution. What are you going to do if next time you want to find some other information with a take or collect query? Do you go in and add another interface for that?
    
    If the goal is to expose information for writing data out properly, then just make it work with the existing interface and fix the issue that using DataFrameWriter doesn't call the callback (and doesn't have the correct information set in QueryExecution).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99061710
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -190,6 +192,32 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
       }
     
       /**
    +   * Executes the query and calls the {@link org.apache.spark.sql.util.QueryExecutionListener}
    +   * methods.
    --- End diff --
    
    How about changing it to
    > > Wrap a DataFrameWriter action to track the QueryExecution and time cost, then report to the user-registered callback functions.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r100225242
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
           bucketSpec = getBucketSpec,
           options = extraOptions.toMap)
     
    -    dataSource.write(mode, df)
    +    val destination = source match {
    +      case "jdbc" => extraOptions.get(JDBCOptions.JDBC_TABLE_NAME)
    +      case _ => extraOptions.get("path")
    --- End diff --
    
    But not all of the keys are exposed as public APIs as far as I can see.
    
     e.g. calling the `save` method adds a "path" key to the option map, but is that key name a public API? I don't consider the key name a public API in this case (you can change it and existing applications will keep working).
    
    Similarly for the JDBC URL and table names. The public method is the API, not necessarily the keys used internally.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71741/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r100232628
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
           bucketSpec = getBucketSpec,
           options = extraOptions.toMap)
     
    -    dataSource.write(mode, df)
    +    val destination = source match {
    +      case "jdbc" => extraOptions.get(JDBCOptions.JDBC_TABLE_NAME)
    +      case _ => extraOptions.get("path")
    --- End diff --
    
    In Spark SQL, for metadata-like info, we store it as a key-value map. For example, [MetadataBuilder](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Metadata.scala) is used for this purpose. So far, the solution proposed in this PR is not good to me. I do not think it is a good design. 
    
    Even if we add a structured type, this could be possibly changed in the future. If you want to introduce an external public interface (like our data source APIs), we need a careful design. This should be done in a separate PR. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99031742
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala ---
    @@ -44,27 +44,49 @@ trait QueryExecutionListener {
        * @param qe the QueryExecution object that carries detail information like logical plan,
        *           physical plan, etc.
        * @param durationNs the execution time for this query in nanoseconds.
    -   *
    -   * @note This can be invoked by multiple different threads.
    +   * @param outputParams The output parameters in case the method is invoked as a result of a
    +   *                     write operation. In case of a read will be @see[[None]]
        */
       @DeveloperApi
    -  def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit
    -
    +  def onSuccess(
    +      funcName: String,
    +      qe: QueryExecution,
    +      durationNs: Long,
    +      outputParams: Option[OutputParams]): Unit
       /**
        * A callback function that will be called when a query execution failed.
        *
        * @param funcName the name of the action that triggered this query.
        * @param qe the QueryExecution object that carries detail information like logical plan,
        *           physical plan, etc.
        * @param exception the exception that failed this query.
    +   * @param outputParams The output parameters in case the method is invoked as a result of a
    +   *                     write operation. In case of a read will be @see[[None]]
        *
        * @note This can be invoked by multiple different threads.
        */
       @DeveloperApi
    -  def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit
    +  def onFailure(
    +      funcName: String,
    +      qe: QueryExecution,
    +      exception: Exception,
    +      outputParams: Option[OutputParams]): Unit
     }
     
    -
    +/**
    + * Contains extra information useful for query analysis passed on from the methods in
    + * @see[[org.apache.spark.sql.DataFrameWriter]] while writing to a datasource
    + * @param datasourceType type of data source written to like csv, parquet, json, hive, jdbc etc.
    + * @param destination path or table name written to
    + * @param options the map containing the output options for the underlying datasource
    + *                specified by using the @see [[org.apache.spark.sql.DataFrameWriter#option]] method
    + * @param writeParams will contain any extra information that the write method wants to provide
    + */
    +case class OutputParams(
    --- End diff --
    
    Add `@DeveloperApi`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    **[Test build #71741 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71741/testReport)** for PR 16664 at commit [`751ded0`](https://github.com/apache/spark/commit/751ded07c7f4e8ab888b3174e3310f603b79ae68).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class OutputParams(`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    I think @sameeragarwal plans to review.  I glanced and it looks fine.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    `DataSource.write` returns `Unit`, so the entire plan will be `df.queryExecution`, which doesn't contain these write options.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99062495
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -218,7 +246,17 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
           bucketSpec = getBucketSpec,
           options = extraOptions.toMap)
     
    -    dataSource.write(mode, df)
    +    val destination = source match {
    +      case "jdbc" => extraOptions.get("dbtable")
    +      case _ => extraOptions.get("path")
    +    }
    +
    +    executeAndCallQEListener(
    +      "save",
    +      df.queryExecution,
    +      OutputParams(source, destination, extraOptions.toMap)) {
    +      dataSource.write(mode, df)
    +    }
    --- End diff --
    
    Nit: the style issue.
    ```Scala
        withAction("save", df.queryExecution, OutputParams(source, destination, extraOptions.toMap)) {
          dataSource.write(mode, df)
        }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    @rxin , in `DataFrameWriter.save` we do
    ```
        val dataSource = DataSource(
          df.sparkSession,
          className = source,
          partitionColumns = partitioningColumns.getOrElse(Nil),
          bucketSpec = getBucketSpec,
          options = extraOptions.toMap)
    
        dataSource.write(mode, df)
    ```
    knowing the entire plan is not enough, it would be better if we also have these write options(provider, partitioning, extraOptions, etc.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72417/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    **[Test build #72329 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72329/testReport)** for PR 16664 at commit [`ecf9f34`](https://github.com/apache/spark/commit/ecf9f34addb772e9a09936420c1ad43cdd930685).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by salilsurendran <gi...@git.apache.org>.

Github user salilsurendran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99414739
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -428,8 +481,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
           partitionColumnNames = partitioningColumns.getOrElse(Nil),
           bucketSpec = getBucketSpec
         )
    -    df.sparkSession.sessionState.executePlan(
    -      CreateTable(tableDesc, mode, Some(df.logicalPlan))).toRdd
    +    val qe = df.sparkSession.sessionState.executePlan(
    +      CreateTable(tableDesc, mode, Some(df.logicalPlan)))
    +    executeAndCallQEListener(
    +      "saveAsTable",
    +      qe,
    +      new OutputParams(source, Some(tableIdent.unquotedString), extraOptions.toMap)) {
    --- End diff --
    
    source reflects the Datasource type to which the data is written. So in case of the parquet(), csv() methods it will be "parquet" and "csv". So in case of saveAsTable() should it be "hive" or "db" since qualified table name is not actually a datasource type?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    **[Test build #72417 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72417/testReport)** for PR 16664 at commit [`a0c7c22`](https://github.com/apache/spark/commit/a0c7c22e4097adbdf12e52db37c26a2246d4eddd).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    **[Test build #72178 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72178/testReport)** for PR 16664 at commit [`752125a`](https://github.com/apache/spark/commit/752125a10253ca15e260f317868ef7aacd3c510e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99518656
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
    @@ -893,6 +896,12 @@ object SparkSession {
         }
       }
     
    +  private def createQueryExecutionListeners(conf: SparkConf): Seq[QueryExecutionListener] = {
    +    conf.get(StaticSQLConf.QUERY_EXECUTION_LISTENERS)
    +      .map(Utils.classForName(_))
    --- End diff --
    
    Nit: -> `.map(Utils.classForName)`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    Does that mean the information would show up in the plan? That would be great.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    @salilsurendran yes, and we can send another PR to fix the InsertXXX command problem


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    /cc @yhuai @marmbrus 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by salilsurendran <gi...@git.apache.org>.

Github user salilsurendran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99979749
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
           bucketSpec = getBucketSpec,
           options = extraOptions.toMap)
     
    -    dataSource.write(mode, df)
    +    val destination = source match {
    +      case "jdbc" => extraOptions.get(JDBCOptions.JDBC_TABLE_NAME)
    --- End diff --
    
    Could you please give me some more info. Looking at the DataFrameWriter#jdbc method it sets the source as "jdbc". Are there other places that this source is being set?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    **[Test build #72417 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72417/testReport)** for PR 16664 at commit [`a0c7c22`](https://github.com/apache/spark/commit/a0c7c22e4097adbdf12e52db37c26a2246d4eddd).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99032040
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/util/DataFrameCallbackSuite.scala ---
    @@ -114,6 +138,55 @@ class DataFrameCallbackSuite extends QueryTest with SharedSQLContext {
         spark.listenerManager.unregister(listener)
       }
     
    +  test("QueryExecutionListener gets called on DataFrameWriter.parquet method") {
    +    callSave("parquet", (df: DataFrame, path: String) => df.write.parquet(path))
    +  }
    +
    +  test("QueryExecutionListener gets called on DataFrameWriter.json method") {
    +    callSave("json", (df: DataFrame, path: String) => df.write.json(path))
    +  }
    +
    +  test("QueryExecutionListener gets called on DataFrameWriter.csv method") {
    +    callSave("csv", (df: DataFrame, path: String) => df.write.csv(path))
    +  }
    +
    +  test("QueryExecutionListener gets called on DataFrameWriter.saveAsTable method") {
    +    var onWriteSuccessCalled = false
    +    spark.listenerManager.register(new QueryExecutionListener {
    +
    +      override def onFailure(
    +          funcName: String,
    +          qe: QueryExecution,
    +          exception: Exception,
    +          outputParams: Option[OutputParams]): Unit = {}
    +
    +      override def onSuccess(
    +          funcName: String,
    +          qe: QueryExecution,
    +          durationNs: Long,
    +          outputParams: Option[OutputParams]): Unit = {
    +        assert(durationNs > 0)
    +        assert(qe ne null)
    +        onWriteSuccessCalled = true
    +      }
    +    })
    +    withTable("bar") {
    +      Seq(1 -> 100).toDF("x", "y").write.saveAsTable("bar")
    +    }
    +    assert(onWriteSuccessCalled)
    +    spark.listenerManager.clear()
    +  }
    +
    +  private def callSave(source: String, callSaveFunction: (DataFrame, String) => Unit): Unit = {
    +    val testQueryExecutionListener = new TestQueryExecutionListener(source)
    +    spark.listenerManager.register(testQueryExecutionListener)
    +    withTempPath { path =>
    +      callSaveFunction(Seq(1 -> 100).toDF("x", "y"), path.getAbsolutePath)
    +    }
    +    assert(testQueryExecutionListener.onWriteSuccessCalled)
    +    spark.listenerManager.clear()
    --- End diff --
    
    Same here. Feels like it should be in `SharedSQLContext.afterEach`...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99520045
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
    @@ -893,6 +896,12 @@ object SparkSession {
         }
       }
     
    +  private def createQueryExecutionListeners(conf: SparkConf): Seq[QueryExecutionListener] = {
    +    conf.get(StaticSQLConf.QUERY_EXECUTION_LISTENERS)
    +      .map(Utils.classForName(_))
    +      .map(_.newInstance().asInstanceOf[QueryExecutionListener])
    --- End diff --
    
    Simply throwing `ClassNotFoundException` might not be good to end users, if we plan to make this SQL configuration external. 
    
    Could you use the try and catch to issue a better error message when we are unable to create/initialize the class? Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by salilsurendran <gi...@git.apache.org>.

Github user salilsurendran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99979817
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
           bucketSpec = getBucketSpec,
           options = extraOptions.toMap)
     
    -    dataSource.write(mode, df)
    +    val destination = source match {
    +      case "jdbc" => extraOptions.get(JDBCOptions.JDBC_TABLE_NAME)
    +      case _ => extraOptions.get("path")
    --- End diff --
    
    Yes for methods like saveAsTable() there is no path. Do you see a issue here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r100560364
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
           bucketSpec = getBucketSpec,
           options = extraOptions.toMap)
     
    -    dataSource.write(mode, df)
    +    val destination = source match {
    +      case "jdbc" => extraOptions.get(JDBCOptions.JDBC_TABLE_NAME)
    +      case _ => extraOptions.get("path")
    --- End diff --
    
    ping @sameeragarwal @rxin @cloud-fan 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99032021
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/util/DataFrameCallbackSuite.scala ---
    @@ -114,6 +138,55 @@ class DataFrameCallbackSuite extends QueryTest with SharedSQLContext {
         spark.listenerManager.unregister(listener)
       }
     
    +  test("QueryExecutionListener gets called on DataFrameWriter.parquet method") {
    +    callSave("parquet", (df: DataFrame, path: String) => df.write.parquet(path))
    +  }
    +
    +  test("QueryExecutionListener gets called on DataFrameWriter.json method") {
    +    callSave("json", (df: DataFrame, path: String) => df.write.json(path))
    +  }
    +
    +  test("QueryExecutionListener gets called on DataFrameWriter.csv method") {
    +    callSave("csv", (df: DataFrame, path: String) => df.write.csv(path))
    +  }
    +
    +  test("QueryExecutionListener gets called on DataFrameWriter.saveAsTable method") {
    +    var onWriteSuccessCalled = false
    +    spark.listenerManager.register(new QueryExecutionListener {
    +
    +      override def onFailure(
    +          funcName: String,
    +          qe: QueryExecution,
    +          exception: Exception,
    +          outputParams: Option[OutputParams]): Unit = {}
    +
    +      override def onSuccess(
    +          funcName: String,
    +          qe: QueryExecution,
    +          durationNs: Long,
    +          outputParams: Option[OutputParams]): Unit = {
    +        assert(durationNs > 0)
    +        assert(qe ne null)
    +        onWriteSuccessCalled = true
    +      }
    +    })
    +    withTable("bar") {
    +      Seq(1 -> 100).toDF("x", "y").write.saveAsTable("bar")
    +    }
    +    assert(onWriteSuccessCalled)
    +    spark.listenerManager.clear()
    --- End diff --
    
    This needs to be in a finally block no?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72178/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by salilsurendran <gi...@git.apache.org>.

Github user salilsurendran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99415030
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -660,12 +660,21 @@ object SQLConf {
           .booleanConf
           .createWithDefault(false)
     
    +
    +  val QUERY_EXECUTION_LISTENERS =
    +    ConfigBuilder("spark.sql.queryExecutionListeners")
    +      .doc("QueryExecutionListeners to be attached to the SparkSession")
    --- End diff --
    
    In this case I updated the doc to read "A comma-separated list of classes that implement QueryExecutionListener that will be attached to the SparkSession". I could attach the whole line I put in sql-programming-guide.md but it will make it look out of place compared to the docs for other properties in the same class.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r97422538
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -190,6 +192,33 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
       }
     
       /**
    +   * Executes the query and calls the {@link org.apache.spark.sql.util.QueryExecutionListener}
    +   * methods.
    +   *
    +   * @param funcName A identifier for the method executing the query
    +   * @param qe the @see [[QueryExecution]] object associated with the
    +   *        query
    +   * @param outputParams The output parameters useful for query analysis
    +   * @param action the function that executes the query after which the listener methods gets
    +   *               called.
    +   */
    +  private def executeAndCallQEListener(
    +                                        funcName: String,
    --- End diff --
    
    Formatting is wrong here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72177/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99518083
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
           bucketSpec = getBucketSpec,
           options = extraOptions.toMap)
     
    -    dataSource.write(mode, df)
    +    val destination = source match {
    +      case "jdbc" => extraOptions.get(JDBCOptions.JDBC_TABLE_NAME)
    --- End diff --
    
    For JDBC, the source value might not be `jdbc`. For example, `jDbC`, `JDBC`, `org.apache.spark.sql.jdbc.DefaultSource`, `org.apache.spark.sql.jdbc`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r100218129
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
           bucketSpec = getBucketSpec,
           options = extraOptions.toMap)
     
    -    dataSource.write(mode, df)
    +    val destination = source match {
    +      case "jdbc" => extraOptions.get(JDBCOptions.JDBC_TABLE_NAME)
    +      case _ => extraOptions.get("path")
    --- End diff --
    
    So that comment is about different types of queries that might have different extra information. I still think a generic map is the wrong idea for fixing that.
    
    You could, for example, have this parameter be `Any` (or some tagging interface, e.g. `trait QueryExecutionParams` or some such), and the listener can then match to find the right type. That is extensible (new types can be added without breaking existing ones) and wouldn't require this API to change in the future.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    **[Test build #72177 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72177/testReport)** for PR 16664 at commit [`b0392ed`](https://github.com/apache/spark/commit/b0392ed5a8ebda9ace514b10bc383b9ffa47ac4c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    Actually @cloud-fan are you sure it is a problem right now? DataSOurce.write itself creates the commands, and if the information are propagated correctly, the QueryExecution object should have a command InsertIntoHadoopFsRelationCommand.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    **[Test build #72329 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72329/testReport)** for PR 16664 at commit [`ecf9f34`](https://github.com/apache/spark/commit/ecf9f34addb772e9a09936420c1ad43cdd930685).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r100186876
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
           bucketSpec = getBucketSpec,
           options = extraOptions.toMap)
     
    -    dataSource.write(mode, df)
    +    val destination = source match {
    +      case "jdbc" => extraOptions.get(JDBCOptions.JDBC_TABLE_NAME)
    +      case _ => extraOptions.get("path")
    --- End diff --
    
    See my [comment](https://github.com/apache/spark/pull/16664#issuecomment-277597939) about the other code paths that also need `QueryExecutionListener`. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    why do we need the new config `spark.sql.queryExecutionListeners`? I think it's not hard to register listeners manually, and at least we should do it in a follow-up PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99062659
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -261,13 +304,19 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
           )
         }
     
    -    df.sparkSession.sessionState.executePlan(
    +    val qe = df.sparkSession.sessionState.executePlan(
           InsertIntoTable(
             table = UnresolvedRelation(tableIdent),
             partition = Map.empty[String, Option[String]],
             child = df.logicalPlan,
             overwrite = mode == SaveMode.Overwrite,
    -        ifNotExists = false)).toRdd
    +        ifNotExists = false))
    +    executeAndCallQEListener(
    +      "insertInto",
    +      qe,
    +      new OutputParams(source, Some(tableIdent.unquotedString), extraOptions.toMap)) {
    +        qe.toRdd
    +    }
    --- End diff --
    
    Nit: also the style issue.
    ```Scala
        val outputParms = OutputParams(source, Some(tableIdent.unquotedString), extraOptions.toMap)
        withAction("insertInto", qe, outputParms)(qe.toRdd)
    ```



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    Just finished this round of reviews. Thanks!
    
    This PR enables the QueryExecutionListener when users using the DataFrameWriter methods. However, it still misses the other code paths, especially, the DDL statements. For example, CTAS when using the `sql()` API. cc @sameeragarwal @cloud-fan 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r100231056
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
           bucketSpec = getBucketSpec,
           options = extraOptions.toMap)
     
    -    dataSource.write(mode, df)
    +    val destination = source match {
    +      case "jdbc" => extraOptions.get(JDBCOptions.JDBC_TABLE_NAME)
    +      case _ => extraOptions.get("path")
    --- End diff --
    
    >  is like metadata
    
    It is metadata, but that doesn't mean it doesn't have meaning and thus doesn't need structure. Some of the metadata currently models the "where" the data is being written. Internally it doesn't really matter much how much it's handled (it's an "implementation detail"), but, for someone building an application that uses this information, knowing that a particular key means "where the data will end up" *is* very important, and a structured type with proper, documented fields helps that.
    
    We just happen to want that information, and we could use it either way, but that's beside the point. I'm arguing that there's value in exposing this data in a more structured manner than just an opaque map.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99453252
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -218,7 +246,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
           bucketSpec = getBucketSpec,
           options = extraOptions.toMap)
     
    -    dataSource.write(mode, df)
    +    val destination = source match {
    +      case "jdbc" => extraOptions.get("dbtable")
    --- End diff --
    
    Nit: `dbtable` -> `JDBCOptions.JDBC_TABLE_NAME`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    > The existing API already has references to the QueryExecution object, which actually includes all of the information your compatibility-breaking API is currently exposing.
    
    That's fair but not what I was told; if that's the case then great, but I'll let Salil comment since he's looked at this code way more than I have.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r100224399
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
           bucketSpec = getBucketSpec,
           options = extraOptions.toMap)
     
    -    dataSource.write(mode, df)
    +    val destination = source match {
    +      case "jdbc" => extraOptions.get(JDBCOptions.JDBC_TABLE_NAME)
    +      case _ => extraOptions.get("path")
    --- End diff --
    
    As we already use `Map` as the user-facing API for `DataFrameWriter`, I think we can't change it. To make it consistent, I think it's reasonable to still pass `Map` to listeners. I agree it's bad that listeners to know these magic keys, but it's already the case of `DataFrameWriter`/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99991251
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
           bucketSpec = getBucketSpec,
           options = extraOptions.toMap)
     
    -    dataSource.write(mode, df)
    +    val destination = source match {
    +      case "jdbc" => extraOptions.get(JDBCOptions.JDBC_TABLE_NAME)
    --- End diff --
    
    For example,
    ```Scala
        df.write.format("org.apache.spark.sql.jdbc")
        .options(Map("url" -> url, "dbtable" -> "TEST.SAVETEST"))
        .save()
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r100127337
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
           bucketSpec = getBucketSpec,
           options = extraOptions.toMap)
     
    -    dataSource.write(mode, df)
    +    val destination = source match {
    +      case "jdbc" => extraOptions.get(JDBCOptions.JDBC_TABLE_NAME)
    +      case _ => extraOptions.get("path")
    --- End diff --
    
    > Could we make it more general? For example, using a Map[String, String]
    
    Being the person who requested this class instead of an opaque map, I think using an opaque map makes for a really bad user API. The listener now needs to know about "magic keys" that have special meaning, which can vary depending on the destination. So you end up making up some contract that certain keys have some special meanings an all sources need to use them that way, so basically you end up encoding this class in a map.
    
    That being said I'm not super happy with the way JDBC works, because there's still some information embedded in the map. I thought about it a little but didn't come up with a good solution; embedding the table name in the JDBC URI sounds hacky and brittle. Best one I got is a separate field in this class (e.g. `serverUri`) that can be used to identify the server that is hosting the `destination` value (not needed for FS-based destinations since it's in the URI, but could be useful in other cases - maybe other table-based systems like Kudu or HBase).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r100049953
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -190,6 +193,32 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
       }
     
       /**
    +   * Wrap a DataFrameWriter action to track the query execution and time cost, then report to the
    +   * user-registered callback functions.
    +   *
    +   * @param funcName A identifier for the method executing the query
    +   * @param qe the @see `QueryExecution` object associated with the query
    +   * @param outputParams The output parameters useful for query analysis
    +   * @param action the function that executes the query after which the listener methods gets
    +   *               called.
    +   */
    +  private def withAction(
    +      funcName: String,
    +      qe: QueryExecution,
    +      outputParams: OutputParams)(action: => Unit) = {
    +    try {
    +      val start = System.nanoTime()
    --- End diff --
    
    `Dataset.withAction` will reset metrics of physical plans, shall we do it here? And can we create a general function for both `Dataset` and `DataFrameWriter`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    > Actually we are not only introducing new APIs, we are also breaking old APIs in this patch. Please separate the bug fix part from the API changing part.
    
    I actually disagree that this particular change should be a separate PR. Part of exposing these new queries to the listener is providing information of what these queries are doing, and the current (developer) API does not have a way to expose that.
    
    We can discuss ways of maybe exposing this information in a way that doesn't break the existing API (I thought about a couple but I didn't like any of them, so my preference was to just modify the existing developer) API. But I strongly feel the bug fix is not complete without this information being exposed in some way.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99415847
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -660,12 +660,21 @@ object SQLConf {
           .booleanConf
           .createWithDefault(false)
     
    +
    +  val QUERY_EXECUTION_LISTENERS =
    +    ConfigBuilder("spark.sql.queryExecutionListeners")
    +      .doc("QueryExecutionListeners to be attached to the SparkSession")
    --- End diff --
    
    We do not have a separate document for the Spark SQL configuration. We expect users to do it using the command `set -v`. This command will output the contents of `doc`. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r100184896
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
           bucketSpec = getBucketSpec,
           options = extraOptions.toMap)
     
    -    dataSource.write(mode, df)
    +    val destination = source match {
    +      case "jdbc" => extraOptions.get(JDBCOptions.JDBC_TABLE_NAME)
    +      case _ => extraOptions.get("path")
    --- End diff --
    
    What other parameters are you thinking about?
    
    It's pretty easy to wrap this class in some other class that has these output params and any other params you want to expose, but it would be nice to understand exactly what you're referring to here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by salilsurendran <gi...@git.apache.org>.

Github user salilsurendran commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    When I was working on this PR the output path wasn't there but if you are confident that it is there then it might have been added recently. I can check and get back to you.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by vanzin <gi...@git.apache.org>.

Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r97422571
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -190,6 +192,33 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
       }
     
       /**
    +   * Executes the query and calls the {@link org.apache.spark.sql.util.QueryExecutionListener}
    +   * methods.
    +   *
    +   * @param funcName A identifier for the method executing the query
    +   * @param qe the @see [[QueryExecution]] object associated with the
    +   *        query
    --- End diff --
    
    Fits in the previous line.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by salilsurendran <gi...@git.apache.org>.

Github user salilsurendran commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    The QueryExecution object doesn't have details related to the output metadata. Like for eg. if I call df.write.parquet("/my/path"). The path to which the DataFrame is written i.e. "/my/path" is not available in the QueryExecution object.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r100229660
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
           bucketSpec = getBucketSpec,
           options = extraOptions.toMap)
     
    -    dataSource.write(mode, df)
    +    val destination = source match {
    +      case "jdbc" => extraOptions.get(JDBCOptions.JDBC_TABLE_NAME)
    +      case _ => extraOptions.get("path")
    --- End diff --
    
    Based on my understanding, the extra information we pass to QueryExecutionListener is like metadata. It is just for helping users understand the context. I still do not understand why we need to define a class/trait for it. This extra class/trait looks weird for this goal, unless you have some applications that are built on this class/trait.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r100236493
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
           bucketSpec = getBucketSpec,
           options = extraOptions.toMap)
     
    -    dataSource.write(mode, df)
    +    val destination = source match {
    +      case "jdbc" => extraOptions.get(JDBCOptions.JDBC_TABLE_NAME)
    +      case _ => extraOptions.get("path")
    --- End diff --
    
    Actually all the "magic keys" in the options used by `DataFrameWriter` are public APIs, they are not going to change and users need to know about them.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99416910
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -428,8 +481,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
           partitionColumnNames = partitioningColumns.getOrElse(Nil),
           bucketSpec = getBucketSpec
         )
    -    df.sparkSession.sessionState.executePlan(
    -      CreateTable(tableDesc, mode, Some(df.logicalPlan))).toRdd
    +    val qe = df.sparkSession.sessionState.executePlan(
    +      CreateTable(tableDesc, mode, Some(df.logicalPlan)))
    +    executeAndCallQEListener(
    +      "saveAsTable",
    +      qe,
    +      new OutputParams(source, Some(tableIdent.unquotedString), extraOptions.toMap)) {
    --- End diff --
    
    I got your points. `source` looks ok to me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99518130
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
           bucketSpec = getBucketSpec,
           options = extraOptions.toMap)
     
    -    dataSource.write(mode, df)
    +    val destination = source match {
    +      case "jdbc" => extraOptions.get(JDBCOptions.JDBC_TABLE_NAME)
    +      case _ => extraOptions.get("path")
    --- End diff --
    
    For the external data source connectors, it might not have `path`. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99415540
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -190,6 +192,32 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
       }
     
       /**
    +   * Executes the query and calls the {@link org.apache.spark.sql.util.QueryExecutionListener}
    +   * methods.
    +   *
    +   * @param funcName A identifier for the method executing the query
    +   * @param qe the @see [[QueryExecution]] object associated with the query
    +   * @param outputParams The output parameters useful for query analysis
    +   * @param action the function that executes the query after which the listener methods gets
    +   *               called.
    +   */
    +  private def executeAndCallQEListener(
    --- End diff --
    
    Yes. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99518239
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -1047,4 +1047,14 @@ object StaticSQLConf {
           "SQL configuration and the current database.")
         .booleanConf
         .createWithDefault(false)
    +
    +  val QUERY_EXECUTION_LISTENERS = buildConf("spark.sql.queryExecutionListeners")
    +    .doc("A comma-separated list of classes that implement QueryExecutionListener. When creating " +
    +      "a SparkSession, instances of these listeners will be added to it. These classes " +
    +      "needs to have a zero-argument constructor. If the specified class can't be found or" +
    --- End diff --
    
    Nit: `needs` -> `need`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    Well it does. It contains the entire plan.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    I just quickly went over the code. It looks ok to me, but I will review it again when the comments are resolved. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99063668
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -660,12 +660,21 @@ object SQLConf {
           .booleanConf
           .createWithDefault(false)
     
    +
    +  val QUERY_EXECUTION_LISTENERS =
    +    ConfigBuilder("spark.sql.queryExecutionListeners")
    +      .doc("QueryExecutionListeners to be attached to the SparkSession")
    --- End diff --
    
    I think we can put it into `StaticSQLConf`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r99518295
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -1047,4 +1047,14 @@ object StaticSQLConf {
           "SQL configuration and the current database.")
         .booleanConf
         .createWithDefault(false)
    +
    +  val QUERY_EXECUTION_LISTENERS = buildConf("spark.sql.queryExecutionListeners")
    +    .doc("A comma-separated list of classes that implement QueryExecutionListener. When creating " +
    +      "a SparkSession, instances of these listeners will be added to it. These classes " +
    +      "needs to have a zero-argument constructor. If the specified class can't be found or" +
    +      " the class specified doesn't have a valid constructor the SparkSession creation " +
    --- End diff --
    
    `the class specified` -> `the specified class`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16664
  
    I think it's ok to enable the listener for `DataFrameWriter` first, we can think about DDL commands later


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org