You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by rxin <gi...@git.apache.org> on 2016/05/28 04:37:00 UTC

[GitHub] spark pull request: [SPARK-15638][SQL] Audit Dataset, SparkSession...

GitHub user rxin opened a pull request:

    https://github.com/apache/spark/pull/13370

    [SPARK-15638][SQL] Audit Dataset, SparkSession, and SQLContext

    ## What changes were proposed in this pull request?
    This patch contains a list of changes as a result of my auditing Dataset, SparkSession, and SQLContext. The patch audits the categorization of experimental APIs, function groups, and deprecations. For the detailed list of changes, please see the diff.
    
    ## How was this patch tested?
    N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark SPARK-15638

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13370.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13370
    
----
commit 36b804bba48761dd7e82f2c14c61b302580e86d4
Author: Reynold Xin <rx...@databricks.com>
Date:   2016-05-28T04:35:01Z

    [SPARK-15638][SQL] Audit Dataset, SparkSession, and SQLContext functions and documentations

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15638][SQL] Audit Dataset, SparkSession...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13370#discussion_r65125196
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala ---
    @@ -204,8 +205,8 @@ class KeyValueGroupedDataset[K, V] private[sql](
        * Internal helper function for building typed aggregations that return tuples.  For simplicity
        * and code reuse, we do this without the help of the type system and then use helper functions
        * that cast appropriately for the user facing interface.
    -   * TODO: does not handle aggregations that return nonflat results,
        */
    +  // TODO: does not handle aggregations that return nonflat results.
    --- End diff --
    
    I'm pretty sure this TODO is already done.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15638][SQL] Audit Dataset, SparkSession...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13370#discussion_r64983961
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLContextSuite.scala ---
    @@ -24,6 +24,7 @@ import org.apache.spark.sql.catalyst.TableIdentifier
     import org.apache.spark.sql.internal.SQLConf
     import org.apache.spark.sql.types.{BooleanType, StringType, StructField, StructType}
     
    +@deprecated("This suite is deprecated to silent compiler deprecation warnings", "2.0.0")
    --- End diff --
    
    removes a bunch of compile time warnings


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15638][SQL] Audit Dataset, SparkSession...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13370#issuecomment-222293473
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59548/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15638][SQL] Audit Dataset, SparkSession...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13370#discussion_r65125096
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Column.scala ---
    @@ -132,6 +130,15 @@ class Column(protected[sql] val expr: Expression) extends Logging {
         case _ => UnresolvedAttribute.quotedString(name)
       })
     
    +  override def toString: String = usePrettyExpression(expr).sql
    +
    +  override def equals(that: Any): Boolean = that match {
    +    case that: Column => that.expr.equals(this.expr)
    --- End diff --
    
    how about `that.expr.semanticEquals(this.expr)`? One column equals to another if they always produce same result.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15638][SQL] Audit Dataset, SparkSession...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13370#discussion_r64983952
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
    @@ -52,7 +52,7 @@ case class CatalogStorageFormat(
     
     object CatalogStorageFormat {
       /** Empty storage format for default values and copies. */
    -  val EmptyStorageFormat = CatalogStorageFormat(locationUri = None, inputFormat = None,
    +  val empty = CatalogStorageFormat(locationUri = None, inputFormat = None,
    --- End diff --
    
    cc @andrewor14 @viirya 
    
    from 
    
    https://github.com/apache/spark/pull/13327#discussion_r64943992



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15638][SQL] Audit Dataset, SparkSession...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/13370#issuecomment-222597634
  
    Thanks - merging in master/2.0.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15638][SQL] Audit Dataset, SparkSession...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13370#discussion_r64983958
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileCatalogSuite.scala ---
    @@ -19,6 +19,8 @@ package org.apache.spark.sql.execution.datasources
     
     import java.io.File
     
    +import scala.language.reflectiveCalls
    --- End diff --
    
    removes a bunch of compile time warnings


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15638][SQL] Audit Dataset, SparkSession...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13370#issuecomment-222289732
  
    **[Test build #59548 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59548/consoleFull)** for PR 13370 at commit [`36b804b`](https://github.com/apache/spark/commit/36b804bba48761dd7e82f2c14c61b302580e86d4).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15638][SQL] Audit Dataset, SparkSession...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/13370#issuecomment-222595668
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15638][SQL] Audit Dataset, SparkSession...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13370#issuecomment-222293443
  
    **[Test build #59548 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59548/consoleFull)** for PR 13370 at commit [`36b804b`](https://github.com/apache/spark/commit/36b804bba48761dd7e82f2c14c61b302580e86d4).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15638][SQL] Audit Dataset, SparkSession...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/13370


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15638][SQL] Audit Dataset, SparkSession...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13370#discussion_r64984155
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
    @@ -41,15 +41,13 @@ import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project}
      *                 at the end of current Spark session. Existing permanent relations with the same
      *                 name are not visible to the current session while the temporary view exists,
      *                 unless they are specified with full qualified table name with database prefix.
    - * @param sql the original sql
      */
     case class CreateViewCommand(
         tableDesc: CatalogTable,
         child: LogicalPlan,
         allowExisting: Boolean,
         replace: Boolean,
    -    isTemporary: Boolean,
    -    sql: String)
    --- End diff --
    
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15638][SQL] Audit Dataset, SparkSession...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13370#issuecomment-222293472
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15638][SQL] Audit Dataset, SparkSession...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13370#discussion_r64983960
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveDDLCommandSuite.scala ---
    @@ -37,9 +35,9 @@ class HiveDDLCommandSuite extends PlanTest {
     
       private def extractTableDesc(sql: String): (CatalogTable, Boolean) = {
         parser.parsePlan(sql).collect {
    -      case CreateTableCommand(desc, allowExisting) => (desc, allowExisting)
    -      case CreateTableAsSelectLogicalPlan(desc, _, allowExisting) => (desc, allowExisting)
    -      case CreateViewCommand(desc, _, allowExisting, _, _, _) => (desc, allowExisting)
    +      case c: CreateTableCommand => (c.table, c.ifNotExists)
    --- End diff --
    
    related to https://github.com/apache/spark/pull/13327#discussion_r64944224


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15638][SQL] Audit Dataset, SparkSession...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13370#discussion_r65126050
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala ---
    @@ -204,8 +205,8 @@ class KeyValueGroupedDataset[K, V] private[sql](
        * Internal helper function for building typed aggregations that return tuples.  For simplicity
        * and code reuse, we do this without the help of the type system and then use helper functions
        * that cast appropriately for the user facing interface.
    -   * TODO: does not handle aggregations that return nonflat results,
        */
    +  // TODO: does not handle aggregations that return nonflat results.
    --- End diff --
    
    cool i will remove it



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15638][SQL] Audit Dataset, SparkSession...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13370#discussion_r64983965
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
    @@ -41,15 +41,13 @@ import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project}
      *                 at the end of current Spark session. Existing permanent relations with the same
      *                 name are not visible to the current session while the temporary view exists,
      *                 unless they are specified with full qualified table name with database prefix.
    - * @param sql the original sql
      */
     case class CreateViewCommand(
         tableDesc: CatalogTable,
         child: LogicalPlan,
         allowExisting: Boolean,
         replace: Boolean,
    -    isTemporary: Boolean,
    -    sql: String)
    --- End diff --
    
    related to https://github.com/apache/spark/pull/13327#discussion_r64944224
    
    cc @andrewor14 @viirya 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15638][SQL] Audit Dataset, SparkSession...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13370#discussion_r65125156
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Column.scala ---
    @@ -132,6 +130,15 @@ class Column(protected[sql] val expr: Expression) extends Logging {
         case _ => UnresolvedAttribute.quotedString(name)
       })
     
    +  override def toString: String = usePrettyExpression(expr).sql
    +
    +  override def equals(that: Any): Boolean = that match {
    +    case that: Column => that.expr.equals(this.expr)
    --- End diff --
    
    oh sorry you just move them up, it's fine to keep them same as before.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org