Posted to reviews@spark.apache.org by rxin <gi...@git.apache.org> on 2015/02/03 10:09:37 UTC

[GitHub] spark pull request: [SQL] DataFrame API update

GitHub user rxin opened a pull request:

    https://github.com/apache/spark/pull/4332

    [SQL] DataFrame API update

    1. Added Java-friendly versions of the expression operators (e.g. gt, geq).
    2. Added JavaDoc for most operators.
    3. Simplified the expression operators by keeping only one version of each function (the one that accepts Any). Previously each expression operator had two methods, one accepting Any and another accepting Column.
    4. The agg function now accepts varargs of (String, String) pairs.
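
    Change 3 can be sketched outside of Spark with a minimal stand-in (this is illustrative code, not the actual Spark source; the names Expr, Column, gt, and > mirror the API style described above):

    ```scala
    // Minimal sketch of the "single method accepting Any" pattern:
    // one overload handles both Column arguments and literal values,
    // so no separate Column-typed method is needed.
    case class Expr(repr: String)

    case class Column(expr: Expr) {
      // One method that accepts Any; literals are lifted into expressions.
      def gt(other: Any): Column = other match {
        case c: Column => Column(Expr(s"(${expr.repr} > ${c.expr.repr})"))
        case lit       => Column(Expr(s"(${expr.repr} > $lit)"))
      }
      // Symbolic operator delegates to the Java-friendly named method.
      def >(other: Any): Column = gt(other)
    }

    object ColumnSketch {
      def main(args: Array[String]): Unit = {
        val age = Column(Expr("age"))
        println(age.gt(21).expr.repr)                 // literal argument
        println((age > Column(Expr("limit"))).expr.repr) // Column argument
      }
    }
    ```

    The tradeoff is that passing Any loses compile-time type checking on the right-hand side, which is why the simplification drew review discussion below.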

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark df-update

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4332.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4332
    
----
commit 576d07a2a6dddef77693e5f1caca1df30fd8f2e4
Author: Reynold Xin <rx...@databricks.com>
Date:   2015-02-03T07:48:37Z

    random commit.
    
    Conflicts:
    	sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala

commit ab0aa69d2df6ba40359953e32883505ddc309e4f
Author: Reynold Xin <rx...@databricks.com>
Date:   2015-02-03T09:04:30Z

    Added Java friendly expression methods.
    Added JavaDoc.
    For each expression operator, have only one version of the function (that accepts Any). Previously we had two methods for each expression operator, one accepting Any and another accepting Column.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] DataFrame API update

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4332#issuecomment-72617077
  
      [Test build #26644 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26644/consoleFull) for PR 4332 at commit [`ab0aa69`](https://github.com/apache/spark/commit/ab0aa69d2df6ba40359953e32883505ddc309e4f).
     * This patch merges cleanly.




[GitHub] spark pull request: [SQL] DataFrame API update

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4332#issuecomment-72626858
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26644/
    Test PASSed.




[GitHub] spark pull request: [SQL] DataFrame API update

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/4332




[GitHub] spark pull request: [SQL] DataFrame API update

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4332#discussion_r23991126
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dsl.scala ---
    @@ -100,27 +91,82 @@ object Dsl {
         Column(literalExpr)
       }
     
    +  //////////////////////////////////////////////////////////////////////////////////////////////
    +  //////////////////////////////////////////////////////////////////////////////////////////////
    +
    +  /** Aggregate function: returns the sum of all values in the expression. */
       def sum(e: Column): Column = Sum(e.expr)
    +
    +  /** Aggregate function: returns the sum of distinct values in the expression. */
       def sumDistinct(e: Column): Column = SumDistinct(e.expr)
    +
    +  /** Aggregate function: returns the number of items in a group. */
       def count(e: Column): Column = Count(e.expr)
     
    +  /** Aggregate function: returns the number of distinct items in a group. */
       @scala.annotation.varargs
       def countDistinct(expr: Column, exprs: Column*): Column =
         CountDistinct((expr +: exprs).map(_.expr))
     
    +  /** Aggregate function: returns the approximate number of distinct items in a group. */
       def approxCountDistinct(e: Column): Column = ApproxCountDistinct(e.expr)
    -  def approxCountDistinct(e: Column, rsd: Double): Column =
    -    ApproxCountDistinct(e.expr, rsd)
     
    +  /** Aggregate function: returns the approximate number of distinct items in a group. */
    +  def approxCountDistinct(e: Column, rsd: Double): Column = ApproxCountDistinct(e.expr, rsd)
    +
    +  /** Aggregate function: returns the average of the values in a group. */
       def avg(e: Column): Column = Average(e.expr)
    +
    +  /** Aggregate function: returns the first value in a group. */
       def first(e: Column): Column = First(e.expr)
    +
    +  /** Aggregate function: returns the last value in a group. */
       def last(e: Column): Column = Last(e.expr)
    +
    +  /** Aggregate function: returns the minimum value of the expression in a group. */
       def min(e: Column): Column = Min(e.expr)
    +
    +  /** Aggregate function: returns the maximum value of the expression in a group. */
       def max(e: Column): Column = Max(e.expr)
     
    +  //////////////////////////////////////////////////////////////////////////////////////////////
    +  //////////////////////////////////////////////////////////////////////////////////////////////
    +
    +  /**
    +   * Unary minus, i.e. negate the expression.
    +   * {{{
    +   *   // Select the amount column and negates all values.
    +   *   // Scala:
    +   *   df.select( -df("amount") )
    +   *
    +   *   // Java:
    +   *   df.select( negate(df.col("amount")) );
    +   * }}}
    +   */
    +  def negate(e: Column): Column = -e
    --- End diff --
    
    negative or neg?
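
    For reference, the @scala.annotation.varargs annotation that appears on countDistinct in the diff above is what makes a Scala varargs method callable from Java; a self-contained sketch (illustrative names, not the actual Spark source) also covers the (String, String) varargs form of agg from change 4:

    ```scala
    object AggSketch {
      // @varargs generates an additional Java-visible overload taking an
      // array, so Java callers can use this Scala varargs method directly.
      @scala.annotation.varargs
      def agg(aggExpr: (String, String), aggExprs: (String, String)*): Seq[String] =
        (aggExpr +: aggExprs).map { case (col, fn) => s"$fn($col)" }
    }
    ```

    For example, AggSketch.agg("amount" -> "sum", "price" -> "avg") yields Seq("sum(amount)", "avg(price)"). Requiring one non-vararg first parameter avoids an ambiguous zero-argument call.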




[GitHub] spark pull request: [SQL] DataFrame API update

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/4332#issuecomment-72617197
  
    cc @davies Change 3 is a big change - but it does simplify the API documentation quite a bit.





[GitHub] spark pull request: [SQL] DataFrame API update

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4332#issuecomment-72626851
  
      [Test build #26644 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26644/consoleFull) for PR 4332 at commit [`ab0aa69`](https://github.com/apache/spark/commit/ab0aa69d2df6ba40359953e32883505ddc309e4f).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.

