Posted to reviews@spark.apache.org by rxin <gi...@git.apache.org> on 2015/04/22 22:36:35 UTC

[GitHub] spark pull request: [SPARK-7059][SQL] Create a DataFrame join API ...

GitHub user rxin opened a pull request:

    https://github.com/apache/spark/pull/5638

    [SPARK-7059][SQL] Create a DataFrame join API to facilitate equijoin.

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark joinUsing

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5638.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5638
    
----
commit b1bd9148348ac33671ddf7f7aaf776374dfdf951
Author: Reynold Xin <rx...@databricks.com>
Date:   2015-04-22T20:35:50Z

    [SPARK-7059][SQL] Create a DataFrame join API to facilitate equijoin and self join.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7059][SQL] Create a DataFrame join API ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5638#issuecomment-95355052
  
      [Test build #30786 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30786/consoleFull) for   PR 5638 at commit [`b1bd914`](https://github.com/apache/spark/commit/b1bd9148348ac33671ddf7f7aaf776374dfdf951).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-7059][SQL] Create a DataFrame join API ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5638#issuecomment-95361075
  
      [Test build #30787 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30787/consoleFull) for   PR 5638 at commit [`13e9cc9`](https://github.com/apache/spark/commit/13e9cc92e837b9c874d2a90bec3e80bea0f66470).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-7059][SQL] Create a DataFrame join API ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5638#discussion_r28912577
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
    @@ -343,6 +343,35 @@ class DataFrame private[sql](
       }
     
       /**
    +   * Inner equi-join with another [[DataFrame]] using the column.
    --- End diff --
    
    nit: the _given_ column?




[GitHub] spark pull request: [SPARK-7059][SQL] Create a DataFrame join API ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5638#issuecomment-95361101
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30787/
    Test PASSed.




[GitHub] spark pull request: [SPARK-7059][SQL] Create a DataFrame join API ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5638#discussion_r28912783
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
    @@ -343,6 +343,35 @@ class DataFrame private[sql](
       }
     
       /**
    +   * Inner equi-join with another [[DataFrame]] using the column.
    +   * {{{
    +   *   // Joining df1 and df2 using the column "user_id"
    +   *   df1.join(df2, "user_id")
    +   * }}}
    +   *
    --- End diff --
    
    I would also state that, similar to SQL USING, the join column will only appear once in the output.
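
The USING-style behavior described in this comment can be sketched in plain Python, without Spark. This is only an illustration of the semantics (an inner equi-join whose join column is emitted once per output row), not the DataFrame implementation under review. The helper `join_using` and the sample rows are hypothetical and exist only for this sketch.

```python
from collections import defaultdict


def join_using(left, right, column):
    """Inner equi-join of two lists of dict rows on `column`.

    Hypothetical helper: like SQL's USING clause, the join column
    appears once in each output row.
    """
    # Index the right side by join key (a simple hash join).
    index = defaultdict(list)
    for row in right:
        index[row[column]].append(row)

    joined = []
    for lrow in left:
        for rrow in index.get(lrow[column], []):
            out = dict(lrow)  # carries the join column exactly once
            # Copy the right side's other columns, skipping the join column.
            # Assumption: non-key column names do not collide across sides.
            out.update({k: v for k, v in rrow.items() if k != column})
            joined.append(out)
    return joined


# Made-up sample data for illustration.
users = [{"user_id": 1, "name": "ann"}, {"user_id": 2, "name": "bob"}]
orders = [
    {"user_id": 1, "total": 30},
    {"user_id": 1, "total": 5},
    {"user_id": 3, "total": 9},
]
result = join_using(users, orders, "user_id")
```

Here `result` holds one row per matching pair, and each row has a single `user_id` key rather than one copy per input.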




[GitHub] spark pull request: [SPARK-7059][SQL] Create a DataFrame join API ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/5638




[GitHub] spark pull request: [SPARK-7059][SQL] Create a DataFrame join API ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/5638#issuecomment-95330382
  
    Addressed the comments and added Python too.




[GitHub] spark pull request: [SPARK-7059][SQL] Create a DataFrame join API ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5638#issuecomment-95355067
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30786/
    Test PASSed.




[GitHub] spark pull request: [SPARK-7059][SQL] Create a DataFrame join API ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5638#discussion_r28912726
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
    @@ -343,6 +343,35 @@ class DataFrame private[sql](
       }
     
       /**
    +   * Inner equi-join with another [[DataFrame]] using the column.
    +   * {{{
    +   *   // Joining df1 and df2 using the column "user_id"
    +   *   df1.join(df2, "user_id")
    +   * }}}
    +   *
    +   * Note that if you perform a self-join using this function, you wouldn't be able to reference
    --- End diff --
    
    ... if you perform a self-join _without aliasing the input DataFrames_ ...
    nit: you _will not_ be able?
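
As a loose analogy for why aliasing matters in a self-join, here is a plain-Python sketch, assuming rows are modeled as dicts. It does not reproduce Spark's behavior: in this toy model a clashing column name is silently overwritten, whereas the docstring under review describes columns that can no longer be referenced. The helpers `inner_join_on` and `with_alias` and the sample rows are hypothetical.

```python
def inner_join_on(left, right, key):
    """Naive nested-loop inner equi-join over lists of dict rows (hypothetical helper)."""
    out = []
    for lrow in left:
        for rrow in right:
            if lrow[key] == rrow[key]:
                merged = dict(lrow)
                merged.update(rrow)  # same-named columns from the right side win
                out.append(merged)
    return out


def with_alias(rows, name, key):
    """Prefix every non-key column with `name.` so the two sides stay distinguishable."""
    return [
        {(k if k == key else f"{name}.{k}"): v for k, v in row.items()}
        for row in rows
    ]


# Made-up sample data for illustration.
emp = [{"dept": "eng", "name": "ann"}, {"dept": "eng", "name": "bob"}]

# Without aliases, both sides expose a column called "name",
# so each output row can only hold one of them.
unaliased = inner_join_on(emp, emp, "dept")

# With aliases, each side's column remains addressable.
aliased = inner_join_on(
    with_alias(emp, "a", "dept"), with_alias(emp, "b", "dept"), "dept"
)
```

In the aliased result every row keeps both `a.name` and `b.name`, which is the property the suggested wording ("without aliasing the input DataFrames") is pointing at.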




[GitHub] spark pull request: [SPARK-7059][SQL] Create a DataFrame join API ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5638#issuecomment-95332170
  
      [Test build #30787 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30787/consoleFull) for   PR 5638 at commit [`13e9cc9`](https://github.com/apache/spark/commit/13e9cc92e837b9c874d2a90bec3e80bea0f66470).




[GitHub] spark pull request: [SPARK-7059][SQL] Create a DataFrame join API ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5638#issuecomment-95327977
  
      [Test build #30786 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30786/consoleFull) for   PR 5638 at commit [`b1bd914`](https://github.com/apache/spark/commit/b1bd9148348ac33671ddf7f7aaf776374dfdf951).

