Posted to reviews@spark.apache.org by mfliu <gi...@git.apache.org> on 2015/10/08 20:33:09 UTC

[GitHub] spark pull request: SparkR joins. Used DataFrame.R and test_sparkS...

GitHub user mfliu opened a pull request:

    https://github.com/apache/spark/pull/9029

    SparkR joins. Used DataFrame.R and test_sparkSQL.R from Spark 1.5.1

    I was having issues with collect() and orderBy() in Spark 1.5.0, so I used the DataFrame.R and test_sparkSQL.R files from the Spark 1.5.1 download. I modified only the join() function in DataFrame.R to accept "full", "fullouter", "left", "right", and "leftsemi", and added corresponding test cases to the join() and merge() tests in test_sparkSQL.R.
    I opened this pull request for the JIRA bug report I filed:
    https://issues.apache.org/jira/browse/SPARK-10981
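
As an illustration of what the join types named above return, here is a minimal Python sketch on in-memory rows. It is a toy model, not SparkR or Spark code: the function `join_rows`, its `how` values, and the sample data in the note below are hypothetical, and columns from an unmatched side are simply left out of the row where Spark would fill in nulls.

```python
def join_rows(left, right, key, how="inner"):
    """Join two lists of dicts on `key` (toy model, not Spark's algorithm)."""
    if how not in {"inner", "left", "right", "full", "leftsemi"}:
        raise ValueError(f"unsupported join type: {how!r}")
    rows = []
    matched_right = set()  # indices of right rows that found a partner
    for lrow in left:
        hits = [(i, r) for i, r in enumerate(right) if r[key] == lrow[key]]
        if how == "leftsemi":
            # Keep a left row at most once, with left columns only.
            if hits:
                rows.append(dict(lrow))
            continue
        for i, rrow in hits:
            matched_right.add(i)
            rows.append({**lrow, **rrow})
        if not hits and how in {"left", "full"}:
            # Unmatched left row: right columns are absent in this sketch.
            rows.append(dict(lrow))
    if how in {"right", "full"}:
        # Unmatched right rows: left columns are absent in this sketch.
        rows.extend(
            dict(r) for i, r in enumerate(right) if i not in matched_right
        )
    return rows
```

With `left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]` and `right = [{"id": 2, "b": "p"}, {"id": 3, "b": "q"}]`, `"inner"` keeps only id 2, `"left"` also keeps id 1, `"right"` also keeps id 3, `"full"` keeps all three, and `"leftsemi"` returns id 2 with the left columns only.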

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mfliu/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9029.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9029
    
----
commit 5bde8cd7498912cecd8b2f169b60ab7cc86c7436
Author: Monica Liu <li...@gmail.com>
Date:   2015-10-08T18:23:43Z

    SparkR joins. Used DataFrame.R from Spark 1.5.1 because of changes to collect() and orderBy() functions

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by mfliu <gi...@git.apache.org>.
Github user mfliu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9029#discussion_r41663399
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -1414,9 +1414,10 @@ setMethod("where",
     #' @param x A Spark DataFrame
     #' @param y A Spark DataFrame
     #' @param joinExpr (Optional) The expression used to perform the join. joinExpr must be a
    -#' Column expression. If joinExpr is omitted, join() wil perform a Cartesian join
    +#' Column expression. If joinExpr is omitted, join() will perform a Cartesian join
     #' @param joinType The type of join to perform. The following join types are available:
    -#' 'inner', 'outer', 'left_outer', 'right_outer', 'semijoin'. The default joinType is "inner".
    +#' 'inner', 'outer', 'full', 'fullouter', leftouter', 'left',
    --- End diff --
    
    Would it make sense to add "right_outer" and "left_outer" along with "rightouter" and "leftouter"?




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-147830747
  
    Merged build started.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-146655649
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43416/
    Test FAILed.




[GitHub] spark pull request: SparkR joins. Used DataFrame.R and test_sparkS...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-146649514
  
    Jenkins, ok to test




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-146872420
  
    Merged build started.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-147844123
  
    Jenkins, retest this please




[GitHub] spark pull request: SparkR joins. Used DataFrame.R and test_sparkS...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-146648243
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-146655645
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-146874133
  
      [Test build #43468 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43468/console) for   PR 9029 at commit [`d4a1ed3`](https://github.com/apache/spark/commit/d4a1ed391795881d2fd82cba3c01aaa26e3ace20).
     * This patch **fails R style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-146895645
  
    Merged build started.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-146874146
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43468/
    Test FAILed.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-146651254
  
    Merged build started.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-147850221
  
     Merged build triggered.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by sun-rui <gi...@git.apache.org>.
Github user sun-rui commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9029#discussion_r41691353
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -1414,9 +1414,10 @@ setMethod("where",
     #' @param x A Spark DataFrame
     #' @param y A Spark DataFrame
     #' @param joinExpr (Optional) The expression used to perform the join. joinExpr must be a
    -#' Column expression. If joinExpr is omitted, join() wil perform a Cartesian join
    +#' Column expression. If joinExpr is omitted, join() will perform a Cartesian join
     #' @param joinType The type of join to perform. The following join types are available:
    -#' 'inner', 'outer', 'left_outer', 'right_outer', 'semijoin'. The default joinType is "inner".
    +#' 'inner', 'outer', 'full', 'fullouter', leftouter', 'left',
    --- End diff --
    
    Yeah, API compatibility is a concern, so we can make the R code consistent with the Scala version at https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/joinTypes.scala#L21
    
    That is, remove the "_" character from the join type string.
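
The underscore handling suggested in this comment can be sketched outside of Spark. The following is a minimal Python illustration, not the SparkR or Scala code: the function name `normalize_join_type` and the alias table are hypothetical, built only from the join type names listed in this thread, and the lowercasing step and the grouping of "outer" and "full" under the full outer join are assumptions.

```python
# Hypothetical alias table: each accepted spelling (after "_" removal)
# maps to one canonical name. Names are taken from this thread only.
_ALIASES = {
    "inner": "inner",
    "outer": "fullouter",
    "full": "fullouter",
    "fullouter": "fullouter",
    "left": "leftouter",
    "leftouter": "leftouter",
    "right": "rightouter",
    "rightouter": "rightouter",
    "leftsemi": "leftsemi",
}


def normalize_join_type(join_type: str) -> str:
    """Strip "_" from a join type string and resolve it to a canonical name."""
    # Assumption: matching is case-insensitive.
    key = join_type.lower().replace("_", "")
    try:
        return _ALIASES[key]
    except KeyError:
        raise ValueError(f"unsupported join type: {join_type!r}") from None
```

Under this scheme the older spellings keep working: `normalize_join_type("left_outer")` and `normalize_join_type("leftouter")` both resolve to `"leftouter"`, which addresses the compatibility concern without maintaining two separate lists of names.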




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-146884080
  
      [Test build #43469 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43469/console) for   PR 9029 at commit [`efa072c`](https://github.com/apache/spark/commit/efa072caaaf9c4805e63da22a9342504a292c08c).
     * This patch **fails R style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-147833846
  
      [Test build #43666 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43666/console) for   PR 9029 at commit [`a67965a`](https://github.com/apache/spark/commit/a67965ae9322303511a2b1f2fb3a2d2be043fb85).
     * This patch **fails R style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-147850255
  
    Merged build started.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-146897203
  
      [Test build #43473 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43473/consoleFull) for   PR 9029 at commit [`216be37`](https://github.com/apache/spark/commit/216be3780d909ee6ac8d217cf34cb7dec073793e).




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-146681852
  
    @mfliu I think that's just a flaky test, but irrespective of that, your changes should be against the current `master` branch. Right now it looks like there are a lot more lines in the diff because the change is against 1.5.1. To fix this in branch-1.5, we will first merge the change into master and then backport it during the merge.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-147844952
  
    Merged build started.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by mfliu <gi...@git.apache.org>.
Github user mfliu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9029#discussion_r41649096
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -1414,9 +1414,10 @@ setMethod("where",
     #' @param x A Spark DataFrame
     #' @param y A Spark DataFrame
     #' @param joinExpr (Optional) The expression used to perform the join. joinExpr must be a
    -#' Column expression. If joinExpr is omitted, join() wil perform a Cartesian join
    +#' Column expression. If joinExpr is omitted, join() will perform a Cartesian join
     #' @param joinType The type of join to perform. The following join types are available:
    -#' 'inner', 'outer', 'left_outer', 'right_outer', 'semijoin'. The default joinType is "inner".
    +#' 'inner', 'outer', 'full', 'fullouter', leftouter', 'left',
    --- End diff --
    
    That change was made based on the comment on the JIRA report:
    https://issues.apache.org/jira/browse/SPARK-10981
    
    In the PR, please:
    1. Support all join types defined in sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/joinTypes.scala (You can remove the "_" char from the currently supported join types in SparkR)
    2. Add test cases for missing join types including "leftsemi"
    
    Perhaps I misunderstood?




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-147829272
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43664/
    Test FAILed.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-147828913
  
      [Test build #43664 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43664/consoleFull) for   PR 9029 at commit [`9603722`](https://github.com/apache/spark/commit/96037228048a290cfc07bde9a991bec373d36d1b).




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by mfliu <gi...@git.apache.org>.
Github user mfliu commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-146882602
  
    Sorry, had a typo in one of my unit tests. It now passes the run-tests.sh on my machine. Can you test again?




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9029#discussion_r41662805
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -1414,9 +1414,10 @@ setMethod("where",
     #' @param x A Spark DataFrame
     #' @param y A Spark DataFrame
     #' @param joinExpr (Optional) The expression used to perform the join. joinExpr must be a
    -#' Column expression. If joinExpr is omitted, join() wil perform a Cartesian join
    +#' Column expression. If joinExpr is omitted, join() will perform a Cartesian join
     #' @param joinType The type of join to perform. The following join types are available:
    -#' 'inner', 'outer', 'left_outer', 'right_outer', 'semijoin'. The default joinType is "inner".
    +#' 'inner', 'outer', 'full', 'fullouter', leftouter', 'left',
    --- End diff --
    
    +1 on breaking API changes; IMO we really need to come up with a policy on that.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-147833867
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by sun-rui <gi...@git.apache.org>.
Github user sun-rui commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-147909733
  
    LGTM




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-147850942
  
      [Test build #43673 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43673/consoleFull) for   PR 9029 at commit [`8813b1c`](https://github.com/apache/spark/commit/8813b1cba51e10c4e3e6e3a7b27b132ca9d8ac51).




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-147855661
  
      [Test build #43673 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43673/console) for   PR 9029 at commit [`8813b1c`](https://github.com/apache/spark/commit/8813b1cba51e10c4e3e6e3a7b27b132ca9d8ac51).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-147843748
  
      [Test build #43668 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43668/console) for   PR 9029 at commit [`d4eff64`](https://github.com/apache/spark/commit/d4eff64c67452b166e86b2bc3d9a2486b8f18657).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-147829269
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by mfliu <gi...@git.apache.org>.
Github user mfliu commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-146672272
  
    The errors in the log file don't seem to be related to the changes I made? They are primarily in PythonRDD.scala:
    java.net.SocketException: Socket is closed
    
    org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
    	at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:203)
    	at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:207)
    	at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125)
    	at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
    
    Caused by: java.io.EOFException
    	at java.io.DataInputStream.readInt(DataInputStream.java:392)
    	at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:139)
    
    Errors seem to be from PythonRDD.scala, but I made no changes to that file, and I'm not sure how changing R code affects that interface?
    
    Can someone else take a look? @shivaram, @sun-rui 
    





[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-147838614
  
     Merged build triggered.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-146881195
  
     Merged build triggered.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-147859792
  
    looks good




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-147830691
  
     Merged build triggered.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9029#discussion_r41592873
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -1314,50 +1273,21 @@ setClassUnion("characterOrColumn", c("character", "Column"))
     #' path <- "path/to/file.json"
     #' df <- jsonFile(sqlContext, path)
     #' arrange(df, df$col1)
    +#' arrange(df, "col1")
     #' arrange(df, asc(df$col1), desc(abs(df$col2)))
    -#' arrange(df, "col1", decreasing = TRUE)
    -#' arrange(df, "col1", "col2", decreasing = c(TRUE, FALSE))
     #' }
     setMethod("arrange",
    -          signature(x = "DataFrame", col = "Column"),
    +          signature(x = "DataFrame", col = "characterOrColumn"),
               function(x, col, ...) {
    +            if (class(col) == "character") {
    +              sdf <- callJMethod(x@sdf, "sort", col, toSeq(...))
    +            } else if (class(col) == "Column") {
                   jcols <- lapply(list(col, ...), function(c) {
                     c@jc
                   })
    -
    -            sdf <- callJMethod(x@sdf, "sort", jcols)
    -            dataFrame(sdf)
    -          })
    -
    -#' @rdname arrange
    -#' @export
    -setMethod("arrange",
    -          signature(x = "DataFrame", col = "character"),
    -          function(x, col, ..., decreasing = FALSE) {
    --- End diff --
    
    It looks like this is undoing a recent PR. Could you check?




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-147843796
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-147843798
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43668/
    Test FAILed.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-147929731
  
    Thanks @mfliu - LGTM. Merging this




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-147829263
  
      [Test build #43664 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43664/console) for   PR 9029 at commit [`9603722`](https://github.com/apache/spark/commit/96037228048a290cfc07bde9a991bec373d36d1b).
     * This patch **fails R style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-147852150
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43671/
    Test PASSed.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/9029




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-147832715
  
      [Test build #43666 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43666/consoleFull) for   PR 9029 at commit [`a67965a`](https://github.com/apache/spark/commit/a67965ae9322303511a2b1f2fb3a2d2be043fb85).




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-147839651
  
      [Test build #43668 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43668/consoleFull) for   PR 9029 at commit [`d4eff64`](https://github.com/apache/spark/commit/d4eff64c67452b166e86b2bc3d9a2486b8f18657).




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9029#discussion_r41648156
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -1414,9 +1414,10 @@ setMethod("where",
     #' @param x A Spark DataFrame
     #' @param y A Spark DataFrame
     #' @param joinExpr (Optional) The expression used to perform the join. joinExpr must be a
    -#' Column expression. If joinExpr is omitted, join() wil perform a Cartesian join
    +#' Column expression. If joinExpr is omitted, join() will perform a Cartesian join
     #' @param joinType The type of join to perform. The following join types are available:
    -#' 'inner', 'outer', 'left_outer', 'right_outer', 'semijoin'. The default joinType is "inner".
    +#' 'inner', 'outer', 'full', 'fullouter', leftouter', 'left',
    --- End diff --
    
    Any reason we should change `right_outer` to `rightouter`? It'll break code that used to work with previous versions.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9029#discussion_r41898036
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -1414,9 +1414,10 @@ setMethod("where",
     #' @param x A Spark DataFrame
     #' @param y A Spark DataFrame
     #' @param joinExpr (Optional) The expression used to perform the join. joinExpr must be a
    -#' Column expression. If joinExpr is omitted, join() wil perform a Cartesian join
    +#' Column expression. If joinExpr is omitted, join() will perform a Cartesian join
     #' @param joinType The type of join to perform. The following join types are available:
    -#' 'inner', 'outer', 'left_outer', 'right_outer', 'semijoin'. The default joinType is "inner".
    +#' 'inner', 'outer', 'full', 'fullouter', leftouter', 'left',
    --- End diff --
    
    Yeah, so let's support both `right_outer` and `rightouter`. That way we don't break backwards compatibility. One simple way to do this, as @sun-rui said, is to replace all "_"s in the join string with "" using `gsub` or something like that.
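
The `gsub` suggestion above could look roughly like the sketch below. This is an illustration only, not the code that was merged: the helper name `normalizeJoinType` and the list of accepted join types are assumptions made for the example.

```r
# Hypothetical helper (not from the PR): normalize a join type string so that
# both "right_outer" and "rightouter" are accepted.
normalizeJoinType <- function(joinType) {
  # Drop every underscore; fixed = TRUE treats "_" as a literal string.
  normalized <- gsub("_", "", joinType, fixed = TRUE)

  # Assumed set of accepted spellings after normalization.
  valid <- c("inner", "outer", "full", "fullouter",
             "left", "leftouter", "right", "rightouter", "leftsemi")

  if (!(normalized %in% valid)) {
    stop("Unsupported join type: ", joinType)
  }
  normalized
}

normalizeJoinType("right_outer")  # "rightouter"
normalizeJoinType("rightouter")   # "rightouter"
```

Normalizing the string once, before it is validated, means older code that passes `right_outer` keeps working while the new spellings are accepted too.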




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-147855832
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43673/
    Test PASSed.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-146883721
  
      [Test build #43469 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43469/consoleFull) for   PR 9029 at commit [`efa072c`](https://github.com/apache/spark/commit/efa072caaaf9c4805e63da22a9342504a292c08c).




[GitHub] spark pull request: SparkR joins. Used DataFrame.R and test_sparkS...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-146649495
  
    Thanks for the PR, @mfliu. Could you format the PR title as `[SPARKR] [SPARK-10981] SparkR Join improvements`? More instructions are at https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-PreparingtoContributeCodeChanges




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by mfliu <gi...@git.apache.org>.
Github user mfliu commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-146872159
  
    @felixcheung Yes, you are correct. The arrange function was different. I pulled again and changed those files and it is working on my machine. Can you test again?




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-146884084
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43469/
    Test FAILed.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-146884082
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: SparkR joins. Used DataFrame.R and test_sparkS...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-146649529
  
    cc @sun-rui 




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-146651225
  
     Merged build triggered.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9029#discussion_r41592919
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -1854,30 +1784,36 @@ setMethod("fillna",
                 sdf <- if (length(cols) == 0) {
                   callJMethod(naFunctions, "fill", value)
                 } else {
    -              callJMethod(naFunctions, "fill", value, as.list(cols))
    +              callJMethod(naFunctions, "fill", value, listToSeq(as.list(cols)))
                 }
                 dataFrame(sdf)
               })
     
    -#' This function downloads the contents of a DataFrame into an R's data.frame.
    -#' Since data.frames are held in memory, ensure that you have enough memory
    -#' in your system to accommodate the contents.
    +#' crosstab
     #'
    -#' @title Download data from a DataFrame into a data.frame
    -#' @param x a DataFrame
    -#' @return a data.frame
    -#' @rdname as.data.frame
    -#' @examples \dontrun{
    +#' Computes a pair-wise frequency table of the given columns. Also known as a contingency
    +#' table. The number of distinct values for each column should be less than 1e4. At most 1e6
    +#' non-zero pair frequencies will be returned.
     #'
    -#' irisDF <- createDataFrame(sqlContext, iris)
    -#' df <- as.data.frame(irisDF[irisDF$Species == "setosa", ])
    +#' @param col1 name of the first column. Distinct items will make the first item of each row.
    +#' @param col2 name of the second column. Distinct items will make the column names of the output.
    +#' @return a local R data.frame representing the contingency table. The first column of each row
    +#'         will be the distinct values of `col1` and the column names will be the distinct values
    +#'         of `col2`. The name of the first column will be `$col1_$col2`. Pairs that have no
    +#'         occurrences will have zero as their counts.
    +#'
    +#' @rdname statfunctions
    +#' @name crosstab
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' df <- jsonFile(sqlCtx, "/path/to/file.json")
    +#' ct = crosstab(df, "title", "gender")
     #' }
    -setMethod("as.data.frame",
    --- End diff --
    
    Same for this: it also looks like it undoes a change from a recent PR.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-146895610
  
     Merged build triggered.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-146901456
  
      [Test build #43473 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43473/console) for   PR 9029 at commit [`216be37`](https://github.com/apache/spark/commit/216be3780d909ee6ac8d217cf34cb7dec073793e).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-146901607
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43473/




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-147852147
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-146873838
  
      [Test build #43468 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43468/consoleFull) for PR 9029 at commit [`d4a1ed3`](https://github.com/apache/spark/commit/d4a1ed391795881d2fd82cba3c01aaa26e3ace20).




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by shivaram <gi...@git.apache.org>.
GitHub user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-147839679
  
    @mfliu Just FYI, you can check the lint-r tests locally by running the script `dev/lint-r`




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-146874145
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by mfliu <gi...@git.apache.org>.
GitHub user mfliu commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-146914689
  
    That change was based on the following comment on the JIRA report:
    https://issues.apache.org/jira/browse/SPARK-10981
    
    > In the PR, please:
    > 1. Support all join types defined in sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/joinTypes.scala (you can remove the "_" char from the currently supported join types in SparkR)
    > 2. Add test cases for missing join types, including "leftsemi"
    
    Perhaps I misunderstood?
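
The request quoted above asks that join type names be matched without regard to the "_" character. A minimal sketch of that idea follows. It is illustrative Python, not code from this PR or from joinTypes.scala: the function name `normalize_join_type`, the table `_ALIASES`, and the canonical labels are all hypothetical, and the accepted spellings are assumed to be the ones named in this thread plus their underscore forms.

```python
# Hypothetical alias table. Keys are spellings after lowercasing and
# removing "_"; values are made-up canonical labels for illustration.
_ALIASES = {
    "inner": "inner",
    "outer": "fullouter",
    "full": "fullouter",
    "fullouter": "fullouter",
    "left": "leftouter",
    "leftouter": "leftouter",
    "right": "rightouter",
    "rightouter": "rightouter",
    "leftsemi": "leftsemi",
}


def normalize_join_type(join_type):
    """Map a user-supplied join type to a canonical label, ignoring case and "_"."""
    key = join_type.lower().replace("_", "")
    if key not in _ALIASES:
        raise ValueError("unsupported join type: %r" % join_type)
    return _ALIASES[key]
```

With this sketch, `normalize_join_type("left_outer")` and `normalize_join_type("left")` resolve to the same label, and an unknown spelling raises `ValueError` rather than silently falling back to a default join.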




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-146881285
  
    Merged build started.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-147845868
  
      [Test build #43671 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43671/consoleFull) for PR 9029 at commit [`d4eff64`](https://github.com/apache/spark/commit/d4eff64c67452b166e86b2bc3d9a2486b8f18657).




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-147833871
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43666/




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-147844883
  
     Merged build triggered.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-146901604
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-147855830
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-147852003
  
      [Test build #43671 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43671/console) for PR 9029 at commit [`d4eff64`](https://github.com/apache/spark/commit/d4eff64c67452b166e86b2bc3d9a2486b8f18657).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-147838635
  
    Merged build started.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-147827696
  
     Merged build triggered.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-146872375
  
     Merged build triggered.




[GitHub] spark pull request: [SPARKR] [SPARK-10981] SparkR Join improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9029#issuecomment-147827752
  
    Merged build started.

