Posted to reviews@spark.apache.org by zhonghaihua <gi...@git.apache.org> on 2015/12/03 07:28:20 UTC

[GitHub] spark pull request: Join nondeterministic

GitHub user zhonghaihua opened a pull request:

    https://github.com/apache/spark/pull/10122

    Join nondeterministic

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zhonghaihua/spark join_nondeterministic

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10122.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10122
    
----
commit 6d8ebc801799714d297c83be6935b37e26dc2df7
Author: Xiangrui Meng <me...@databricks.com>
Date:   2015-08-26T05:35:49Z

    [SPARK-10243] [MLLIB] update since versions in mllib.tree
    
    Same as #8421 but for `mllib.tree`.
    
    cc jkbradley
    
    Author: Xiangrui Meng <me...@databricks.com>
    
    Closes #8442 from mengxr/SPARK-10236.
    
    (cherry picked from commit fb7e12fe2e14af8de4c206ca8096b2e8113bfddc)
    Signed-off-by: Xiangrui Meng <me...@databricks.com>

commit 08d390f457f80ffdc2dfce61ea579d9026047f12
Author: Xiangrui Meng <me...@databricks.com>
Date:   2015-08-26T05:49:33Z

    [SPARK-10235] [MLLIB] update since versions in mllib.regression
    
    Same as #8421 but for `mllib.regression`.
    
    cc freeman-lab dbtsai
    
    Author: Xiangrui Meng <me...@databricks.com>
    
    Closes #8426 from mengxr/SPARK-10235 and squashes the following commits:
    
    6cd28e4 [Xiangrui Meng] update since versions in mllib.regression
    
    (cherry picked from commit 4657fa1f37d41dd4c7240a960342b68c7c591f48)
    Signed-off-by: DB Tsai <db...@netflix.com>

commit 21a10a86d20ec1a6fea42286b4d2aae9ce7e848d
Author: Xiangrui Meng <me...@databricks.com>
Date:   2015-08-26T06:45:41Z

    [SPARK-10236] [MLLIB] update since versions in mllib.feature
    
    Same as #8421 but for `mllib.feature`.
    
    cc dbtsai
    
    Author: Xiangrui Meng <me...@databricks.com>
    
    Closes #8449 from mengxr/SPARK-10236.feature and squashes the following commits:
    
    0e8d658 [Xiangrui Meng] remove unnecessary comment
    ad70b03 [Xiangrui Meng] update since versions in mllib.feature
    
    (cherry picked from commit 321d7759691bed9867b1f0470f12eab2faa50aff)
    Signed-off-by: DB Tsai <db...@netflix.com>

commit 5220db9e352b5d5eae59cead9478ca0a9f73f16b
Author: felixcheung <fe...@hotmail.com>
Date:   2015-08-26T06:48:16Z

    [SPARK-9316] [SPARKR] Add support for filtering using `[` (synonym for filter / select)
    
    Add support for
    ```
       df[df$name == "Smith", c(1,2)]
       df[df$age %in% c(19, 30), 1:2]
    ```
    
    shivaram
    
    Author: felixcheung <fe...@hotmail.com>
    
    Closes #8394 from felixcheung/rsubset.
    
    (cherry picked from commit 75d4773aa50e24972c533e8b48697fde586429eb)
    Signed-off-by: Shivaram Venkataraman <sh...@cs.berkeley.edu>

commit b0dde36009ce371824ce3e47e60fa0711d7733bb
Author: Xiangrui Meng <me...@databricks.com>
Date:   2015-08-26T18:47:05Z

    [SPARK-9665] [MLLIB] audit MLlib API annotations
    
    I only found `ml.NaiveBayes` missing the `Experimental` annotation. This PR doesn't cover Python APIs.
    
    cc jkbradley
    
    Author: Xiangrui Meng <me...@databricks.com>
    
    Closes #8452 from mengxr/SPARK-9665.
    
    (cherry picked from commit 6519fd06cc8175c9182ef16cf8a37d7f255eb846)
    Signed-off-by: Joseph K. Bradley <jo...@databricks.com>

commit efbd7af44e855efcbb1fa224e80db24947e2b153
Author: Xiangrui Meng <me...@databricks.com>
Date:   2015-08-26T21:02:19Z

    [SPARK-10241] [MLLIB] update since versions in mllib.recommendation
    
    Same as #8421 but for `mllib.recommendation`.
    
    cc srowen coderxiang
    
    Author: Xiangrui Meng <me...@databricks.com>
    
    Closes #8432 from mengxr/SPARK-10241.
    
    (cherry picked from commit 086d4681df3ebfccfc04188262c10482f44553b0)
    Signed-off-by: Xiangrui Meng <me...@databricks.com>

commit 0bdb800575ae2872e2655983a1be94dcf2e8c36b
Author: Davies Liu <da...@databricks.com>
Date:   2015-08-26T23:04:44Z

    [SPARK-10305] [SQL] fix create DataFrame from Python class
    
    cc jkbradley
    
    Author: Davies Liu <da...@databricks.com>
    
    Closes #8470 from davies/fix_create_df.
    
    (cherry picked from commit d41d6c48207159490c1e1d9cc54015725cfa41b2)
    Signed-off-by: Davies Liu <da...@gmail.com>

commit cef707d2185ca7e0c5635fabe709d5e26915b5bb
Author: Shivaram Venkataraman <sh...@cs.berkeley.edu>
Date:   2015-08-27T01:13:07Z

    [SPARK-10308] [SPARKR] Add %in% to the exported namespace
    
    I also checked all the other functions defined in column.R, functions.R and DataFrame.R and everything else looked fine.
    
    cc yu-iskw
    
    Author: Shivaram Venkataraman <sh...@cs.berkeley.edu>
    
    Closes #8473 from shivaram/in-namespace.
    
    (cherry picked from commit ad7f0f160be096c0fdae6e6cf7e3b6ba4a606de7)
    Signed-off-by: Shivaram Venkataraman <sh...@cs.berkeley.edu>

commit 04c85a8ecbb8a27628a7d1260c19531d56d764d3
Author: Cheng Lian <li...@databricks.com>
Date:   2015-08-27T01:14:54Z

    [SPARK-9424] [SQL] Parquet programming guide updates for 1.5
    
    Author: Cheng Lian <li...@databricks.com>
    
    Closes #8467 from liancheng/spark-9424/parquet-docs-for-1.5.

commit 165be9ad176dcd1c431a6338ff86b339d23b6d0e
Author: Shivaram Venkataraman <sh...@cs.berkeley.edu>
Date:   2015-08-27T05:27:31Z

    [SPARK-10219] [SPARKR] Fix varargsToEnv and add test case
    
    cc sun-rui davies
    
    Author: Shivaram Venkataraman <sh...@cs.berkeley.edu>
    
    Closes #8475 from shivaram/varargs-fix.
    
    (cherry picked from commit e936cf8088a06d6aefce44305f3904bbeb17b432)
    Signed-off-by: Shivaram Venkataraman <sh...@cs.berkeley.edu>

commit 30f0f7e4e39b58091e0a10199b6da81d14fa7fdb
Author: Moussa Taifi <mo...@gmail.com>
Date:   2015-08-27T09:34:47Z

    [DOCS] [STREAMING] [KAFKA] Fix typo in exactly once semantics
    
    Fix Typo in exactly once semantics
    [Semantics of output operations] link
    
    Author: Moussa Taifi <mo...@gmail.com>
    
    Closes #8468 from moutai/patch-3.
    
    (cherry picked from commit 9625d13d575c97bbff264f6a94838aae72c9202d)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit 965b3bb0ee4171f2c2533c0623f2cd680d700a2b
Author: Michael Armbrust <mi...@databricks.com>
Date:   2015-08-27T18:45:15Z

    [SPARK-9148] [SPARK-10252] [SQL] Update SQL Programming Guide
    
    Author: Michael Armbrust <mi...@databricks.com>
    
    Closes #8441 from marmbrus/documentation.
    
    (cherry picked from commit dc86a227e4fc8a9d8c3e8c68da8dff9298447fd0)
    Signed-off-by: Michael Armbrust <mi...@databricks.com>

commit db197150102c5ecb829dbbc64fc28b88fcc9c493
Author: CodingCat <zh...@gmail.com>
Date:   2015-08-27T19:19:09Z

    [SPARK-10315] remove document on spark.akka.failure-detector.threshold
    
    https://issues.apache.org/jira/browse/SPARK-10315
    
    This parameter is no longer used, and the current documentation is wrong: the key should be 'akka.remote.watch-failure-detector.threshold'.
    
    Author: CodingCat <zh...@gmail.com>
    
    Closes #8483 from CodingCat/SPARK_10315.
    
    (cherry picked from commit 84baa5e9b5edc8c55871fbed5057324450bf097f)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit 66db9cdc6ad3367ddf8d49d4d48c7506a4459675
Author: Yuhao Yang <hh...@gmail.com>
Date:   2015-08-27T20:57:20Z

    [SPARK-9901] User guide for RowMatrix Tall-and-skinny QR
    
    jira: https://issues.apache.org/jira/browse/SPARK-9901
    
    The jira covers only the document update. I can further provide example code for QR (like the ones for SVD and PCA) in a separate PR.
    
    Author: Yuhao Yang <hh...@gmail.com>
    
    Closes #8462 from hhbyyh/qrDoc.
    
    (cherry picked from commit 6185cdd2afcd492b77ff225b477b3624e3bc7bb2)
    Signed-off-by: Xiangrui Meng <me...@databricks.com>

commit 501e10a912d540d02fd3a611911e65b781692109
Author: MechCoder <ma...@gmail.com>
Date:   2015-08-27T22:33:43Z

    [SPARK-9906] [ML] User guide for LogisticRegressionSummary
    
    User guide for LogisticRegression summaries
    
    Author: MechCoder <ma...@gmail.com>
    Author: Manoj Kumar <mk...@nyu.edu>
    Author: Feynman Liang <fl...@databricks.com>
    
    Closes #8197 from MechCoder/log_summary_user_guide.
    
    (cherry picked from commit c94ecdfc5b3c0fe6c38a170dc2af9259354dc9e3)
    Signed-off-by: Joseph K. Bradley <jo...@databricks.com>

commit 3239911eae7cb7ffdef0de71e5bc8224f666eb88
Author: Feynman Liang <fl...@databricks.com>
Date:   2015-08-27T23:10:37Z

    [SPARK-9680] [MLLIB] [DOC] StopWordsRemovers user guide and Java compatibility test
    
    * Adds user guide for ml.feature.StopWordsRemovers, ran code examples on my machine
    * Cleans up scaladocs for public methods
    * Adds test for Java compatibility
    * Follow up Python user guide code example is tracked by SPARK-10249
    
    Author: Feynman Liang <fl...@databricks.com>
    
    Closes #8436 from feynmanliang/SPARK-10230.
    
    (cherry picked from commit 5bfe9e1111d9862084586549a7dc79476f67bab9)
    Signed-off-by: Xiangrui Meng <me...@databricks.com>

commit 351e849bbeaeee9dcf95d465ada1270a059da2f1
Author: Yin Huai <yh...@databricks.com>
Date:   2015-08-27T23:11:25Z

    [SPARK-10287] [SQL] Fixes JSONRelation refreshing on read path
    
    https://issues.apache.org/jira/browse/SPARK-10287
    
    After porting json to HadoopFsRelation, it seems hard to keep the behavior of picking up new files automatically for JSON. This PR removes this behavior, so JSON is consistent with others (ORC and Parquet).
    
    Author: Yin Huai <yh...@databricks.com>
    
    Closes #8469 from yhuai/jsonRefresh.
    
    (cherry picked from commit b3dd569ad40905f8861a547a1e25ed3ca8e1d272)
    Signed-off-by: Yin Huai <yh...@databricks.com>

commit fc4c3bf43626ecce75a909d9d0f1acd973f75fbf
Author: Davies Liu <da...@databricks.com>
Date:   2015-08-27T23:38:00Z

    [SPARK-10321] sizeInBytes in HadoopFsRelation
    
    Adds sizeInBytes to HadoopFsRelation to enable broadcast joins.
    
    cc marmbrus
    
    Author: Davies Liu <da...@databricks.com>
    
    Closes #8490 from davies/sizeInByte.
    
    (cherry picked from commit 54cda0deb6bebf1470f16ba5bcc6c4fb842bdac1)
    Signed-off-by: Michael Armbrust <mi...@databricks.com>

commit 6ccc0df8e416993730e5c6550a98cb6f2187a914
Author: MechCoder <ma...@gmail.com>
Date:   2015-08-28T04:44:06Z

    [SPARK-9911] [DOC] [ML] Update Userguide for Evaluator
    
    I added a small note about the different types of evaluator and the metrics used.
    
    Author: MechCoder <ma...@gmail.com>
    
    Closes #8304 from MechCoder/multiclass_evaluator.
    
    (cherry picked from commit 30734d45fbbb269437c062241a9161e198805a76)
    Signed-off-by: Xiangrui Meng <me...@databricks.com>

commit ede8c625cd6631d54961c3b39996e3c60bc08be4
Author: Feynman Liang <fl...@databricks.com>
Date:   2015-08-28T04:55:20Z

    [SPARK-9905] [ML] [DOC] Adds LinearRegressionSummary user guide
    
    * Adds user guide for `LinearRegressionSummary`
    * Fixes unresolved issues in  #8197
    
    CC jkbradley mengxr
    
    Author: Feynman Liang <fl...@databricks.com>
    
    Closes #8491 from feynmanliang/SPARK-9905.
    
    (cherry picked from commit af0e1249b1c881c0fa7a921fd21fd2c27214b980)
    Signed-off-by: Xiangrui Meng <me...@databricks.com>

commit c77cf867258caede0639445798e2cd3288e246a7
Author: Cheng Lian <li...@databricks.com>
Date:   2015-08-28T05:30:01Z

    [SPARK-SQL] [MINOR] Fixes some typos in HiveContext
    
    Author: Cheng Lian <li...@databricks.com>
    
    Closes #8481 from liancheng/hive-context-typo.
    
    (cherry picked from commit 89b943438512fcfb239c268b43431397de46cbcf)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit bcb8fa849e7684685e0761153daf976ff79e726f
Author: noelsmith <ma...@noelsmith.com>
Date:   2015-08-28T06:59:30Z

    [SPARK-10188] [PYSPARK] Pyspark CrossValidator with RMSE selects incorrect model
    
    * Added isLargerBetter() method to Pyspark Evaluator to match the Scala version.
    * JavaEvaluator delegates isLargerBetter() to underlying Scala object.
    * Added check for isLargerBetter() in CrossValidator to determine whether to use argmin or argmax.
    * Added test cases for where smaller is better (RMSE) and larger is better (R-Squared).
    
    (This contribution is my original work, and I license the work to the project under Spark's open source license.)
    
    Author: noelsmith <ma...@noelsmith.com>
    
    Closes #8399 from noel-smith/pyspark-rmse-xval-fix.
    
    (cherry picked from commit 7583681e6b0824d7eed471dc4d8fa0b2addf9ffc)
    Signed-off-by: Joseph K. Bradley <jo...@databricks.com>

commit 9b7f8f29373972f115a5d9068b6432b6757f8ac7
Author: Shivaram Venkataraman <sh...@cs.berkeley.edu>
Date:   2015-08-28T07:37:50Z

    [SPARK-10328] [SPARKR] Fix generic for na.omit
    
    S3 function is at https://stat.ethz.ch/R-manual/R-patched/library/stats/html/na.fail.html
    
    Author: Shivaram Venkataraman <sh...@cs.berkeley.edu>
    Author: Shivaram Venkataraman <sh...@gmail.com>
    Author: Yu ISHIKAWA <yu...@gmail.com>
    
    Closes #8495 from shivaram/na-omit-fix.
    
    (cherry picked from commit 2f99c37273c1d82e2ba39476e4429ea4aaba7ec6)
    Signed-off-by: Shivaram Venkataraman <sh...@cs.berkeley.edu>

commit f0c4470d43e833d2382a8b98bf4aa21ae9451d00
Author: hyukjinkwon <gu...@gmail.com>
Date:   2015-08-20T00:13:25Z

    [SPARK-10035] [SQL] Parquet filters do not process EqualNullSafe filter.
    
    As discussed with Lian:
    
    1. I added EqualNullSafe to ParquetFilters
     - It uses the same equality comparison filter as EqualTo, since the Parquet filter actually performs a null-safe equality comparison.
    
    2. Updated the test code (ParquetFilterSuite)
     - Convert catalyst.Expression to sources.Filter
     - Removed Cast since only Literal is picked up as a proper Filter in DataSourceStrategy
     - Added EqualNullSafe comparison
    
    3. Removed deprecated createFilter for catalyst.Expression
    
    Author: hyukjinkwon <gu...@gmail.com>
    Author: 권혁진 <gu...@gmail.com>
    
    Closes #8275 from HyukjinKwon/master.
    
    (cherry picked from commit ba5f7e1842f2c5852b5309910c0d39926643da69)
    Signed-off-by: Cheng Lian <li...@databricks.com>

commit 8eff0696c44b5d5880acc6a6edc8dfb0a59cf958
Author: Sean Owen <so...@cloudera.com>
Date:   2015-08-28T08:32:23Z

    [SPARK-10295] [CORE] Dynamic allocation in Mesos does not release when RDDs are cached
    
    Remove obsolete warning about dynamic allocation not working with cached RDDs
    
    See discussion in https://issues.apache.org/jira/browse/SPARK-10295
    
    Author: Sean Owen <so...@cloudera.com>
    
    Closes #8489 from srowen/SPARK-10295.
    
    (cherry picked from commit cc39803062119c1d14611dc227b9ed0ed1284d38)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit e23ffd6758a141ec99b8beeadefc1bd3432cc340
Author: Keiji Yoshida <yo...@gmail.com>
Date:   2015-08-28T08:36:50Z

    Fix DynamodDB/DynamoDB typo in Kinesis Integration doc
    
    Fix DynamodDB/DynamoDB typo in Kinesis Integration doc
    
    Author: Keiji Yoshida <yo...@gmail.com>
    
    Closes #8501 from yosssi/patch-1.
    
    (cherry picked from commit 18294cd8710427076caa86bfac596de67089d57e)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit 3ccd2e647a3c3039b1959a1e39c24cbe4fc6d9c5
Author: Dharmesh Kakadia <dh...@users.noreply.github.com>
Date:   2015-08-28T08:38:35Z

    typo in comment
    
    Author: Dharmesh Kakadia <dh...@users.noreply.github.com>
    
    Closes #8497 from dharmeshkakadia/patch-2.
    
    (cherry picked from commit 71a077f6c16c8816eae13341f645ba50d997f63d)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit 0cd49bacc8ec344a60bc2f5bf4c90cfd8c79abed
Author: Yuhao Yang <hh...@gmail.com>
Date:   2015-08-28T15:00:44Z

    [SPARK-9890] [DOC] [ML] User guide for CountVectorizer
    
    jira: https://issues.apache.org/jira/browse/SPARK-9890
    
    Documentation with Scala and Java examples
    
    Author: Yuhao Yang <hh...@gmail.com>
    
    Closes #8487 from hhbyyh/cvDoc.
    
    (cherry picked from commit e2a843090cb031f6aa774f6d9c031a7f0f732ee1)
    Signed-off-by: Xiangrui Meng <me...@databricks.com>

commit 0abbc181380e644374f4217ee84b76fae035aee2
Author: Luciano Resende <lr...@apache.org>
Date:   2015-08-28T16:13:21Z

    [SPARK-8952] [SPARKR] - Wrap normalizePath calls with suppressWarnings
    
    This is based on davies' comment on SPARK-8952, which suggests calling normalizePath() only when the path starts with '~'.
    
    Author: Luciano Resende <lr...@apache.org>
    
    Closes #8343 from lresende/SPARK-8952.
    
    (cherry picked from commit 499e8e154bdcc9d7b2f685b159e0ddb4eae48fe4)
    Signed-off-by: Shivaram Venkataraman <sh...@cs.berkeley.edu>

commit ccda27a9beb97b11c2522a0700165fd849af44b1
Author: Josh Rosen <jo...@databricks.com>
Date:   2015-08-28T18:51:42Z

    [SPARK-10325] Override hashCode() for public Row
    
    This commit fixes an issue where the public SQL `Row` class did not override `hashCode`, causing it to violate the hashCode() + equals() contract. To fix this, I simply ported the `hashCode` implementation from the 1.4.x version of `Row`.
    
    Author: Josh Rosen <jo...@databricks.com>
    
    Closes #8500 from JoshRosen/SPARK-10325 and squashes the following commits:
    
    51ffea1 [Josh Rosen] Override hashCode() for public Row.
    
    (cherry picked from commit d3f87dc39480f075170817bbd00142967a938078)
    Signed-off-by: Michael Armbrust <mi...@databricks.com>
    
    Conflicts:
    	sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Join nondeterministic

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10122#issuecomment-161531329
  
    Can one of the admins verify this patch?


[GitHub] spark pull request: Join nondeterministic

Posted by zhonghaihua <gi...@git.apache.org>.
Github user zhonghaihua commented on the pull request:

    https://github.com/apache/spark/pull/10122#issuecomment-161533958
  
    I am sorry for creating this pull request; it was opened from the wrong branch. I will close it right away.
    This was my mistake. Sorry for the trouble.


[GitHub] spark pull request: Join nondeterministic

Posted by zhonghaihua <gi...@git.apache.org>.
Github user zhonghaihua closed the pull request at:

    https://github.com/apache/spark/pull/10122

