Posted to reviews@spark.apache.org by superbobry <gi...@git.apache.org> on 2017/05/02 21:44:56 UTC

[GitHub] spark pull request #17837: Sync with upstream 2.1

GitHub user superbobry opened a pull request:

    https://github.com/apache/spark/pull/17837

    Sync with upstream 2.1

    ## What changes were proposed in this pull request?
    
    This is a backport of the upstream `branch-2.1`.
    
    ## How was this patch tested?
    
    Tested upstream by Spark contributors.
    
    ---
    
    Note that the Criteo version is now 2.1.2-criteo, because 2.1.1 was released (tagged, actually).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/criteo-forks/spark sync-with-upstream-2.1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17837.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17837
    
----
commit d489e1dc7ecf7cf081141d3f45f86c39fc3db1fe
Author: Liwei Lin <lw...@gmail.com>
Date:   2017-01-02T14:40:06Z

    [SPARK-19041][SS] Fix code snippet compilation issues in Structured Streaming Programming Guide
    
    ## What changes were proposed in this pull request?
    
    Currently some code snippets in the programming guide just do not compile. We should fix them.
    
    ## How was this patch tested?
    
    ```
    SKIP_API=1 jekyll build
    ```
    
    ## Screenshot from part of the change:
    
    ![snip20161231_37](https://cloud.githubusercontent.com/assets/15843379/21576864/cc52fcd8-cf7b-11e6-8bd6-f935d9ff4a6b.png)
    
    Author: Liwei Lin <lw...@gmail.com>
    
    Closes #16442 from lw-lin/ss-pro-guide-.

commit 94272a9600405442bfe485b17e55a84b85c25da3
Author: gatorsmile <ga...@gmail.com>
Date:   2016-12-31T11:40:28Z

    [SPARK-19028][SQL] Fixed non-thread-safe functions used in SessionCatalog
    
    ### What changes were proposed in this pull request?
    Fixed non-thread-safe functions used in SessionCatalog:
    - refreshTable
    - lookupRelation
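
    The fix is essentially to guard these code paths with the catalog's lock. A minimal, generic sketch of the synchronization pattern (names and internals here are illustrative, not `SessionCatalog`'s actual code):
    
    ```scala
    class Catalog {
      private val lock = new Object
      private val tables = scala.collection.mutable.Map.empty[String, String]
    
      // Both the read and the invalidate paths take the same lock, so a concurrent
      // refreshTable can no longer interleave with an in-flight lookupRelation.
      def lookupRelation(name: String): Option[String] = lock.synchronized {
        tables.get(name)
      }
    
      def refreshTable(name: String): Unit = lock.synchronized {
        tables.remove(name)
      }
    }
    ```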
    
    ### How was this patch tested?
    N/A
    
    Author: gatorsmile <ga...@gmail.com>
    
    Closes #16437 from gatorsmile/addSyncToLookUpTable.
    
    (cherry picked from commit 35e974076dcbc5afde8d4259ce88cb5f29d94920)
    Signed-off-by: Wenchen Fan <we...@databricks.com>

commit 776255065c13df7b4505c225546b4b66cd929c76
Author: gatorsmile <ga...@gmail.com>
Date:   2017-01-03T19:43:47Z

    [SPARK-19048][SQL] Delete Partition Location when Dropping Managed Partitioned Tables in InMemoryCatalog
    
    ### What changes were proposed in this pull request?
    The data in a managed table should be deleted after the table is dropped. However, if a partition's location is not under the location of the partitioned table, it is not deleted as expected. Users can specify any location for a partition when adding it.
    
    This PR is to delete partition location when dropping managed partitioned tables stored in `InMemoryCatalog`.
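
    A rough illustration of the affected sequence, assuming `spark` is a SparkSession backed by the `InMemoryCatalog` (table, column, and path names are made up; exact DDL support depends on the catalog and format):
    
    ```scala
    spark.sql("CREATE TABLE t (a INT, p INT) USING parquet PARTITIONED BY (p)")
    spark.sql("ALTER TABLE t ADD PARTITION (p = 1) LOCATION '/tmp/custom/p1'") // outside the table directory
    spark.sql("DROP TABLE t") // with this change, /tmp/custom/p1 is also removed for managed tables
    ```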
    
    ### How was this patch tested?
    Added test cases for both HiveExternalCatalog and InMemoryCatalog
    
    Author: gatorsmile <ga...@gmail.com>
    
    Closes #16448 from gatorsmile/unsetSerdeProp.
    
    (cherry picked from commit b67b35f76b684c5176dc683e7491fd01b43f4467)
    Signed-off-by: gatorsmile <ga...@gmail.com>

commit 1ecf1a953ee0f0f0925bb8a3df54d3e762116f1a
Author: Dongjoon Hyun <do...@apache.org>
Date:   2017-01-04T17:56:11Z

    [SPARK-18877][SQL][BACKPORT-2.1] `CSVInferSchema.inferField` on DecimalType should find a common type with `typeSoFar`
    
    ## What changes were proposed in this pull request?
    
    CSV type inferencing causes `IllegalArgumentException` on decimal numbers with heterogeneous precisions and scales because the current logic uses the last decimal type in a **partition**. Specifically, `inferRowType`, the **seqOp** of **aggregate**, returns the last decimal type. This PR fixes it to use `findTightestCommonType`.
    
    **decimal.csv**
    ```
    9.03E+12
    1.19E+11
    ```
    
    **BEFORE**
    ```scala
    scala> spark.read.format("csv").option("inferSchema", true).load("decimal.csv").printSchema
    root
     |-- _c0: decimal(3,-9) (nullable = true)
    
    scala> spark.read.format("csv").option("inferSchema", true).load("decimal.csv").show
    16/12/16 14:32:49 ERROR Executor: Exception in task 0.0 in stage 4.0 (TID 4)
    java.lang.IllegalArgumentException: requirement failed: Decimal precision 4 exceeds max precision 3
    ```
    
    **AFTER**
    ```scala
    scala> spark.read.format("csv").option("inferSchema", true).load("decimal.csv").printSchema
    root
     |-- _c0: decimal(4,-9) (nullable = true)
    
    scala> spark.read.format("csv").option("inferSchema", true).load("decimal.csv").show
    +---------+
    |      _c0|
    +---------+
    |9.030E+12|
    | 1.19E+11|
    +---------+
    ```
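
    A tiny sketch of the widening rule with plain (precision, scale) pairs rather than Spark's internal types, to show why folding each row's inferred type into the running type matters (illustrative only; the real fix uses `findTightestCommonType` on `DecimalType` instances):
    
    ```scala
    def commonDecimal(a: (Int, Int), b: (Int, Int)): (Int, Int) = {
      val scale = math.max(a._2, b._2)                   // keep the larger scale
      val intDigits = math.max(a._1 - a._2, b._1 - b._2) // and the larger integer part
      (intDigits + scale, scale)
    }
    // commonDecimal((3, -9), (4, -9)) == (4, -9), matching the decimal(4,-9) schema shown above,
    // whereas "keep the last type seen" can end up with decimal(3,-9) and overflow at read time.
    ```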
    
    ## How was this patch tested?
    
    Pass the newly added test case.
    
    Author: Dongjoon Hyun <do...@apache.org>
    
    Closes #16463 from dongjoon-hyun/SPARK-18877-BACKPORT-21.

commit 4ca1788805e4a0131ba8f0ccb7499ee0e0242837
Author: jerryshao <ss...@hortonworks.com>
Date:   2017-01-06T16:07:54Z

    [SPARK-19033][CORE] Add admin acls for history server
    
    ## What changes were proposed in this pull request?
    
    Currently the HistoryServer's ACLs are derived from the application event log, which means newly changed ACLs cannot be applied to old data. This becomes a problem: a newly added admin cannot access the history UI of old applications; only new applications are affected.
    
    So this proposes adding admin ACLs for the history server: any configured user/group gets view access to all applications, while the view ACLs derived from the application run time still take effect.
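
    An illustration of the intended usage (the property names below are my reading of this change and follow the existing `spark.history.ui.acls.enable` naming; check the monitoring docs updated in this PR for the exact keys):
    
    ```
    spark.history.ui.admin.acls         adminUser1,adminUser2
    spark.history.ui.admin.acls.groups  adminGroup1
    ```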
    
    ## How was this patch tested?
    
    Unit test added.
    
    Author: jerryshao <ss...@hortonworks.com>
    
    Closes #16470 from jerryshao/SPARK-19033.
    
    (cherry picked from commit 4a4c3dc9ca10e52f7981b225ec44e97247986905)
    Signed-off-by: Tom Graves <tg...@yahoo-inc.com>

commit ce9bfe6db63582d632f7d57cbf37ee7b29135198
Author: zuotingbing <zu...@zte.com.cn>
Date:   2017-01-06T17:57:49Z

    [SPARK-19083] sbin/start-history-server.sh script use of $@ without quotes
    
    JIRA Issue: https://issues.apache.org/jira/browse/SPARK-19083#
    
    The sbin/start-history-server.sh script used $@ without quotes; this affects the number of args passed to HistoryServerArguments::parse(args: List[String]).
    
    Author: zuotingbing <zu...@zte.com.cn>
    
    Closes #16484 from zuotingbing/sh.
    
    (cherry picked from commit a9a137377e4cf293325ccd7368698f20b5d6b98a)
    Signed-off-by: Marcelo Vanzin <va...@cloudera.com>

commit ee735a8a85d7f015188f7cb31975f60cc969e453
Author: Tathagata Das <ta...@gmail.com>
Date:   2017-01-06T19:29:01Z

    [SPARK-19074][SS][DOCS] Updated Structured Streaming Programming Guide for update mode and source/sink options
    
    ## What changes were proposed in this pull request?
    
    Updates
    - Updated the Late Data Handling section by adding a figure for Update Mode. It's more intuitive to explain late data handling with Update Mode, so I added the new figure before the Append Mode figure.
    - Updated Output Modes section with Update mode
    - Added options for all the sources and sinks
    
    ---------------------------
    ---------------------------
    
    ![image](https://cloud.githubusercontent.com/assets/663212/21665176/f150b224-d29f-11e6-8372-14d32da21db9.png)
    
    ---------------------------
    ---------------------------
    <img width="931" alt="screen shot 2017-01-03 at 6 09 11 pm" src="https://cloud.githubusercontent.com/assets/663212/21629740/d21c9bb8-d1df-11e6-915b-488a59589fa6.png">
    <img width="933" alt="screen shot 2017-01-03 at 6 10 00 pm" src="https://cloud.githubusercontent.com/assets/663212/21629749/e22bdabe-d1df-11e6-86d3-7e51d2f28dbc.png">
    
    ---------------------------
    ---------------------------
    ![image](https://cloud.githubusercontent.com/assets/663212/21665200/108e18fc-d2a0-11e6-8640-af598cab090b.png)
    ![image](https://cloud.githubusercontent.com/assets/663212/21665148/cfe414fa-d29f-11e6-9baa-4124ccbab093.png)
    ![image](https://cloud.githubusercontent.com/assets/663212/21665226/2e8f39e4-d2a0-11e6-85b1-7657e2df5491.png)
    
    Author: Tathagata Das <ta...@gmail.com>
    
    Closes #16468 from tdas/SPARK-19074.
    
    (cherry picked from commit b59cddaba01cbdf50dbe8fe7ef7b9913bad9552d)
    Signed-off-by: Tathagata Das <ta...@gmail.com>

commit 86b66216de411f8cbc79ede62b353f7cbb550903
Author: wm624@hotmail.com <wm...@hotmail.com>
Date:   2017-01-07T19:07:49Z

    [SPARK-19110][ML][MLLIB] DistributedLDAModel returns different logPrior for original and loaded model
    
    ## What changes were proposed in this pull request?
    
    While adding a DistributedLDAModel training summary for SparkR, I found that logPrior differs between the original and the loaded model.
    For example, in test("read/write DistributedLDAModel") I added:
    val logPrior = model.asInstanceOf[DistributedLDAModel].logPrior
    val logPrior2 = model2.asInstanceOf[DistributedLDAModel].logPrior
    assert(logPrior === logPrior2)
    The test fails:
    -4.394180878889078 did not equal -4.294290536919573
    
    The reason is that `graph.vertices.aggregate(0.0)(seqOp, _ + _)` only returns the value of a single vertex instead of the aggregation over all vertices. Therefore, when the loaded model does the aggregation in a different order, it returns a different `logPrior`.
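
    A small sketch of this aggregate pitfall in isolation, assuming an active SparkContext `sc` (values and partitioning are illustrative; this is not the LDA code itself):
    
    ```scala
    val rdd = sc.parallelize(Seq(1.0, 2.0, 3.0, 4.0), numSlices = 2)
    // Buggy seqOp: ignores the running accumulator, so each partition contributes a single element
    // and the result depends on how the data happens to be partitioned.
    val wrong = rdd.aggregate(0.0)((acc, v) => v, _ + _)
    // Correct seqOp: folds every element into the accumulator; always 10.0 regardless of partitioning.
    val right = rdd.aggregate(0.0)((acc, v) => acc + v, _ + _)
    ```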
    
    Please refer to #16464 for details.
    ## How was this patch tested?
    Add a new unit test for testing logPrior.
    
    Author: wm624@hotmail.com <wm...@hotmail.com>
    
    Closes #16491 from wangmiao1981/ldabug.
    
    (cherry picked from commit 036b50347c56a3541c526b1270093163b9b79e45)
    Signed-off-by: Joseph K. Bradley <jo...@databricks.com>

commit c95b58557dec2f4708d5efd9314edd80e0975fc8
Author: Sean Owen <so...@cloudera.com>
Date:   2017-01-07T19:15:51Z

    [SPARK-19106][DOCS] Styling for the configuration docs is broken
    
    configuration.html section headings were not specified correctly in Markdown and weren't being rendered or recognized correctly. Removed extra p tags and pulled level-4 titles up to level 3, since level 3 had been skipped. This improves the TOC.
    
    Doc build, manual check.
    
    Author: Sean Owen <so...@cloudera.com>
    
    Closes #16490 from srowen/SPARK-19106.
    
    (cherry picked from commit 54138f6e89abfc17101b4f2812715784a2b98331)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit ecc16220d2d9eace81de44c4b0aff1c364a35e3f
Author: Dongjoon Hyun <do...@apache.org>
Date:   2017-01-08T02:55:01Z

    [SPARK-18941][SQL][DOC] Add a new behavior document on `CREATE/DROP TABLE` with `LOCATION`
    
    ## What changes were proposed in this pull request?
    
    This PR adds a new behavior change description on `CREATE TABLE ... LOCATION` at `sql-programming-guide.md` clearly under `Upgrading From Spark SQL 1.6 to 2.0`. This change is introduced at Apache Spark 2.0.0 as [SPARK-15276](https://issues.apache.org/jira/browse/SPARK-15276).
    
    ## How was this patch tested?
    
    ```
    SKIP_API=1 jekyll build
    ```
    
    **Newly Added Description**
    <img width="913" alt="new" src="https://cloud.githubusercontent.com/assets/9700541/21743606/7efe2b12-d4ba-11e6-8a0d-551222718ea2.png">
    
    Author: Dongjoon Hyun <do...@apache.org>
    
    Closes #16400 from dongjoon-hyun/SPARK-18941.
    
    (cherry picked from commit 923e594844a7ad406195b91877f0fb374d5a454b)
    Signed-off-by: gatorsmile <ga...@gmail.com>

commit 8690d4bd150579e546aec7866b16a77bad1017f5
Author: anabranch <wa...@gmail.com>
Date:   2017-01-09T01:53:53Z

    [SPARK-19127][DOCS] Update Rank Function Documentation
    
    ## What changes were proposed in this pull request?
    
    - [X] Fix inconsistencies in the function reference for dense rank and rank
    - [X] Make all languages equivalent in their reference to `dense_rank` and `rank`.
    
    ## How was this patch tested?
    
    N/A for docs.
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.
    
    Author: anabranch <wa...@gmail.com>
    
    Closes #16505 from anabranch/SPARK-19127.
    
    (cherry picked from commit 1f6ded6455d07ec8828fc9662ddffe55cbba4238)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit 8779e6a4685f50a7062842f0d5a606c3a3b092d5
Author: anabranch <wa...@gmail.com>
Date:   2017-01-09T04:37:46Z

    [SPARK-19126][DOCS] Update Join Documentation Across Languages
    
    ## What changes were proposed in this pull request?
    
    - [X] Make sure all join types are clearly mentioned
    - [X] Make join labeling/style consistent
    - [X] Make join label ordering docs the same
    - [X] Improve join documentation according to above for Scala
    - [X] Improve join documentation according to above for Python
    - [X] Improve join documentation according to above for R
    
    ## How was this patch tested?
    No tests b/c docs.
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.
    
    Author: anabranch <wa...@gmail.com>
    
    Closes #16504 from anabranch/SPARK-19126.
    
    (cherry picked from commit 19d9d4c855eab8f647a5ec66b079172de81221d0)
    Signed-off-by: Felix Cheung <fe...@apache.org>

commit 80a3e13e58036c2461c4b721cb298ffd13b7823f
Author: Felix Cheung <fe...@hotmail.com>
Date:   2017-01-09T04:42:18Z

    [SPARK-18903][SPARKR][BACKPORT-2.1] Add API to get SparkUI URL
    
    ## What changes were proposed in this pull request?
    
    backport to 2.1
    
    Author: Felix Cheung <fe...@hotmail.com>
    
    Closes #16507 from felixcheung/portsparkuir21.

commit 3b6ac323b16f8f6d79ee7bac6e7a57f841897d96
Author: Burak Yavuz <br...@gmail.com>
Date:   2017-01-09T23:17:59Z

    [SPARK-18952][BACKPORT] Regex strings not properly escaped in codegen for aggregations
    
    ## What changes were proposed in this pull request?
    
    Backport for #16361 to 2.1 branch.
    
    ## How was this patch tested?
    
    Unit tests
    
    Author: Burak Yavuz <br...@gmail.com>
    
    Closes #16518 from brkyvz/reg-break-2.1.

commit 65c866ef9e0b325998ce26f698e409c00a3f11e7
Author: Liwei Lin <lw...@gmail.com>
Date:   2017-01-10T11:35:46Z

    [SPARK-16845][SQL] `GeneratedClass$SpecificOrdering` grows beyond 64 KB
    
    ## What changes were proposed in this pull request?
    
    Prior to this patch, we'll generate `compare(...)` for `GeneratedClass$SpecificOrdering` like below, leading to Janino exceptions saying the code grows beyond 64 KB.
    
    ``` scala
    /* 005 */ class SpecificOrdering extends o.a.s.sql.catalyst.expressions.codegen.BaseOrdering {
    /* ..... */   ...
    /* 10969 */   private int compare(InternalRow a, InternalRow b) {
    /* 10970 */     InternalRow i = null;  // Holds current row being evaluated.
    /* 10971 */
    /* 1.... */     code for comparing field0
    /* 1.... */     code for comparing field1
    /* 1.... */     ...
    /* 1.... */     code for comparing field449
    /* 15012 */
    /* 15013 */     return 0;
    /* 15014 */   }
    /* 15015 */ }
    ```
    
    This patch would break `compare(...)` into smaller `compare_xxx(...)` methods when necessary; then we'll get generated `compare(...)` like:
    
    ``` scala
    /* 001 */ public SpecificOrdering generate(Object[] references) {
    /* 002 */   return new SpecificOrdering(references);
    /* 003 */ }
    /* 004 */
    /* 005 */ class SpecificOrdering extends o.a.s.sql.catalyst.expressions.codegen.BaseOrdering {
    /* 006 */
    /* 007 */     ...
    /* 1.... */
    /* 11290 */   private int compare_0(InternalRow a, InternalRow b) {
    /* 11291 */     InternalRow i = null;  // Holds current row being evaluated.
    /* 11292 */
    /* 11293 */     i = a;
    /* 11294 */     boolean isNullA;
    /* 11295 */     UTF8String primitiveA;
    /* 11296 */     {
    /* 11297 */
    /* 11298 */       Object obj = ((Expression) references[0]).eval(null);
    /* 11299 */       UTF8String value = (UTF8String) obj;
    /* 11300 */       isNullA = false;
    /* 11301 */       primitiveA = value;
    /* 11302 */     }
    /* 11303 */     i = b;
    /* 11304 */     boolean isNullB;
    /* 11305 */     UTF8String primitiveB;
    /* 11306 */     {
    /* 11307 */
    /* 11308 */       Object obj = ((Expression) references[0]).eval(null);
    /* 11309 */       UTF8String value = (UTF8String) obj;
    /* 11310 */       isNullB = false;
    /* 11311 */       primitiveB = value;
    /* 11312 */     }
    /* 11313 */     if (isNullA && isNullB) {
    /* 11314 */       // Nothing
    /* 11315 */     } else if (isNullA) {
    /* 11316 */       return -1;
    /* 11317 */     } else if (isNullB) {
    /* 11318 */       return 1;
    /* 11319 */     } else {
    /* 11320 */       int comp = primitiveA.compare(primitiveB);
    /* 11321 */       if (comp != 0) {
    /* 11322 */         return comp;
    /* 11323 */       }
    /* 11324 */     }
    /* 11325 */
    /* 11326 */
    /* 11327 */     i = a;
    /* 11328 */     boolean isNullA1;
    /* 11329 */     UTF8String primitiveA1;
    /* 11330 */     {
    /* 11331 */
    /* 11332 */       Object obj1 = ((Expression) references[1]).eval(null);
    /* 11333 */       UTF8String value1 = (UTF8String) obj1;
    /* 11334 */       isNullA1 = false;
    /* 11335 */       primitiveA1 = value1;
    /* 11336 */     }
    /* 11337 */     i = b;
    /* 11338 */     boolean isNullB1;
    /* 11339 */     UTF8String primitiveB1;
    /* 11340 */     {
    /* 11341 */
    /* 11342 */       Object obj1 = ((Expression) references[1]).eval(null);
    /* 11343 */       UTF8String value1 = (UTF8String) obj1;
    /* 11344 */       isNullB1 = false;
    /* 11345 */       primitiveB1 = value1;
    /* 11346 */     }
    /* 11347 */     if (isNullA1 && isNullB1) {
    /* 11348 */       // Nothing
    /* 11349 */     } else if (isNullA1) {
    /* 11350 */       return -1;
    /* 11351 */     } else if (isNullB1) {
    /* 11352 */       return 1;
    /* 11353 */     } else {
    /* 11354 */       int comp = primitiveA1.compare(primitiveB1);
    /* 11355 */       if (comp != 0) {
    /* 11356 */         return comp;
    /* 11357 */       }
    /* 11358 */     }
    /* 1.... */
    /* 1.... */   ...
    /* 1.... */
    /* 12652 */     return 0;
    /* 12653 */   }
    /* 1.... */
    /* 1.... */   ...
    /* 15387 */
    /* 15388 */   public int compare(InternalRow a, InternalRow b) {
    /* 15389 */
    /* 15390 */     int comp_0 = compare_0(a, b);
    /* 15391 */     if (comp_0 != 0) {
    /* 15392 */       return comp_0;
    /* 15393 */     }
    /* 15394 */
    /* 15395 */     int comp_1 = compare_1(a, b);
    /* 15396 */     if (comp_1 != 0) {
    /* 15397 */       return comp_1;
    /* 15398 */     }
    /* 1.... */
    /* 1.... */     ...
    /* 1.... */
    /* 15450 */     return 0;
    /* 15451 */   }
    /* 15452 */ }
    ```
    ## How was this patch tested?
    - a new added test case which
      - would fail prior to this patch
      - would pass with this patch
    - ordering correctness should already be covered by existing tests like those in `OrderingSuite`
    
    ## Acknowledgement
    
    A major part of this PR - the refactoring work of `splitExpression()` - has been done by ueshin.
    
    Author: Liwei Lin <lw...@gmail.com>
    Author: Takuya UESHIN <ue...@happy-camper.st>
    Author: Takuya Ueshin <ue...@happy-camper.st>
    
    Closes #15480 from lw-lin/spec-ordering-64k-.
    
    (cherry picked from commit acfc5f354332107cc744fb636e3730f6fc48b2fe)
    Signed-off-by: Wenchen Fan <we...@databricks.com>

commit 69d1c4c5c9510ccf05a0f05592201d5b756425f9
Author: Dongjoon Hyun <do...@apache.org>
Date:   2017-01-10T18:49:44Z

    [SPARK-19137][SQL] Fix `withSQLConf` to reset `OptionalConfigEntry` correctly
    
    ## What changes were proposed in this pull request?
    
    `DataStreamReaderWriterSuite` creates test files in the source folder like the following. Interestingly, the root cause is that `withSQLConf` fails to reset `OptionalConfigEntry` correctly. In other words, it resets the config to `Some(undefined)`.
    
    ```bash
    $ git status
    Untracked files:
      (use "git add <file>..." to include in what will be committed)
    
            sql/core/%253Cundefined%253E/
            sql/core/%3Cundefined%3E/
    ```
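
    A generic sketch of the save/restore behaviour `withSQLConf` needs, assuming a plain mutable map in place of the real conf: remember whether each key was set before, and unset it afterwards instead of writing an "undefined" placeholder back.
    
    ```scala
    def withTempSettings[T](settings: scala.collection.mutable.Map[String, String])
                           (pairs: (String, String)*)(body: => T): T = {
      val previous = pairs.map { case (k, _) => k -> settings.get(k) } // Option records "was unset"
      pairs.foreach { case (k, v) => settings(k) = v }
      try body
      finally previous.foreach {
        case (k, Some(old)) => settings(k) = old  // restore the previous value
        case (k, None)      => settings.remove(k) // key was unset before; remove it again
      }
    }
    ```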
    
    ## How was this patch tested?
    
    Manual.
    ```
    build/sbt "project sql" test
    git status
    ```
    
    Author: Dongjoon Hyun <do...@apache.org>
    
    Closes #16522 from dongjoon-hyun/SPARK-19137.
    
    (cherry picked from commit d5b1dc934a2482886c2c095de90e4c6a49ec42bd)
    Signed-off-by: Shixiong Zhu <sh...@databricks.com>

commit e0af4b7263a49419fefc36a6dedf2183c1157912
Author: Shixiong Zhu <sh...@databricks.com>
Date:   2017-01-10T14:24:45Z

    [SPARK-19113][SS][TESTS] Set UncaughtExceptionHandler in onQueryStarted to ensure catching fatal errors during query initialization
    
    ## What changes were proposed in this pull request?
    
    StreamTest currently sets `UncaughtExceptionHandler` after starting the query, so it may not catch fatal errors thrown during query initialization. This PR uses the `onQueryStarted` callback to fix that.
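
    A sketch of the callback mechanism involved, assuming `spark` is a SparkSession (the handler body is illustrative; the real test records the error for later assertions):
    
    ```scala
    import org.apache.spark.sql.streaming.StreamingQueryListener
    import org.apache.spark.sql.streaming.StreamingQueryListener._
    
    spark.streams.addListener(new StreamingQueryListener {
      override def onQueryStarted(event: QueryStartedEvent): Unit = {
        // Installed as soon as the query starts, before initialization work runs on the stream thread.
        Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler {
          override def uncaughtException(t: Thread, e: Throwable): Unit =
            println(s"fatal error in ${t.getName}: $e")
        })
      }
      override def onQueryProgress(event: QueryProgressEvent): Unit = ()
      override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()
    })
    ```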
    
    ## How was this patch tested?
    
    Jenkins
    
    Author: Shixiong Zhu <sh...@databricks.com>
    
    Closes #16492 from zsxwing/SPARK-19113.

commit 81c9430900f44f0602c7d32b298b90afa7450113
Author: Sean Owen <so...@cloudera.com>
Date:   2017-01-10T20:40:21Z

    [SPARK-18997][CORE] Recommended upgrade libthrift to 0.9.3
    
    ## What changes were proposed in this pull request?
    
    Updates to libthrift 0.9.3 to address a CVE.
    
    ## How was this patch tested?
    
    Existing tests.
    
    Author: Sean Owen <so...@cloudera.com>
    
    Closes #16530 from srowen/SPARK-18997.
    
    (cherry picked from commit 856bae6af64982ae0221948c58ff564887e54a70)
    Signed-off-by: Marcelo Vanzin <va...@cloudera.com>

commit 230607d62493c36b214c01a70aa9b0dbb3a9ad4d
Author: Shixiong Zhu <sh...@databricks.com>
Date:   2017-01-11T01:58:11Z

    [SPARK-19140][SS] Allow update mode for non-aggregation streaming queries
    
    ## What changes were proposed in this pull request?
    
    This PR allows update mode for non-aggregation streaming queries. It behaves the same as append mode if a query has no aggregations.
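
    For example, a map-only query such as the following can now be started with update mode (illustrative source and sink, assuming `spark` is a SparkSession; it behaves exactly like append here):
    
    ```scala
    val lines = spark.readStream
      .format("socket").option("host", "localhost").option("port", 9999)
      .load()
    
    val query = lines.writeStream
      .outputMode("update")   // previously rejected for queries without aggregations
      .format("console")
      .start()
    ```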
    
    ## How was this patch tested?
    
    Jenkins
    
    Author: Shixiong Zhu <sh...@databricks.com>
    
    Closes #16520 from zsxwing/update-without-agg.
    
    (cherry picked from commit bc6c56e940fe93591a1e5ba45751f1b243b57e28)
    Signed-off-by: Shixiong Zhu <sh...@databricks.com>

commit 1022049c78e55914c54dff6d5206ad56dba7eef4
Author: Felix Cheung <fe...@hotmail.com>
Date:   2017-01-11T05:22:16Z

    [SPARK-19133][SPARKR][ML][BACKPORT-2.1] fix glm for Gamma, clarify glm family supported
    
    ## What changes were proposed in this pull request?
    
    backporting to 2.1, 2.0 and 1.6
    
    ## How was this patch tested?
    
    unit tests
    
    Author: Felix Cheung <fe...@hotmail.com>
    
    Closes #16532 from felixcheung/rgammabackport.

commit 82fcc133040cb5ef32f10df73fcb9fd8914aa9c1
Author: Felix Cheung <fe...@hotmail.com>
Date:   2017-01-11T16:29:09Z

    [SPARK-19130][SPARKR] Support setting literal value as column implicitly
    
    ## What changes were proposed in this pull request?
    
    ```
    df$foo <- 1
    ```
    
    instead of
    ```
    df$foo <- lit(1)
    ```
    
    ## How was this patch tested?
    
    unit tests
    
    Author: Felix Cheung <fe...@hotmail.com>
    
    Closes #16510 from felixcheung/rlitcol.
    
    (cherry picked from commit d749c06677c2fd383733337f1c00f542da122b8d)
    Signed-off-by: Shivaram Venkataraman <sh...@cs.berkeley.edu>

commit 0b07634b5e06cc9030f20e277ec5956efff6c3fa
Author: Yanbo Liang <yb...@gmail.com>
Date:   2017-01-12T08:58:30Z

    [SPARK-19158][SPARKR][EXAMPLES] Fix ml.R example fails due to lack of e1071 package.
    
    ## What changes were proposed in this pull request?
    The ```ml.R``` example depends on the ```e1071``` package; if it's not available in the user's environment, the example fails. The example should not depend on third-party packages, so I updated it to remove the dependency.
    
    ## How was this patch tested?
    Manual test.
    
    Author: Yanbo Liang <yb...@gmail.com>
    
    Closes #16548 from yanboliang/spark-19158.
    
    (cherry picked from commit 2c586f506de9e2ba592afae1f0c73b6ae631bb96)
    Signed-off-by: Yanbo Liang <yb...@gmail.com>

commit 9b9867ef5b64b05f1e968de1fc0bfc1fcc64a707
Author: Dongjoon Hyun <do...@apache.org>
Date:   2017-01-10T13:27:55Z

    [SPARK-18857][SQL] Don't use `Iterator.duplicate` for `incrementalCollect` in Thrift Server
    
    ## What changes were proposed in this pull request?
    
    To support `FETCH_FIRST`, SPARK-16563 used Scala `Iterator.duplicate`. However,
    Scala `Iterator.duplicate` uses a **queue to buffer all items between both iterators**;
    this causes GC pressure and hangs for queries with a large number of rows. We should not use it,
    especially for `spark.sql.thriftServer.incrementalCollect`.
    
    https://github.com/scala/scala/blob/2.12.x/src/library/scala/collection/Iterator.scala#L1262-L1300
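
    A self-contained illustration of why `Iterator.duplicate` is problematic for large result sets: the gap between the two copies is buffered, so driving one copy far ahead keeps every intervening element queued in memory.
    
    ```scala
    val (fast, slow) = Iterator.range(0, 1000000).duplicate
    fast.foreach(_ => ()) // consumes `fast` fully; all 1,000,000 elements stay queued for `slow`
    println(slow.size)    // the queue is only drained once `slow` is consumed as well
    ```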
    
    ## How was this patch tested?
    
    Pass the existing tests.
    
    Author: Dongjoon Hyun <do...@apache.org>
    
    Closes #16440 from dongjoon-hyun/SPARK-18857.
    
    (cherry picked from commit a2c6adcc5d2702d2f0e9b239517353335e5f911e)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit 616a78a56cc911953e3133e60ab8c5a4fc287539
Author: Wenchen Fan <we...@databricks.com>
Date:   2017-01-12T12:21:04Z

    [SPARK-18969][SQL] Support grouping by nondeterministic expressions
    
    ## What changes were proposed in this pull request?
    
    Currently nondeterministic expressions are allowed in `Aggregate` (see the [comment](https://github.com/apache/spark/blob/v2.0.2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L249-L251)), but the `PullOutNondeterministic` analyzer rule failed to handle `Aggregate`; this PR fixes it.
    
    close https://github.com/apache/spark/pull/16379
    
    There is still one remaining issue: `SELECT a + rand() FROM t GROUP BY a + rand()` is not allowed, because the two `rand()` expressions are different (we generate a random seed as the default seed for `rand()`). https://issues.apache.org/jira/browse/SPARK-19035 is tracking this issue.
    
    ## How was this patch tested?
    
    a new test suite
    
    Author: Wenchen Fan <we...@databricks.com>
    
    Closes #16404 from cloud-fan/groupby.
    
    (cherry picked from commit 871d266649ddfed38c64dfda7158d8bb58d4b979)
    Signed-off-by: Wenchen Fan <we...@databricks.com>

commit 042e32d18ad10be5c60907959e55b0324df5b2c0
Author: Liang-Chi Hsieh <vi...@gmail.com>
Date:   2017-01-12T12:53:31Z

    [SPARK-19055][SQL][PYSPARK] Fix SparkSession initialization when SparkContext is stopped
    
    ## What changes were proposed in this pull request?
    
    In SparkSession initialization, we store the created SparkSession instance in a class variable, _instantiatedContext. The next time, SparkSession.builder.getOrCreate() can retrieve the existing SparkSession instance.
    
    However, when the active SparkContext is stopped and we create another new SparkContext to use, the existing SparkSession is still associated with the stopped SparkContext, so operations on this existing SparkSession will fail.
    
    We need to detect such a case in SparkSession and renew the class variable _instantiatedContext if needed.
    
    ## How was this patch tested?
    
    New test added in PySpark.
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.
    
    Author: Liang-Chi Hsieh <vi...@gmail.com>
    
    Closes #16454 from viirya/fix-pyspark-sparksession.
    
    (cherry picked from commit c6c37b8af714c8ddc8c77ac943a379f703558f27)
    Signed-off-by: Wenchen Fan <we...@databricks.com>

commit 23944d0d64a07d29e9bfcb8f8d6d22858ec02aef
Author: Takeshi YAMAMURO <li...@gmail.com>
Date:   2017-01-12T17:46:53Z

    [SPARK-17237][SQL] Remove backticks in a pivot result schema
    
    ## What changes were proposed in this pull request?
    Pivoting adds backticks (e.g. 3_count(\`c\`)) to column names and, in some cases,
    this causes analysis exceptions like:
    ```
    scala> val df = Seq((2, 3, 4), (3, 4, 5)).toDF("a", "x", "y")
    scala> df.groupBy("a").pivot("x").agg(count("y"), avg("y")).na.fill(0)
    org.apache.spark.sql.AnalysisException: syntax error in attribute name: `3_count(`y`)`;
      at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.e$1(unresolved.scala:134)
      at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.parseAttributeName(unresolved.scala:144)
    ...
    ```
    So, this PR proposes to remove these backticks from column names.
    
    ## How was this patch tested?
    Added a test in `DataFrameAggregateSuite`.
    
    Author: Takeshi YAMAMURO <li...@gmail.com>
    
    Closes #14812 from maropu/SPARK-17237.
    
    (cherry picked from commit 5585ed93b09bc05cdd7a731650eca50d43d7159b)
    Signed-off-by: gatorsmile <ga...@gmail.com>

commit 0668e061beba683d026a2d48011ff74faf8a38ab
Author: Andrew Ash <an...@andrewash.com>
Date:   2017-01-13T07:14:07Z

    Fix missing close-parens for In filter's toString
    
    Otherwise the open parenthesis isn't closed in query plan descriptions of batch scans.
    
        PushedFilters: [In(COL_A, [1,2,4,6,10,16,219,815], IsNotNull(COL_B), ...
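
    A simplified stand-in showing the formatting fix (not Spark's actual In filter source):
    
    ```scala
    case class In(attribute: String, values: Array[Any]) {
      override def toString: String =
        s"In($attribute, [${values.mkString(",")}])" // the ")" after "]" was previously missing
    }
    
    // In("COL_A", Array(1, 2, 4)).toString == "In(COL_A, [1,2,4])"
    ```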
    
    Author: Andrew Ash <an...@andrewash.com>
    
    Closes #16558 from ash211/patch-9.
    
    (cherry picked from commit b040cef2ed0ed46c3dfb483a117200c9dac074ca)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit b2c9a2c8c8e8c38baa6d876c81d143af61328aa2
Author: Vinayak <vi...@in.ibm.com>
Date:   2017-01-13T10:35:12Z

    [SPARK-18687][PYSPARK][SQL] Backward compatibility - creating a Dataframe on a new SQLContext object fails with a Derby error
    
    The change is for SQLContext to reuse the active SparkSession during construction if the SparkContext supplied is the same as the currently active one. Without this change, a new SparkSession is instantiated, which results in a Derby error when attempting to create a DataFrame using a new SQLContext object even though the SparkContext supplied to the new SQLContext is the same as the currently active one. Refer to https://issues.apache.org/jira/browse/SPARK-18687 for details on the error and a repro.
    
    Existing unit tests and a new unit test added to pyspark-sql:
    
    /python/run-tests --python-executables=python --modules=pyspark-sql
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.
    
    Author: Vinayak <vi...@in.ibm.com>
    Author: Vinayak Joshi <vi...@users.noreply.github.com>
    
    Closes #16119 from vijoshi/SPARK-18687_master.
    
    (cherry picked from commit 285a7798e267311730b0163d37d726a81465468a)
    Signed-off-by: Wenchen Fan <we...@databricks.com>

commit 2c2ca8943c4355af491ec19fe6d13949182260ab
Author: Wenchen Fan <we...@databricks.com>
Date:   2017-01-13T06:52:34Z

    [SPARK-19178][SQL] convert string of large numbers to int should return null
    
    ## What changes were proposed in this pull request?
    
    When we convert a string to an integral type, we first convert the string to `decimal(20, 0)`, so that a string in decimal format can be turned into a truncated integral, e.g. `CAST('1.2' AS int)` returns `1`.
    
    However, this causes problems when we convert a string with a large number to an integral type, e.g. `CAST('1234567890123' AS int)` returns `1912276171`, while Hive returns null as expected.
    
    This is a long-standing bug (it seems to have been there since the first day Spark SQL was created). This PR fixes it by adding native support for converting `UTF8String` to an integral type.
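
    A minimal sketch of the range-check idea in plain Scala (not the actual UTF8String code path added by this PR):
    
    ```scala
    def toIntOrNull(s: String): Option[Int] =
      // Parse with enough width to detect overflow instead of silently truncating to 32 bits.
      scala.util.Try(BigDecimal(s.trim).toLong).toOption
        .filter(l => l >= Int.MinValue && l <= Int.MaxValue)
        .map(_.toInt)
    
    // toIntOrNull("1.2")           == Some(1)  (decimal strings still truncate)
    // toIntOrNull("1234567890123") == None     (out of Int range, so null instead of 1912276171)
    ```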
    
    ## How was this patch tested?
    
    new regression tests
    
    Author: Wenchen Fan <we...@databricks.com>
    
    Closes #16550 from cloud-fan/string-to-int.
    
    (cherry picked from commit 6b34e745bb8bdcf5a8bb78359fa39bbe8c6563cc)
    Signed-off-by: Wenchen Fan <we...@databricks.com>

commit ee3642f5182f199aac15b69d1a6a1167f75e5c65
Author: Felix Cheung <fe...@hotmail.com>
Date:   2017-01-13T18:08:14Z

    [SPARK-18335][SPARKR] createDataFrame to support numPartitions parameter
    
    ## What changes were proposed in this pull request?
    
    To allow specifying number of partitions when the DataFrame is created
    
    ## How was this patch tested?
    
    manual, unit tests
    
    Author: Felix Cheung <fe...@hotmail.com>
    
    Closes #16512 from felixcheung/rnumpart.
    
    (cherry picked from commit b0e8eb6d3e9e80fa62625a5b9382d93af77250db)
    Signed-off-by: Shivaram Venkataraman <sh...@cs.berkeley.edu>

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17837: Sync with upstream 2.1

Posted by superbobry <gi...@git.apache.org>.
Github user superbobry closed the pull request at:

    https://github.com/apache/spark/pull/17837


[GitHub] spark issue #17837: Sync with upstream 2.1

Posted by superbobry <gi...@git.apache.org>.
Github user superbobry commented on the issue:

    https://github.com/apache/spark/pull/17837
  
    Sorry, opened by accident.

