Posted to reviews@spark.apache.org by fjh100456 <gi...@git.apache.org> on 2017/05/08 02:48:43 UTC

[GitHub] spark pull request #17895: Branch 2.0

GitHub user fjh100456 opened a pull request:

    https://github.com/apache/spark/pull/17895

    Branch 2.0

    ## What changes were proposed in this pull request?
    
    (Please fill in changes proposed in this fix)
    
    ## How was this patch tested?
    
    (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
    (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark branch-2.0

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17895.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17895
    
----
commit b57e2acb134d94dafc81686da875c5dd3ea35c74
Author: Jagadeesan <as...@us.ibm.com>
Date:   2016-10-03T09:46:38Z

    [SPARK-17736][DOCUMENTATION][SPARKR] Update R README for rmarkdown, pandoc
    
    ## What changes were proposed in this pull request?
    
    To build R docs (which are built when R tests are run), users need to install pandoc and rmarkdown. This was done for Jenkins in ~~[SPARK-17420](https://issues.apache.org/jira/browse/SPARK-17420)~~.
    
    Author: Jagadeesan <as...@us.ibm.com>
    
    Closes #15309 from jagadeesanas2/SPARK-17736.
    
    (cherry picked from commit a27033c0bbaae8f31db9b91693947ed71738ed11)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit 613863b116b6cbc9ac83845c68a2d11b3b02f7cb
Author: zero323 <ze...@users.noreply.github.com>
Date:   2016-10-04T00:57:54Z

    [SPARK-17587][PYTHON][MLLIB] SparseVector __getitem__ should follow __getitem__ contract
    
    ## What changes were proposed in this pull request?
    
    Replaces `ValueError` with `IndexError` when the index passed to `ml` / `mllib` `SparseVector.__getitem__` is out of range. This ensures correct iteration behavior.
    
    Replaces `ValueError` with `IndexError` for `DenseMatrix` and `SparseMatrix` in `ml` / `mllib`.
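    
    A minimal PySpark sketch of the contract this change establishes (assuming a local PySpark environment; the values are illustrative):
    
    ```python
    from pyspark.ml.linalg import SparseVector
    
    v = SparseVector(3, {0: 1.0, 2: 5.0})
    print(v[0], v[1], v[2])  # 1.0 0.0 5.0
    
    try:
        v[3]                 # index out of range
    except IndexError:
        # With IndexError (rather than ValueError), iterating over the vector
        # terminates correctly at the end instead of failing unexpectedly.
        print("out of range")
    ```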
    
    ## How was this patch tested?
    
    PySpark `ml` / `mllib` unit tests. Additional unit tests to prove that the problem has been resolved.
    
    Author: zero323 <ze...@users.noreply.github.com>
    
    Closes #15144 from zero323/SPARK-17587.
    
    (cherry picked from commit d8399b600cef706c22d381b01fab19c610db439a)
    Signed-off-by: Joseph K. Bradley <jo...@databricks.com>

commit 5843932021cc8bbe0277943c6c480cfeae1b29e2
Author: Herman van Hovell <hv...@databricks.com>
Date:   2016-10-04T02:32:59Z

    [SPARK-17753][SQL] Allow a complex expression as the input of a value based case statement
    
    ## What changes were proposed in this pull request?
    We currently only allow relatively simple expressions as the input for a value based case statement. Expressions like `case (a > 1) or (b = 2) when true then 1 when false then 0 end` currently fail. This PR adds support for such expressions.
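    
    A minimal sketch of the kind of query this enables (assuming a SparkSession named `spark` and a table `t` with integer columns `a` and `b`):
    
    ```python
    # The CASE input below is a complex boolean expression rather than a simple column.
    spark.sql("""
        SELECT CASE (a > 1) OR (b = 2) WHEN true THEN 1 WHEN false THEN 0 END AS flag
        FROM t
    """).show()
    ```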
    
    ## How was this patch tested?
    Added a test to the ExpressionParserSuite.
    
    Author: Herman van Hovell <hv...@databricks.com>
    
    Closes #15322 from hvanhovell/SPARK-17753.
    
    (cherry picked from commit 2bbecdec2023143fd144e4242ff70822e0823986)
    Signed-off-by: Herman van Hovell <hv...@databricks.com>

commit 7429199e5b34d5594e3fcedb57eda789d16e26f3
Author: Dongjoon Hyun <do...@apache.org>
Date:   2016-10-04T04:28:16Z

    [SPARK-17112][SQL] "select null" via JDBC triggers IllegalArgumentException in Thriftserver
    
    ## What changes were proposed in this pull request?
    
    Currently, Spark Thrift Server raises `IllegalArgumentException` for queries whose column types are `NullType`, e.g., `SELECT null` or `SELECT if(true,null,null)`. This PR fixes that by returning `void` like Hive 1.2.
    
    **Before**
    ```sql
    $ bin/beeline -u jdbc:hive2://localhost:10000 -e "select null"
    Connecting to jdbc:hive2://localhost:10000
    Connected to: Spark SQL (version 2.1.0-SNAPSHOT)
    Driver: Hive JDBC (version 1.2.1.spark2)
    Transaction isolation: TRANSACTION_REPEATABLE_READ
    Error: java.lang.IllegalArgumentException: Unrecognized type name: null (state=,code=0)
    Closing: 0: jdbc:hive2://localhost:10000
    
    $ bin/beeline -u jdbc:hive2://localhost:10000 -e "select if(true,null,null)"
    Connecting to jdbc:hive2://localhost:10000
    Connected to: Spark SQL (version 2.1.0-SNAPSHOT)
    Driver: Hive JDBC (version 1.2.1.spark2)
    Transaction isolation: TRANSACTION_REPEATABLE_READ
    Error: java.lang.IllegalArgumentException: Unrecognized type name: null (state=,code=0)
    Closing: 0: jdbc:hive2://localhost:10000
    ```
    
    **After**
    ```sql
    $ bin/beeline -u jdbc:hive2://localhost:10000 -e "select null"
    Connecting to jdbc:hive2://localhost:10000
    Connected to: Spark SQL (version 2.1.0-SNAPSHOT)
    Driver: Hive JDBC (version 1.2.1.spark2)
    Transaction isolation: TRANSACTION_REPEATABLE_READ
    +-------+--+
    | NULL  |
    +-------+--+
    | NULL  |
    +-------+--+
    1 row selected (3.242 seconds)
    Beeline version 1.2.1.spark2 by Apache Hive
    Closing: 0: jdbc:hive2://localhost:10000
    
    $ bin/beeline -u jdbc:hive2://localhost:10000 -e "select if(true,null,null)"
    Connecting to jdbc:hive2://localhost:10000
    Connected to: Spark SQL (version 2.1.0-SNAPSHOT)
    Driver: Hive JDBC (version 1.2.1.spark2)
    Transaction isolation: TRANSACTION_REPEATABLE_READ
    +-------------------------+--+
    | (IF(true, NULL, NULL))  |
    +-------------------------+--+
    | NULL                    |
    +-------------------------+--+
    1 row selected (0.201 seconds)
    Beeline version 1.2.1.spark2 by Apache Hive
    Closing: 0: jdbc:hive2://localhost:10000
    ```
    
    ## How was this patch tested?
    
    * Pass the Jenkins tests with a new test suite.
    * Also, manually, after starting the Spark Thrift Server, run the following commands.
    ```sql
    $ bin/beeline -u jdbc:hive2://localhost:10000 -e "select null"
    $ bin/beeline -u jdbc:hive2://localhost:10000 -e "select if(true,null,null)"
    ```
    
    **Hive 1.2**
    ```sql
    hive> create table null_table as select null;
    hive> desc null_table;
    OK
    _c0                     void
    ```
    
    Author: Dongjoon Hyun <do...@apache.org>
    
    Closes #15325 from dongjoon-hyun/SPARK-17112.
    
    (cherry picked from commit c571cfb2d0e1e224107fc3f0c672730cae9804cb)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit 3dbe8097facb854195729da7bd577f6c14eb2b2a
Author: ding <di...@localhost.localdomain>
Date:   2016-10-04T07:00:10Z

    [SPARK-17559][MLLIB] persist edges if their storage level is none in PeriodicGraphCheckpointer
    
    ## What changes were proposed in this pull request?
    When PeriodicGraphCheckpointer is used to persist a graph, the edges are sometimes not persisted. Currently the graph is persisted only when the vertices' storage level is none, but there is a chance that the vertices' storage level is not none while the edges' level is none. E.g. for a graph created by an outerJoinVertices operation, the vertices are automatically cached while the edges are not. In that case the edges will not be persisted if we use PeriodicGraphCheckpointer. We need to check the edges' storage level separately and persist them if it is none.
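    
    A conceptual PySpark-level sketch of the idea (GraphX itself is Scala-only; `vertices_rdd` and `edges_rdd` are hypothetical stand-ins): check and persist each component separately instead of gating both on the vertices' storage level.
    
    ```python
    def persist_if_needed(rdd):
        # Persist only when the RDD is not already cached.
        if not rdd.is_cached:
            rdd.persist()
    
    persist_if_needed(vertices_rdd)
    persist_if_needed(edges_rdd)  # edges checked on their own, which is the point of the fix
    ```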
    
    ## How was this patch tested?
     manual tests
    
    Author: ding <di...@localhost.localdomain>
    
    Closes #15124 from dding3/spark-persisitEdge.
    
    (cherry picked from commit 126baa8d32bc0e7bf8b43f9efa84f2728f02347d)
    Signed-off-by: Joseph K. Bradley <jo...@databricks.com>

commit 50f6be7598547fed5190a920fd3cebb4bc908524
Author: Felix Cheung <fe...@hotmail.com>
Date:   2016-10-04T16:22:26Z

    [SPARKR][DOC] minor formatting and output cleanup for R vignettes
    
    Clean up output, format table, truncate long example output, hide warnings
    
    (new - Left; existing - Right)
    ![image](https://cloud.githubusercontent.com/assets/8969467/19064018/5dcde4d0-89bc-11e6-857b-052df3f52a4e.png)
    
    ![image](https://cloud.githubusercontent.com/assets/8969467/19064034/6db09956-89bc-11e6-8e43-232d5c3fe5e6.png)
    
    ![image](https://cloud.githubusercontent.com/assets/8969467/19064058/88f09590-89bc-11e6-9993-61639e29dfdd.png)
    
    ![image](https://cloud.githubusercontent.com/assets/8969467/19064066/95ccbf64-89bc-11e6-877f-45af03ddcadc.png)
    
    ![image](https://cloud.githubusercontent.com/assets/8969467/19064082/a8445404-89bc-11e6-8532-26d8bc9b206f.png)
    
    Run create-doc.sh manually
    
    Author: Felix Cheung <fe...@hotmail.com>
    
    Closes #15340 from felixcheung/vignettes.
    
    (cherry picked from commit 068c198e956346b90968a4d74edb7bc820c4be28)
    Signed-off-by: Shivaram Venkataraman <sh...@cs.berkeley.edu>

commit a9165bb1b704483ad16331945b0968cbb1a97139
Author: Marcelo Vanzin <va...@cloudera.com>
Date:   2016-10-04T16:38:44Z

    [SPARK-17549][SQL] Only collect table size stat in driver for cached relation.
    
    This reverts commit 9ac68dbc5720026ea92acc61d295ca64d0d3d132. Turns out
    the original fix was correct.
    
    Original change description:
    The existing code caches all stats for all columns for each partition
    in the driver; for a large relation, this causes extreme memory usage,
    which leads to gc hell and application failures.
    
    It seems that only the size in bytes of the data is actually used in the
    driver, so instead just collect that. In executors, the full stats are
    still kept, but that's not a big problem; we expect the data to be distributed
    and thus not to incur too much memory pressure in each individual
    executor.
    
    There are also potential improvements on the executor side, since the data
    being stored currently is very wasteful (e.g. storing boxed types vs.
    primitive types for stats). But that's a separate issue.
    
    Author: Marcelo Vanzin <va...@cloudera.com>
    
    Closes #15304 from vanzin/SPARK-17549.2.
    
    (cherry picked from commit 8d969a2125d915da1506c17833aa98da614a257f)
    Signed-off-by: Marcelo Vanzin <va...@cloudera.com>

commit a4f7df423e1e0aa512dfc496bc9de13831eae3f3
Author: Ergin Seyfe <es...@fb.com>
Date:   2016-10-04T19:39:01Z

    [SPARK-17773][BRANCH-2.0][INPUT/OUTPUT] Add VoidObjectInspector
    
    This is the PR for branch-2.0: PR https://github.com/apache/spark/pull/15337
    
    Added VoidObjectInspector to the list of PrimitiveObjectInspectors
    
    Executing the following query was failing:
    ```sql
    select SOME_UDAF*(a.arr)
    from (
      select Array(null) as arr from dim_one_row
    ) a
    ```
    
    After the fix, I am getting the correct output:
    ```
    res0: Array[org.apache.spark.sql.Row] = Array([null])
    ```
    
    Author: Ergin Seyfe <es...@fb.com>
    
    Closes #15337 from seyfe/add_void_object_inspector.
    
    Author: Ergin Seyfe <es...@fb.com>
    
    Closes #15345 from seyfe/add_void_object_inspector_2.0.

commit b8df2e53c38a30f51c710543c81279a59a9ab4fc
Author: Shixiong Zhu <sh...@databricks.com>
Date:   2016-10-05T21:54:55Z

    [SPARK-17778][TESTS] Mock SparkContext to reduce memory usage of BlockManagerSuite
    
    ## What changes were proposed in this pull request?
    
    Mock SparkContext to reduce memory usage of BlockManagerSuite
    
    ## How was this patch tested?
    
    Jenkins
    
    Author: Shixiong Zhu <sh...@databricks.com>
    
    Closes #15350 from zsxwing/SPARK-17778.
    
    (cherry picked from commit 221b418b1c9db7b04c600b6300d18b034a4f444e)
    Signed-off-by: Shixiong Zhu <sh...@databricks.com>

commit 3b6463a794a754d630d69398f009c055664dd905
Author: Herman van Hovell <hv...@databricks.com>
Date:   2016-10-05T23:05:30Z

    [SPARK-17758][SQL] Last returns wrong result in case of empty partition
    
    ## What changes were proposed in this pull request?
    The result of the `Last` function can be wrong when the last partition processed is empty. It can return `null` instead of the expected value. For example, this can happen when we process partitions in the following order:
    ```
    - Partition 1 [Row1, Row2]
    - Partition 2 [Row3]
    - Partition 3 []
    ```
    In this case the `Last` function will currently return null instead of the value of `Row3`.
    
    This PR fixes this by adding a `valueSet` flag to the `Last` function.
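    
    A conceptual sketch of the flag (not the catalyst implementation): track whether a value has actually been seen, so an empty final partition cannot reset the result to null.
    
    ```python
    class LastAgg:
        def __init__(self):
            self.value = None
            self.value_set = False
    
        def update(self, row_value):
            self.value = row_value
            self.value_set = True
    
        def merge(self, other):
            # Only take the other buffer's value if it actually saw a row.
            if other.value_set:
                self.value = other.value
                self.value_set = True
    ```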
    
    ## How was this patch tested?
    We previously only had end-to-end tests for `DeclarativeAggregateFunction`s. I have added an evaluator for these functions so we can test them in catalyst. I have added a `LastTestSuite` to test the `Last` aggregate function.
    
    Author: Herman van Hovell <hv...@databricks.com>
    
    Closes #15348 from hvanhovell/SPARK-17758.
    
    (cherry picked from commit 5fd54b994e2078dbf0794932b4e0ffa9a9eda0c3)
    Signed-off-by: Yin Huai <yh...@databricks.com>

commit 1c2dff1eeeb045f3f5c3c1423ba07371b03965d7
Author: Michael Armbrust <mi...@databricks.com>
Date:   2016-10-05T23:48:43Z

    [SPARK-17643] Remove comparable requirement from Offset (backport for branch-2.0)
    
    ## What changes were proposed in this pull request?
    
    Backport https://github.com/apache/spark/commit/988c71457354b0a443471f501cef544a85b1a76a to branch-2.0
    
    ## How was this patch tested?
    
    Jenkins
    
    Author: Michael Armbrust <mi...@databricks.com>
    
    Closes #15362 from zsxwing/SPARK-17643-2.0.

commit 225372adfb843afcbf9928db3989f2f8393ae6d8
Author: Reynold Xin <rx...@databricks.com>
Date:   2016-10-06T17:33:45Z

    [SPARK-17798][SQL] Remove redundant Experimental annotations in sql.streaming
    
    ## What changes were proposed in this pull request?
    I was looking through API annotations to catch mislabeled APIs, and realized DataStreamReader and DataStreamWriter classes are already annotated as Experimental, and as a result there is no need to annotate each method within them.
    
    ## How was this patch tested?
    N/A
    
    Author: Reynold Xin <rx...@databricks.com>
    
    Closes #15373 from rxin/SPARK-17798.
    
    (cherry picked from commit 79accf45ace5549caa0cbab02f94fc87bedb5587)
    Signed-off-by: Shixiong Zhu <sh...@databricks.com>

commit a2bf09588ed98ef33028fcf4d72c15f06af2e9ad
Author: Shixiong Zhu <sh...@databricks.com>
Date:   2016-10-06T19:51:12Z

    [SPARK-17780][SQL] Report Throwable to user in StreamExecution
    
    ## What changes were proposed in this pull request?
    
    When using an incompatible source for structured streaming, it may throw NoClassDefFoundError. It's better to just catch Throwable and report it to the user since the streaming thread is dying.
    
    ## How was this patch tested?
    
    `test("NoClassDefFoundError from an incompatible source")`
    
    Author: Shixiong Zhu <sh...@databricks.com>
    
    Closes #15352 from zsxwing/SPARK-17780.
    
    (cherry picked from commit 9a48e60e6319d85f2c3be3a3c608dab135e18a73)
    Signed-off-by: Michael Armbrust <mi...@databricks.com>

commit e355ca8e828629455228b6a346d64638ab639cfa
Author: Christian Kadner <ck...@us.ibm.com>
Date:   2016-10-06T21:28:49Z

    [SPARK-17803][TESTS] Upgrade docker-client dependency
    
    [SPARK-17803: Docker integration tests don't run with "Docker for Mac"](https://issues.apache.org/jira/browse/SPARK-17803)
    
    ## What changes were proposed in this pull request?
    
    This PR upgrades the [docker-client](https://mvnrepository.com/artifact/com.spotify/docker-client) dependency from [3.6.6](https://mvnrepository.com/artifact/com.spotify/docker-client/3.6.6) to [5.0.2](https://mvnrepository.com/artifact/com.spotify/docker-client/5.0.2) to enable _Docker for Mac_ users to run the `docker-integration-tests` out of the box.
    
    The very latest docker-client version is [6.0.0](https://mvnrepository.com/artifact/com.spotify/docker-client/6.0.0) but that has one additional dependency and no usage yet.
    
    ## How was this patch tested?
    
    The code change was tested on Mac OS X Yosemite with both _Docker Toolbox_ as well as _Docker for Mac_ and on Linux Ubuntu 14.04.
    
    ```
    $ build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests clean package
    
    $ build/mvn -Pdocker-integration-tests -Pscala-2.11 -pl :spark-docker-integration-tests_2.11 clean compile test
    ```
    
    Author: Christian Kadner <ck...@us.ibm.com>
    
    Closes #15378 from ckadner/SPARK-17803_Docker_for_Mac.
    
    (cherry picked from commit 49d11d49983fbe270f4df4fb1e34b5fbe854c5ec)
    Signed-off-by: Josh Rosen <jo...@databricks.com>

commit b1a9c41e8c41c90dd15ee6f635356dd1a5bbf395
Author: Dongjoon Hyun <do...@apache.org>
Date:   2016-10-06T23:09:45Z

    [SPARK-17750][SQL][BACKPORT-2.0] Fix CREATE VIEW with INTERVAL arithmetic
    
    ## What changes were proposed in this pull request?
    
    Currently, Spark raises `RuntimeException` when creating a view that uses INTERVAL arithmetic on a timestamp, like the following. The root cause is that the arithmetic expression, `TimeAdd`, was transformed into the `timeadd` function in the VIEW definition. This PR fixes the SQL definition of the `TimeAdd` and `TimeSub` expressions.
    
    ```scala
    scala> sql("CREATE TABLE dates (ts TIMESTAMP)")
    
    scala> sql("CREATE VIEW view1 AS SELECT ts + INTERVAL 1 DAY FROM dates")
    java.lang.RuntimeException: Failed to analyze the canonicalized SQL: ...
    ```
    
    ## How was this patch tested?
    
    Pass Jenkins with a new testcase.
    
    Author: Dongjoon Hyun <do...@apache.org>
    
    Closes #15383 from dongjoon-hyun/SPARK-17750-BACK.

commit 594a2cf6f7c74c54127b8c3947aadbe0052b404c
Author: sethah <se...@gmail.com>
Date:   2016-10-07T04:10:17Z

    [SPARK-17792][ML] L-BFGS solver for linear regression does not accept general numeric label column types
    
    ## What changes were proposed in this pull request?
    
    Before, we computed `instances` in LinearRegression in two spots, even though they did the same thing, and one of them did not cast the label column to `DoubleType`. This patch consolidates the computation and always casts the label column to `DoubleType`.
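    
    A minimal sketch of the scenario being fixed (assuming a SparkSession named `spark`): an integer label column should be accepted regardless of solver.
    
    ```python
    from pyspark.ml.linalg import Vectors
    from pyspark.ml.regression import LinearRegression
    
    df = spark.createDataFrame(
        [(1, Vectors.dense(1.0)), (0, Vectors.dense(0.0)), (1, Vectors.dense(2.0))],
        ["label", "features"])  # integer labels, not DoubleType
    
    model = LinearRegression(solver="l-bfgs").fit(df)
    ```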
    
    ## How was this patch tested?
    
    Added a unit test to check all solvers. This test failed before this patch.
    
    Author: sethah <se...@gmail.com>
    
    Closes #15364 from sethah/linreg_numeric_type.
    
    (cherry picked from commit 3713bb199142c5e06e2e527c99650f02f41f47b1)
    Signed-off-by: Yanbo Liang <yb...@gmail.com>

commit 380b099fcfe6f70b978300ea208faf630855471a
Author: Dongjoon Hyun <do...@apache.org>
Date:   2016-10-07T05:27:20Z

    [SPARK-17612][SQL][BRANCH-2.0] Support `DESCRIBE table PARTITION` SQL syntax
    
    ## What changes were proposed in this pull request?
    
    This is a backport of SPARK-17612. This implements `DESCRIBE table PARTITION` SQL syntax again. It was supported until Spark 1.6.2, but was dropped in 2.0.0.
    
    **Spark 1.6.2**
    ```scala
    scala> sql("CREATE TABLE partitioned_table (a STRING, b INT) PARTITIONED BY (c STRING, d STRING)")
    res1: org.apache.spark.sql.DataFrame = [result: string]
    
    scala> sql("ALTER TABLE partitioned_table ADD PARTITION (c='Us', d=1)")
    res2: org.apache.spark.sql.DataFrame = [result: string]
    
    scala> sql("DESC partitioned_table PARTITION (c='Us', d=1)").show(false)
    +----------------------------------------------------------------+
    |result                                                          |
    +----------------------------------------------------------------+
    |a                      string                                   |
    |b                      int                                      |
    |c                      string                                   |
    |d                      string                                   |
    |                                                                |
    |# Partition Information                                         |
    |# col_name             data_type               comment          |
    |                                                                |
    |c                      string                                   |
    |d                      string                                   |
    +----------------------------------------------------------------+
    ```
    
    **Spark 2.0**
    - **Before**
    ```scala
    scala> sql("CREATE TABLE partitioned_table (a STRING, b INT) PARTITIONED BY (c STRING, d STRING)")
    res0: org.apache.spark.sql.DataFrame = []
    
    scala> sql("ALTER TABLE partitioned_table ADD PARTITION (c='Us', d=1)")
    res1: org.apache.spark.sql.DataFrame = []
    
    scala> sql("DESC partitioned_table PARTITION (c='Us', d=1)").show(false)
    org.apache.spark.sql.catalyst.parser.ParseException:
    Unsupported SQL statement
    ```
    
    - **After**
    ```scala
    scala> sql("CREATE TABLE partitioned_table (a STRING, b INT) PARTITIONED BY (c STRING, d STRING)")
    res0: org.apache.spark.sql.DataFrame = []
    
    scala> sql("ALTER TABLE partitioned_table ADD PARTITION (c='Us', d=1)")
    res1: org.apache.spark.sql.DataFrame = []
    
    scala> sql("DESC partitioned_table PARTITION (c='Us', d=1)").show(false)
    +-----------------------+---------+-------+
    |col_name               |data_type|comment|
    +-----------------------+---------+-------+
    |a                      |string   |null   |
    |b                      |int      |null   |
    |c                      |string   |null   |
    |d                      |string   |null   |
    |# Partition Information|         |       |
    |# col_name             |data_type|comment|
    |c                      |string   |null   |
    |d                      |string   |null   |
    +-----------------------+---------+-------+
    
    scala> sql("DESC EXTENDED partitioned_table PARTITION (c='Us', d=1)").show(100,false)
    +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+-------+
    |col_name                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |data_type|comment|
    +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+-------+
    |a                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |string   |null   |
    |b                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |int      |null   |
    |c                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |string   |null   |
    |d                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |string   |null   |
    |# Partition Information                                                                                                                                                                                                                                                                                                                                                                                                                                                            |         |       |
    |# col_name                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |data_type|comment|
    |c                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |string   |null   |
    |d                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |string   |null   |
    |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |         |       |
    |Detailed Partition Information CatalogPartition(
            Partition Values: [Us, 1]
            Storage(Location: file:/Users/dhyun/SPARK-17612-DESC-PARTITION/spark-warehouse/partitioned_table/c=Us/d=1, InputFormat: org.apache.hadoop.mapred.TextInputFormat, OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, Serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Properties: [serialization.format=1])
            Partition Parameters:{transient_lastDdlTime=1475001066})|         |       |
    +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+-------+
    
    scala> sql("DESC FORMATTED partitioned_table PARTITION (c='Us', d=1)").show(100,false)
    +--------------------------------+---------------------------------------------------------------------------------------+-------+
    |col_name                        |data_type                                                                              |comment|
    +--------------------------------+---------------------------------------------------------------------------------------+-------+
    |a                               |string                                                                                 |null   |
    |b                               |int                                                                                    |null   |
    |c                               |string                                                                                 |null   |
    |d                               |string                                                                                 |null   |
    |# Partition Information         |                                                                                       |       |
    |# col_name                      |data_type                                                                              |comment|
    |c                               |string                                                                                 |null   |
    |d                               |string                                                                                 |null   |
    |                                |                                                                                       |       |
    |# Detailed Partition Information|                                                                                       |       |
    |Partition Value:                |[Us, 1]                                                                                |       |
    |Database:                       |default                                                                                |       |
    |Table:                          |partitioned_table                                                                      |       |
    |Location:                       |file:/Users/dhyun/SPARK-17612-DESC-PARTITION/spark-warehouse/partitioned_table/c=Us/d=1|       |
    |Partition Parameters:           |                                                                                       |       |
    |  transient_lastDdlTime         |1475001066                                                                             |       |
    |                                |                                                                                       |       |
    |# Storage Information           |                                                                                       |       |
    |SerDe Library:                  |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                                     |       |
    |InputFormat:                    |org.apache.hadoop.mapred.TextInputFormat                                               |       |
    |OutputFormat:                   |org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat                             |       |
    |Compressed:                     |No                                                                                     |       |
    |Storage Desc Parameters:        |                                                                                       |       |
    |  serialization.format          |1                                                                                      |       |
    +--------------------------------+---------------------------------------------------------------------------------------+-------+
    ```
    
    ## How was this patch tested?
    
    Pass the Jenkins tests with a new testcase.
    
    Author: Dongjoon Hyun <do...@apache.org>
    
    Closes #15351 from dongjoon-hyun/SPARK-17612-BACK.

commit 3487b020354988a91181f23b1c6711bfcdb4c529
Author: Bryan Cutler <cu...@gmail.com>
Date:   2016-10-07T07:27:55Z

    [SPARK-17805][PYSPARK] Fix in sqlContext.read.text when pass in list of paths
    
    ## What changes were proposed in this pull request?
    If given a list of paths, `pyspark.sql.readwriter.text` will attempt to use an undefined variable `paths`. This change checks whether the param `paths` is a basestring and, if so, converts it to a list, so that the same variable `paths` can be used for both cases.
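    
    A minimal usage sketch (the paths are hypothetical; assumes a SQLContext named `sqlContext`): both a single path and a list of paths should work after this change.
    
    ```python
    df_single = sqlContext.read.text("data/logs/part-00000.txt")
    df_multi = sqlContext.read.text(["data/logs/part-00000.txt",
                                     "data/logs/part-00001.txt"])
    ```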
    
    ## How was this patch tested?
    Added unit test for reading list of files
    
    Author: Bryan Cutler <cu...@gmail.com>
    
    Closes #15379 from BryanCutler/sql-readtext-paths-SPARK-17805.
    
    (cherry picked from commit bcaa799cb01289f73e9f48526e94653a07628983)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit 9f2eb27a425385836dba5aad61babfb1db738a73
Author: Sean Owen <so...@cloudera.com>
Date:   2016-10-07T17:31:41Z

    [SPARK-17707][WEBUI] Web UI prevents spark-submit application to be finished
    
    This expands calls to Jetty's simple `ServerConnector` constructor to explicitly specify a `ScheduledExecutorScheduler` that makes daemon threads. It should otherwise result in exactly the same configuration, because the other args are copied from the constructor that is currently called.
    
    (I'm not sure we should change the Hive Thriftserver impl, but I did anyway.)
    
    This also adds `sc.stop()` to the quick start guide example.
    
    Existing tests; _pending_ at least manual verification of the fix.
    
    Author: Sean Owen <so...@cloudera.com>
    
    Closes #15381 from srowen/SPARK-17707.
    
    (cherry picked from commit cff560755244dd4ccb998e0c56e81d2620cd4cff)
    Signed-off-by: Shixiong Zhu <sh...@databricks.com>

commit f460a199e8fc78ce879b79844c6c9e340b574439
Author: Shixiong Zhu <sh...@databricks.com>
Date:   2016-10-07T18:32:39Z

    [SPARK-17346][SQL][TEST-MAVEN] Add Kafka source for Structured Streaming (branch 2.0)
    
    ## What changes were proposed in this pull request?
    
    Backport https://github.com/apache/spark/commit/9293734d35eb3d6e4fd4ebb86f54dd5d3a35e6db and https://github.com/apache/spark/commit/b678e465afa417780b54db0fbbaa311621311f15 into branch 2.0.
    
    The only difference is the Spark version in pom file.
    
    ## How was this patch tested?
    
    Jenkins.
    
    Author: Shixiong Zhu <sh...@databricks.com>
    
    Closes #15367 from zsxwing/kafka-source-branch-2.0.

commit a84d8ef375f853c5841d458a593e41b457b9e6ff
Author: Herman van Hovell <hv...@databricks.com>
Date:   2016-10-07T10:46:39Z

    [SPARK-17782][STREAMING][BUILD] Add Kafka 0.10 project to build modules
    
    ## What changes were proposed in this pull request?
    This PR adds the Kafka 0.10 subproject to the build infrastructure. This makes sure the Kafka 0.10 tests are only triggered when it or one of its dependencies changes.
    
    Author: Herman van Hovell <hv...@databricks.com>
    
    Closes #15355 from hvanhovell/SPARK-17782.

commit 6d056c168c45d2decf5ffbb96d59623d52ed8490
Author: Davies Liu <da...@databricks.com>
Date:   2016-10-07T22:03:47Z

    [SPARK-17806] [SQL] fix bug in join key rewritten in HashJoin
    
    ## What changes were proposed in this pull request?
    
    In HashJoin, we try to rewrite the join key as a Long to improve the performance of finding a match. The rewriting part is not well tested and has a bug that could cause a wrong result when there are at least three integral columns in the join key and the total length of the key exceeds 8 bytes.
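    
    A rough illustration of the failure mode (plain Python, not Spark's actual rewrite code): packing three 4-byte int columns into one 64-bit key silently drops the high-order column, so two different keys can collide.
    
    ```python
    def pack(cols, bits_per_col=32):
        key, mask = 0, (1 << 64) - 1
        for c in cols:
            key = ((key << bits_per_col) | (c & ((1 << bits_per_col) - 1))) & mask
        return key
    
    print(pack([1, 2, 3]) == pack([9, 2, 3]))  # True: the first column no longer matters
    ```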
    
    ## How was this patch tested?
    
    Added unit tests covering the rewriting with different numbers of columns and different data types. Manually tested the reported case and confirmed that this PR fixes the bug.
    
    Author: Davies Liu <da...@databricks.com>
    
    Closes #15390 from davies/rewrite_key.
    
    (cherry picked from commit 94b24b84a666517e31e9c9d693f92d9bbfd7f9ad)
    Signed-off-by: Davies Liu <da...@gmail.com>

commit d27df35795fac0fd167e51d5ba08092a17eedfc2
Author: jiangxingbo <ji...@gmail.com>
Date:   2016-10-10T04:52:46Z

    [SPARK-17832][SQL] TableIdentifier.quotedString creates un-parseable names when name contains a backtick
    
    ## What changes were proposed in this pull request?
    
    The `quotedString` method in `TableIdentifier` and `FunctionIdentifier` produces an illegal (un-parseable) name when the name contains a backtick. For example:
    ```
    import org.apache.spark.sql.catalyst.parser.CatalystSqlParser._
    import org.apache.spark.sql.catalyst.TableIdentifier
    import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
    val complexName = TableIdentifier("`weird`table`name", Some("`d`b`1"))
    parseTableIdentifier(complexName.unquotedString) // Does not work
    parseTableIdentifier(complexName.quotedString) // Does not work
    parseExpression(complexName.unquotedString) // Does not work
    parseExpression(complexName.quotedString) // Does not work
    ```
    We should handle the backtick properly to make `quotedString` parseable.
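    
    A rough sketch of the idea (not the catalyst implementation): escape embedded backticks by doubling them so the quoted name can be parsed back.
    
    ```python
    def quote_identifier(name):
        return "`" + name.replace("`", "``") + "`"
    
    print(quote_identifier("weird`table`name"))  # `weird``table``name`
    ```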
    
    ## How was this patch tested?
    Add new testcases in `TableIdentifierParserSuite` and `ExpressionParserSuite`.
    
    Author: jiangxingbo <ji...@gmail.com>
    
    Closes #15403 from jiangxb1987/backtick.
    
    (cherry picked from commit 26fbca480604ba258f97b9590cfd6dda1ecd31db)
    Signed-off-by: Herman van Hovell <hv...@databricks.com>

commit d719e9a080a909a6a56db938750d553668743f8f
Author: Dhruve Ashar <dh...@gmail.com>
Date:   2016-10-10T15:55:57Z

    [SPARK-17417][CORE] Fix # of partitions for Reliable RDD checkpointing
    
    ## What changes were proposed in this pull request?
    Currently the number of partition files is limited to 10000 (%05d format). If there are more than 10000 part files, the logic breaks while recreating the RDD because the file names are sorted as strings. More details can be found in the JIRA desc [here](https://issues.apache.org/jira/browse/SPARK-17417).
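    
    An illustration of the problem (plain Python, no Spark required): once part files overflow the %05d width, lexicographic order no longer matches numeric order.
    
    ```python
    names = ["part-%05d" % i for i in (9999, 10000, 99999, 100000)]
    print(sorted(names))
    # ['part-09999', 'part-10000', 'part-100000', 'part-99999']  <- 100000 sorts before 99999
    ```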
    
    ## How was this patch tested?
    I tested this patch by checkpointing an RDD, manually renaming the part files to the old format, and then accessing the RDD again; it was successfully recreated from the old format. I also verified loading a sample Parquet file, saving it in multiple formats - CSV, JSON, Text, Parquet, ORC - and reading them back successfully from the saved files. I couldn't launch the unit test from my local box, so I will wait for the Jenkins output.
    
    Author: Dhruve Ashar <dh...@gmail.com>
    
    Closes #15370 from dhruve/bug/SPARK-17417.
    
    (cherry picked from commit 4bafacaa5f50a3e986c14a38bc8df9bae303f3a0)
    Signed-off-by: Tom Graves <tg...@yahoo-inc.com>

commit ff9f5bbf1795d9f5b14838099dcc1bb4ac8a9b5b
Author: Davies Liu <da...@databricks.com>
Date:   2016-10-11T02:14:01Z

    [SPARK-17738][TEST] Fix flaky test in ColumnTypeSuite
    
    ## What changes were proposed in this pull request?
    
    The default buffer size is not big enough for randomly generated MapType.
    
    ## How was this patch tested?
    
    Ran the tests 100 times; they never failed (they failed 8 times before the patch).
    
    Author: Davies Liu <da...@databricks.com>
    
    Closes #15395 from davies/flaky_map.
    
    (cherry picked from commit d5ec4a3e014494a3d991a6350caffbc3b17be0fd)
    Signed-off-by: Shixiong Zhu <sh...@databricks.com>

commit a6b5e1dccf0be0e709d6d4113cdacb0cecce39fd
Author: Shixiong Zhu <sh...@databricks.com>
Date:   2016-10-11T17:53:07Z

    [SPARK-17346][SQL][TESTS] Fix the flaky topic deletion in KafkaSourceStressSuite
    
    ## What changes were proposed in this pull request?
    
    A follow-up PR for SPARK-17346 to fix the flaky `org.apache.spark.sql.kafka010.KafkaSourceStressSuite`.
    
    Test log: https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.4/1855/testReport/junit/org.apache.spark.sql.kafka010/KafkaSourceStressSuite/_It_is_not_a_test_/
    
    Looks like deleting the Kafka internal topic `__consumer_offsets` is flaky. This PR simply ignores internal topics.
    
    ## How was this patch tested?
    
    Existing tests.
    
    Author: Shixiong Zhu <sh...@databricks.com>
    
    Closes #15384 from zsxwing/SPARK-17346-flaky-test.
    
    (cherry picked from commit 75b9e351413dca0930e8545e6283874db09d8482)
    Signed-off-by: Tathagata Das <ta...@gmail.com>

commit 5ec3e6680a091883369c002ae599d6b03f38c863
Author: Ergin Seyfe <es...@fb.com>
Date:   2016-10-11T19:51:08Z

    [SPARK-17816][CORE][BRANCH-2.0] Fix ConcurrentModificationException issue in BlockStatusesAccumulator
    
    ## What changes were proposed in this pull request?
    Replaced `BlockStatusesAccumulator` with `CollectionAccumulator`, which is thread-safe, plus a few more cleanups.
    
    ## How was this patch tested?
    Tested in master branch and cherry-picked.
    
    Author: Ergin Seyfe <es...@fb.com>
    
    Closes #15425 from seyfe/race_cond_jsonprotocal_branch-2.0.

commit e68e95e947045704d3e6a36bb31e104a99d3adcc
Author: Alexander Pivovarov <ap...@gmail.com>
Date:   2016-10-12T05:31:21Z

    Fix hadoop.version in building-spark.md
    
    A couple of mvn build examples use `-Dhadoop.version=VERSION` instead of an actual version number.
    
    Author: Alexander Pivovarov <ap...@gmail.com>
    
    Closes #15440 from apivovarov/patch-1.
    
    (cherry picked from commit 299eb04ba05038c7dbb3ecf74a35d4bbfa456643)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit f3d82b53c42a971deedc04de6950b9228e5262ea
Author: Kousuke Saruta <sa...@oss.nttdata.co.jp>
Date:   2016-10-12T05:36:57Z

    [SPARK-17880][DOC] The url linking to `AccumulatorV2` in the document is incorrect.
    
    ## What changes were proposed in this pull request?
    
    In `programming-guide.md`, the URL that links to `AccumulatorV2` says `api/scala/index.html#org.apache.spark.AccumulatorV2`, but `api/scala/index.html#org.apache.spark.util.AccumulatorV2` is correct.
    
    ## How was this patch tested?
    manual test.
    
    Author: Kousuke Saruta <sa...@oss.nttdata.co.jp>
    
    Closes #15439 from sarutak/SPARK-17880.
    
    (cherry picked from commit b512f04f8e546843d5a3f35dcc6b675b5f4f5bc0)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit f12b74c02eec9e201fec8a16dac1f8e549c1b4f0
Author: cody koeninger <co...@koeninger.org>
Date:   2016-10-12T07:40:47Z

    [SPARK-17853][STREAMING][KAFKA][DOC] make it clear that reusing group.id is bad
    
    ## What changes were proposed in this pull request?
    
    Documentation fix to make it clear that reusing group id for different streams is super duper bad, just like it is with the underlying Kafka consumer.
    
    ## How was this patch tested?
    
    I built jekyll doc and made sure it looked ok.
    
    Author: cody koeninger <co...@koeninger.org>
    
    Closes #15442 from koeninger/SPARK-17853.
    
    (cherry picked from commit c264ef9b1918256a5018c7a42a1a2b42308ea3f7)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17895: Branch 2.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17895
  
    Can one of the admins verify this patch?




[GitHub] spark issue #17895: Branch 2.0

Posted by fjh100456 <gi...@git.apache.org>.
Github user fjh100456 commented on the issue:

    https://github.com/apache/spark/pull/17895
  
    Sorry, I made a mistake. I'll close it now.




[GitHub] spark pull request #17895: Branch 2.0

Posted by fjh100456 <gi...@git.apache.org>.
Github user fjh100456 closed the pull request at:

    https://github.com/apache/spark/pull/17895




[GitHub] spark issue #17895: Branch 2.0

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17895
  
    @fjh100456 this looks mistakenly opened. Could you close it, please?

