Posted to reviews@spark.apache.org by ahnqirage <gi...@git.apache.org> on 2016/05/13 02:01:06 UTC

[GitHub] spark pull request: Branch 2.0

GitHub user ahnqirage opened a pull request:

    https://github.com/apache/spark/pull/13089

    Branch 2.0

    ## What changes were proposed in this pull request?
    
    (Please fill in changes proposed in this fix)
    
    
    ## How was this patch tested?
    
    (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
    
    
    (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark branch-2.0

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13089.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13089
    
----
commit 7051722023b98f1720142c7b3b41948d275ea455
Author: hyukjinkwon <gu...@gmail.com>
Date:   2016-05-02T02:05:20Z

    [SPARK-13425][SQL] Documentation for CSV datasource options
    
    ## What changes were proposed in this pull request?
    
    This PR adds the explanation and documentation for CSV options for reading and writing.
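
    A couple of these options in use, as a hedged sketch (`header`, `inferSchema`, and `nullValue` are among the documented option names; `spark` is a live SparkSession and `people.csv` a made-up path):
    ```scala
    // Read a CSV file, treating the first line as a header,
    // inferring column types, and mapping "NA" cells to null.
    val df = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .option("nullValue", "NA")
      .csv("people.csv")
    ```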
    
    ## How was this patch tested?
    
    Style tests with `./dev/run_tests` for documentation style.
    
    Author: hyukjinkwon <gu...@gmail.com>
    Author: Hyukjin Kwon <gu...@gmail.com>
    
    Closes #12817 from HyukjinKwon/SPARK-13425.
    
    (cherry picked from commit a832cef11233c6357c7ba7ede387b432e6b0ed71)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit 7d63c36e1efe8baec96cdc16a997249728e204fd
Author: Reynold Xin <rx...@databricks.com>
Date:   2016-05-02T03:21:02Z

    [SPARK-15049] Rename NewAccumulator to AccumulatorV2
    
    ## What changes were proposed in this pull request?
    NewAccumulator isn't the best name if we ever come up with v3 of the API.
    
    ## How was this patch tested?
    Updated tests to reflect the change.
    
    Author: Reynold Xin <rx...@databricks.com>
    
    Closes #12827 from rxin/SPARK-15049.
    
    (cherry picked from commit 44da8d8eabeccc12bfed0d43b37d54e0da845c66)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit ccb53a20e4c3bff02a17542cad13a1fe36d7a7ea
Author: Ben McCann <be...@gmail.com>
Date:   2016-05-02T05:43:28Z

    Fix reference to external metrics documentation
    
    Author: Ben McCann <be...@gmail.com>
    
    Closes #12833 from benmccann/patch-1.
    
    (cherry picked from commit 214d1be4fd4a34399b6a2adb2618784de459a48d)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit 1145ea01b994faf458ab00301b8c4ac757d0dd0b
Author: Wenchen Fan <we...@databricks.com>
Date:   2016-05-02T17:21:14Z

    [SPARK-14637][SQL] object expressions cleanup
    
    ## What changes were proposed in this pull request?
    
    Simplify and clean up some object expressions:
    
    1. simplify the logic to handle `propagateNull`
    2. add `propagateNull` parameter to `Invoke`
    3. simplify the unbox logic in `Invoke`
    4. other minor cleanup
    
    TODO: simplify `MapObjects`
    
    ## How was this patch tested?
    
    Existing tests.
    
    Author: Wenchen Fan <we...@databricks.com>
    
    Closes #12399 from cloud-fan/object.
    
    (cherry picked from commit 0513c3ac93e0a25d6eedbafe6c0561e71c92880a)
    Signed-off-by: Michael Armbrust <mi...@databricks.com>

commit eb7336a754574879fc28c3c10cb68651329a346d
Author: Jeff Zhang <zj...@apache.org>
Date:   2016-05-02T18:03:37Z

    [SPARK-14845][SPARK_SUBMIT][YARN] spark.files in properties file is n…
    
    ## What changes were proposed in this pull request?
    
    Initialize SparkSubmitArgument#files first from the spark-submit arguments and then from the properties file, so that the system property spark.yarn.dist.files is set correctly.
    ```
    OptionAssigner(args.files, YARN, ALL_DEPLOY_MODES, sysProp = "spark.yarn.dist.files"),
    ```
    ## How was this patch tested?
    
    Manual test. A file defined in the properties file is also distributed to the driver in yarn-cluster mode.
    
    Author: Jeff Zhang <zj...@apache.org>
    
    Closes #12656 from zjffdu/SPARK-14845.
    
    (cherry picked from commit 0a3026990bd0cbad53f0001da793349201104958)
    Signed-off-by: Marcelo Vanzin <va...@cloudera.com>

commit 08ae32e6104e998b3c9a4822e563e63aeae55578
Author: Andrew Ray <ra...@gmail.com>
Date:   2016-05-02T18:12:55Z

    [SPARK-13749][SQL] Faster pivot implementation for many distinct values with two phase aggregation
    
    ## What changes were proposed in this pull request?
    
    The existing implementation of pivot translates into a single aggregation with one aggregate per distinct pivot value. When the number of distinct pivot values is large (say 1000+) this can get extremely slow since each input value gets evaluated on every aggregate even though it only affects the value of one of them.
    
    I'm proposing an alternate strategy for when there are 10+ (somewhat arbitrary threshold) distinct pivot values. We do two phases of aggregation. In the first, we group by the grouping columns plus the pivot column and perform the specified aggregations (one or sometimes more). In the second aggregation, we group by the grouping columns and use the new (non-public) PivotFirst aggregate that rearranges the outputs of the first aggregation into an array indexed by the pivot value. Finally, we do a project to extract the array entries into the appropriate output columns.
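
    For reference, a query of the following shape would take the new two-phase path once the pivot column has 10+ distinct values (a hedged sketch; `df` and the column names are made up):
    ```scala
    import org.apache.spark.sql.functions.sum

    // One row per year, one output column per distinct course value.
    df.groupBy("year")
      .pivot("course")
      .agg(sum("earnings"))
    ```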
    
    ## How was this patch tested?
    
    Additional unit tests in DataFramePivotSuite and manual larger scale testing.
    
    Author: Andrew Ray <ra...@gmail.com>
    
    Closes #11583 from aray/fast-pivot.
    
    (cherry picked from commit 99274418684ebae5b98d15b4686b95c1ac029e94)
    Signed-off-by: Yin Huai <yh...@databricks.com>

commit 1c2082b643dc01fdeb2405c97dcf5a9551cc782d
Author: Shixiong Zhu <sh...@databricks.com>
Date:   2016-05-02T18:28:21Z

    [SPARK-14579][SQL] Fix the race condition in StreamExecution.processAllAvailable again
    
    ## What changes were proposed in this pull request?
    
    #12339 didn't fix the race condition. MemorySinkSuite is still flaky: https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.2/814/testReport/junit/org.apache.spark.sql.streaming/MemorySinkSuite/registering_as_a_table/
    
    Here is an execution order to reproduce it.
    
    | Time        |Thread 1           | MicroBatchThread  |
    |:-------------:|:-------------:|:-----:|
    | 1 | |  `MemorySink.getOffset` |
    | 2 | |  availableOffsets ++= newData (availableOffsets is not changed here)  |
    | 3 | addData(newData)      |   |
    | 4 | Set `noNewData` to `false` in  processAllAvailable |  |
    | 5 | | `dataAvailable` returns `false`   |
    | 6 | | noNewData = true |
    | 7 | `noNewData` is true so just return | |
    | 8 |  assert results and fail | |
    | 9 |   | `dataAvailable` returns true so process the new batch |
    
    This PR expands the scope of `awaitBatchLock.synchronized` to eliminate the above race.
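
    The shape of the fix, as a minimal sketch (assumed, not Spark's literal code; `awaitBatchLock` and `noNewData` are the names used above): checking the flag and waiting now happen under the same monitor the MicroBatchThread holds when setting the flag, closing the window between steps 4 and 6.
    ```scala
    def processAllAvailable(): Unit = awaitBatchLock.synchronized {
      while (!noNewData) {
        awaitBatchLock.wait(10)  // monitor released while waiting, re-acquired on wake-up
      }
    }
    ```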
    
    ## How was this patch tested?
    
    Added test("stress test"). It always failed before this patch and passes after applying it. The test is marked ignored in the PR as it takes several minutes to finish.
    
    Author: Shixiong Zhu <sh...@databricks.com>
    
    Closes #12582 from zsxwing/SPARK-14579-2.
    
    (cherry picked from commit a35a67a83dbb450d26ce0d142ab106e952670842)
    Signed-off-by: Michael Armbrust <mi...@databricks.com>

commit 972fd22e3933e58e637781a1da0b6a18afaced17
Author: Dongjoon Hyun <do...@apache.org>
Date:   2016-05-02T19:40:21Z

    [SPARK-14830][SQL] Add RemoveRepetitionFromGroupExpressions optimizer.
    
    ## What changes were proposed in this pull request?
    
    This PR aims to optimize GroupExpressions by removing repeating expressions. `RemoveRepetitionFromGroupExpressions` is added.
    
    **Before**
    ```scala
    scala> sql("select a+1 from values 1,2 T(a) group by a+1, 1+a, A+1, 1+A").explain()
    == Physical Plan ==
    WholeStageCodegen
    :  +- TungstenAggregate(key=[(a#0 + 1)#6,(1 + a#0)#7,(A#0 + 1)#8,(1 + A#0)#9], functions=[], output=[(a + 1)#5])
    :     +- INPUT
    +- Exchange hashpartitioning((a#0 + 1)#6, (1 + a#0)#7, (A#0 + 1)#8, (1 + A#0)#9, 200), None
       +- WholeStageCodegen
          :  +- TungstenAggregate(key=[(a#0 + 1) AS (a#0 + 1)#6,(1 + a#0) AS (1 + a#0)#7,(A#0 + 1) AS (A#0 + 1)#8,(1 + A#0) AS (1 + A#0)#9], functions=[], output=[(a#0 + 1)#6,(1 + a#0)#7,(A#0 + 1)#8,(1 + A#0)#9])
          :     +- INPUT
          +- LocalTableScan [a#0], [[1],[2]]
    ```
    
    **After**
    ```scala
    scala> sql("select a+1 from values 1,2 T(a) group by a+1, 1+a, A+1, 1+A").explain()
    == Physical Plan ==
    WholeStageCodegen
    :  +- TungstenAggregate(key=[(a#0 + 1)#6], functions=[], output=[(a + 1)#5])
    :     +- INPUT
    +- Exchange hashpartitioning((a#0 + 1)#6, 200), None
       +- WholeStageCodegen
          :  +- TungstenAggregate(key=[(a#0 + 1) AS (a#0 + 1)#6], functions=[], output=[(a#0 + 1)#6])
          :     +- INPUT
          +- LocalTableScan [a#0], [[1],[2]]
    ```
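
    The rule itself can be sketched roughly as follows (a hedged approximation of the idea, not the committed code; `ExpressionSet` keeps one representative per set of semantically equal expressions such as `a+1`, `1+a`, and `A+1`):
    ```scala
    import org.apache.spark.sql.catalyst.expressions.ExpressionSet
    import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, LogicalPlan}
    import org.apache.spark.sql.catalyst.rules.Rule

    object RemoveRepetitionFromGroupExpressions extends Rule[LogicalPlan] {
      def apply(plan: LogicalPlan): LogicalPlan = plan transform {
        case a @ Aggregate(grouping, _, _) =>
          // Deduplicate semantically equal grouping expressions.
          a.copy(groupingExpressions = ExpressionSet(grouping).toSeq)
      }
    }
    ```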
    
    ## How was this patch tested?
    
    Passes the Jenkins tests (with a new test case).
    
    Author: Dongjoon Hyun <do...@apache.org>
    
    Closes #12590 from dongjoon-hyun/SPARK-14830.
    
    (cherry picked from commit 6e6320122ea84247c67e2d0fb0e6af54e2c5bb31)
    Signed-off-by: Michael Armbrust <mi...@databricks.com>

commit 4a7e75a203b0a8ecabcb241208aaee5201f6c6e6
Author: Davies Liu <da...@databricks.com>
Date:   2016-05-02T19:58:59Z

    [SPARK-14781] [SQL] support nested predicate subquery
    
    ## What changes were proposed in this pull request?
    
    In order to support nested predicate subqueries, this PR introduces an internal join type, ExistenceJoin, which emits all the rows from the left side plus an additional column indicating whether any rows from the right side matched (it's not null-aware right now). This additional column can be used to replace the subquery in a Filter.
    
    In theory, all predicate subqueries could use this join type, but it's slower than LeftSemi and LeftAnti, so it's only used for nested subqueries (subqueries inside OR).
    
    For example, the following SQL is now supported:
    ```sql
    SELECT a FROM t  WHERE EXISTS (select 0) OR EXISTS (select 1)
    ```
    
    This PR also fixes a bug where predicate subqueries were pushed down through joins (they should not be).
    
    Nested null-aware subqueries are still not supported. For example, `a > 3 OR b NOT IN (select bb from t)`.
    
    After this, we could run TPCDS query Q10, Q35, Q45
    
    ## How was this patch tested?
    
    Added unit tests.
    
    Author: Davies Liu <da...@databricks.com>
    
    Closes #12820 from davies/or_exists.

commit 56dbf165c0206a59701f61649ec654b9a0b15a3f
Author: Pete Robbins <ro...@gmail.com>
Date:   2016-05-02T20:16:46Z

    [SPARK-13745] [SQL] Support columnar in memory representation on Big Endian platforms
    
    ## What changes were proposed in this pull request?
    
    The Parquet datasource and ColumnarBatch tests fail on big-endian platforms. This patch adds support for the little-endian byte arrays being correctly interpreted on a big-endian platform.
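
    The core idea, as an assumed sketch (not the patch's code): read the stored little-endian bytes with an explicit byte order rather than the platform's native order.
    ```scala
    import java.nio.{ByteBuffer, ByteOrder}

    val bytes = Array[Byte](1, 0, 0, 0)  // the int 1, stored little-endian
    val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
    assert(buf.getInt() == 1)  // holds on big-endian hosts too
    ```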
    
    ## How was this patch tested?
    
    Spark test builds were run on big-endian z/Linux, plus a regression build on little-endian amd64.
    
    Author: Pete Robbins <ro...@gmail.com>
    
    Closes #12397 from robbinspg/master.
    
    (cherry picked from commit 8a1ce4899fb9f751dedaaa34ea654dfbc8330852)
    Signed-off-by: Davies Liu <da...@gmail.com>

commit 740f96f6362a49fd95a6d56d93b966094166bbf2
Author: Reynold Xin <rx...@databricks.com>
Date:   2016-05-02T21:57:00Z

    [SPARK-15054] Deprecate old accumulator API
    
    ## What changes were proposed in this pull request?
    This patch deprecates the old accumulator API.
    
    ## How was this patch tested?
    N/A
    
    Author: Reynold Xin <rx...@databricks.com>
    
    Closes #12832 from rxin/SPARK-15054.
    
    (cherry picked from commit d5c79f564f3557037c5526e2ee5f963dd100fb34)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit 990611cd879d443298e61f672fee41432aab36ef
Author: Reynold Xin <rx...@databricks.com>
Date:   2016-05-02T22:27:16Z

    [SPARK-15052][SQL] Use builder pattern to create SparkSession
    
    ## What changes were proposed in this pull request?
    This patch creates a builder pattern for creating SparkSession. The new code is not used anywhere yet and is effectively dead code; I'm putting it up here for feedback. A usage sketch follows the TODO list below.
    
    There are a few TODOs that can be done as follow-up pull requests:
    - [ ] Update tests to use this
    - [ ] Update examples to use this
    - [ ] Clean up SQLContext code w.r.t. this one (i.e. SparkSession shouldn't call into SQLContext.getOrCreate; it should be the other way around)
    - [ ] Remove SparkSession.withHiveSupport
    - [ ] Disable the old constructor (by making it private) so the only way to start a SparkSession is through this builder pattern
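
    A hedged usage sketch of the builder (`spark.some.option` is a made-up config key):
    ```scala
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("builder-example")
      .config("spark.some.option", "value")
      .getOrCreate()  // reuses an existing session if one is running
    ```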
    
    ## How was this patch tested?
    Part of the future pull request is to clean this up and switch existing tests to use this.
    
    Author: Reynold Xin <rx...@databricks.com>
    
    Closes #12830 from rxin/sparksession-builder.
    
    (cherry picked from commit ca1b2198581b8de1651a88fc97540570a2347dc9)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit 05bb5b6f64d8b4114e3434bc467385d8cba86fd0
Author: poolis <gm...@gmail.com>
Date:   2016-05-02T23:15:07Z

    [SPARK-12928][SQL] Oracle FLOAT datatype is not properly handled when reading via JDBC
    
    This contribution is my original work and I license the work to the project under the project's open source license.
    
    Author: poolis <gm...@gmail.com>
    Author: Greg Michalopoulos <gm...@gmail.com>
    
    Closes #10899 from poolis/spark-12928.
    
    (cherry picked from commit 917d05f43bddc1728735979fe7e62fe631b35e6f)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit fbc73f73186873cfd60581e58aff4a8d919e39b4
Author: Herman van Hovell <hv...@questtec.nl>
Date:   2016-05-02T23:32:31Z

    [SPARK-14785] [SQL] Support correlated scalar subqueries
    
    ## What changes were proposed in this pull request?
    In this PR we add support for correlated scalar subqueries. An example of such a query is:
    ```SQL
    select * from tbl1 a where a.value > (select max(value) from tbl2 b where b.key = a.key)
    ```
    The implementation adds the `RewriteCorrelatedScalarSubquery` rule to the Optimizer. This rule plans these subqueries using `LEFT OUTER` joins. It currently supports rewrites for `Project`, `Aggregate` & `Filter` logical plans.
    
    I could not find well-defined semantics for the use of scalar subqueries in an `Aggregate`. The current implementation evaluates the scalar subquery *before* aggregation. This means that you either have to make the scalar subquery part of the grouping expression, or aggregate it further on. I am open to suggestions on this.
    
    The implementation currently forces the uniqueness of a scalar subquery by enforcing that it is aggregated and that the resulting column is wrapped in an `AggregateExpression`.
    
    ## How was this patch tested?
    Added tests to `SubquerySuite`.
    
    Author: Herman van Hovell <hv...@questtec.nl>
    
    Closes #12822 from hvanhovell/SPARK-14785.

commit 65b94f46021577288ef6c88e00b5b4ed28da33b8
Author: Liwei Lin <lw...@gmail.com>
Date:   2016-05-02T23:48:20Z

    [SPARK-14747][SQL] Add assertStreaming/assertNoneStreaming checks in DataFrameWriter
    
    ## Problem
    
    If an end user happens to write code that mixes continuous-query-oriented methods with non-continuous-query-oriented methods:
    
    ```scala
    ctx.read
       .format("text")
       .stream("...")  // continuous query
       .write
       .text("...")    // non-continuous query; should be startStream() here
    ```
    
    He/she would get this somewhat confusing exception:
    
    >
    Exception in thread "main" java.lang.AssertionError: assertion failed: No plan for FileSource[./continuous_query_test_input]
    	at scala.Predef$.assert(Predef.scala:170)
    	at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
    	at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
    	at ...
    
    ## What changes were proposed in this pull request?
    
    This PR adds checks for continuous-query-oriented methods and non-continuous-query-oriented methods in `DataFrameWriter`:
    
    <table>
    <tr>
    	<td align="center"></td>
    	<td align="center"><strong>can be called on continuous query?</strong></td>
    	<td align="center"><strong>can be called on non-continuous query?</strong></td>
    </tr>
    <tr>
    	<td align="center">mode</td>
    	<td align="center"></td>
    	<td align="center">yes</td>
    </tr>
    <tr>
    	<td align="center">trigger</td>
    	<td align="center">yes</td>
    	<td align="center"></td>
    </tr>
    <tr>
    	<td align="center">format</td>
    	<td align="center">yes</td>
    	<td align="center">yes</td>
    </tr>
    <tr>
    	<td align="center">option/options</td>
    	<td align="center">yes</td>
    	<td align="center">yes</td>
    </tr>
    <tr>
    	<td align="center">partitionBy</td>
    	<td align="center">yes</td>
    	<td align="center">yes</td>
    </tr>
    <tr>
    	<td align="center">bucketBy</td>
    	<td align="center"></td>
    	<td align="center">yes</td>
    </tr>
    <tr>
    	<td align="center">sortBy</td>
    	<td align="center"></td>
    	<td align="center">yes</td>
    </tr>
    <tr>
    	<td align="center">save</td>
    	<td align="center"></td>
    	<td align="center">yes</td>
    </tr>
    <tr>
    	<td align="center">queryName</td>
    	<td align="center">yes</td>
    	<td align="center"></td>
    </tr>
    <tr>
    	<td align="center">startStream</td>
    	<td align="center">yes</td>
    	<td align="center"></td>
    </tr>
    <tr>
    	<td align="center">insertInto</td>
    	<td align="center"></td>
    	<td align="center">yes</td>
    </tr>
    <tr>
    	<td align="center">saveAsTable</td>
    	<td align="center"></td>
    	<td align="center">yes</td>
    </tr>
    <tr>
    	<td align="center">jdbc</td>
    	<td align="center"></td>
    	<td align="center">yes</td>
    </tr>
    <tr>
    	<td align="center">json</td>
    	<td align="center"></td>
    	<td align="center">yes</td>
    </tr>
    <tr>
    	<td align="center">parquet</td>
    	<td align="center"></td>
    	<td align="center">yes</td>
    </tr>
    <tr>
    	<td align="center">orc</td>
    	<td align="center"></td>
    	<td align="center">yes</td>
    </tr>
    <tr>
    	<td align="center">text</td>
    	<td align="center"></td>
    	<td align="center">yes</td>
    </tr>
    <tr>
    	<td align="center">csv</td>
    	<td align="center"></td>
    	<td align="center">yes</td>
    </tr>
    </table>
    
    After this PR's change, the friendly exception would be:
    >
    Exception in thread "main" org.apache.spark.sql.AnalysisException: text() can only be called on non-continuous queries;
    	at org.apache.spark.sql.DataFrameWriter.assertNotStreaming(DataFrameWriter.scala:678)
    	at org.apache.spark.sql.DataFrameWriter.text(DataFrameWriter.scala:629)
    	at ss.SSDemo$.main(SSDemo.scala:47)
    
    ## How was this patch tested?
    
    Dedicated unit tests were added.
    
    Author: Liwei Lin <lw...@gmail.com>
    
    Closes #12521 from lw-lin/dataframe-writer-check.
    
    (cherry picked from commit 35d9c8aa69c650f33037813607dc939922c5fc27)
    Signed-off-by: Michael Armbrust <mi...@databricks.com>

commit a79797149423568128301507026d7675a6aa6ecb
Author: hyukjinkwon <gu...@gmail.com>
Date:   2016-05-03T00:50:40Z

    [SPARK-15050][SQL] Put CSV and JSON options as Python csv and json function parameters
    
    ## What changes were proposed in this pull request?
    
    https://issues.apache.org/jira/browse/SPARK-15050
    
    This PR adds function parameters to the Python API for reading and writing `csv()`.
    
    ## How was this patch tested?
    
    This was tested by `./dev/run_tests`.
    
    Author: hyukjinkwon <gu...@gmail.com>
    Author: Hyukjin Kwon <gu...@gmail.com>
    
    Closes #12834 from HyukjinKwon/SPARK-15050.
    
    (cherry picked from commit d37c7f7f042f7943b5b684e53cf4284c601fb347)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit 86167968f7dea8a44fae2d7bdb0bfe8d735e5004
Author: Herman van Hovell <hv...@questtec.nl>
Date:   2016-05-03T01:12:31Z

    [SPARK-15047][SQL] Cleanup SQL Parser
    
    ## What changes were proposed in this pull request?
    This PR addresses a few minor issues in SQL parser:
    
    - Removes some unused rules and keywords in the grammar.
    - Removes code path for fallback SQL parsing (was needed for Hive native parsing).
    - Use `UnresolvedGenerator` instead of hard-coding `Explode` & `JsonTuple`.
    - Adds a more generic way of creating error messages for unsupported Hive features.
    - Use `visitFunctionName` as much as possible.
    - Interpret a `CatalogColumn`'s `DataType` directly instead of parsing it again.
    
    ## How was this patch tested?
    Existing tests.
    
    Author: Herman van Hovell <hv...@questtec.nl>
    
    Closes #12826 from hvanhovell/SPARK-15047.
    
    (cherry picked from commit 1c19c2769edecaefabc2cd67b3b32f901feb3f59)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit 733cbaa3c0ff617a630a9d6937699db37ad2943b
Author: bomeng <bm...@us.ibm.com>
Date:   2016-05-03T01:20:29Z

    [SPARK-15062][SQL] fix list type infer serializer issue
    
    ## What changes were proposed in this pull request?
    
    Make the serializer correctly inferred if the input type is `List[_]`. Since `List[_]` is a subtype of `Seq[_]`, it should take the `Seq[_]` path, but before it was matched to a different case (`case t if definedByConstructorParams(t)`).
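
    One way to exercise the fixed path, as a hedged sketch (`spark.implicits._` assumed in scope, e.g. in the shell):
    ```scala
    import spark.implicits._

    // The List[Int] field should be encoded via the Seq[_] serializer path.
    case class Rec(xs: List[Int])
    val ds = Seq(Rec(List(1, 2, 3)), Rec(List(4, 5))).toDS()
    ds.show()
    ```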
    
    ## How was this patch tested?
    
    New test case was added.
    
    Author: bomeng <bm...@us.ibm.com>
    
    Closes #12849 from bomeng/SPARK-15062.
    
    (cherry picked from commit 0fd95be3cd815154a11ce7d6998311e7c86bc6b9)
    Signed-off-by: Michael Armbrust <mi...@databricks.com>

commit dcce0aaafedc496e3e69c02c51ad31f01de05287
Author: Shixiong Zhu <sh...@databricks.com>
Date:   2016-05-03T01:27:49Z

    [SPARK-15077][SQL] Use a fair lock to avoid thread starvation in StreamExecution
    
    ## What changes were proposed in this pull request?
    
    Right now `StreamExecution.awaitBatchLock` uses an unfair lock. `StreamExecution.awaitOffset` may run too long and fail some tests because `StreamExecution.constructNextBatch` keeps getting the lock.
    
    See: https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.4/865/testReport/junit/org.apache.spark.sql.streaming/FileStreamSourceStressTestSuite/file_source_stress_test/
    
    This PR uses a fair ReentrantLock to resolve the thread starvation issue.
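
    The mechanism in brief, as a sketch (assumed shape, not the patch's literal code):
    ```scala
    import java.util.concurrent.locks.ReentrantLock

    // `true` requests fair ordering: the longest-waiting thread acquires next,
    // so a loop that repeatedly re-takes the lock cannot starve other waiters.
    val awaitBatchLock = new ReentrantLock(true)

    awaitBatchLock.lock()
    try {
      // critical section
    } finally {
      awaitBatchLock.unlock()
    }
    ```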
    
    ## How was this patch tested?
    
    Modified `FileStreamSourceStressTestSuite.test("file source stress test")` to run the test code 100 times locally. Without this patch it always fails because of a timeout.
    
    Author: Shixiong Zhu <sh...@databricks.com>
    
    Closes #12852 from zsxwing/SPARK-15077.
    
    (cherry picked from commit 4e3685ae5e5826e63bfcd7c3729e3b9cbab484b5)
    Signed-off-by: Michael Armbrust <mi...@databricks.com>

commit 435d903d3f3d26514d7d9b986ec88a3bd69a4df3
Author: Marcin Tustin <ma...@gmail.com>
Date:   2016-05-03T02:37:57Z

    [SPARK-14685][CORE] Document heritability of localProperties
    
    ## What changes were proposed in this pull request?
    
    This updates the Java/Scala docs for setLocalProperty to document the heritability of localProperties, and adds tests for that behaviour.
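
    The documented behaviour, as a hedged sketch (`sc` is a live SparkContext; the key and value are made up):
    ```scala
    sc.setLocalProperty("myKey", "parent")
    new Thread(new Runnable {
      // Threads created after the property is set inherit the parent's value.
      override def run(): Unit = println(sc.getLocalProperty("myKey"))  // "parent"
    }).start()
    ```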
    
    ## How was this patch tested?
    
    Tests pass. New tests were added.
    
    Author: Marcin Tustin <ma...@gmail.com>
    
    Closes #12455 from marcintustin/SPARK-14685.
    
    (cherry picked from commit 8028f3a0b4003af15ed44d9ef4727b56f4b10534)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit a7e8cfa64de26be2e517e2eda237a9e8a58008c5
Author: Reynold Xin <rx...@databricks.com>
Date:   2016-05-03T04:12:48Z

    [SPARK-15079] Support average/count/sum in Long/DoubleAccumulator
    
    ## What changes were proposed in this pull request?
    This patch removes AverageAccumulator and adds the ability to compute average to LongAccumulator and DoubleAccumulator. The patch also improves documentation for the two accumulators.
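
    A hedged usage sketch (`sc` is a live SparkContext; method names follow this commit's summary):
    ```scala
    val acc = sc.longAccumulator("events")
    sc.parallelize(1 to 100).foreach(n => acc.add(n))
    println((acc.sum, acc.count, acc.avg))  // (5050, 100, 50.5)
    ```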
    
    ## How was this patch tested?
    Added unit tests for this.
    
    Author: Reynold Xin <rx...@databricks.com>
    
    Closes #12858 from rxin/SPARK-15079.
    
    (cherry picked from commit bb9ab56b960153d374d7e8838f62a18e7e45481e)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit 52308103ee9bfb12a505505f6d38f1d09a05208f
Author: Andrew Ray <ra...@gmail.com>
Date:   2016-05-03T05:47:32Z

    [SPARK-13749][SQL][FOLLOW-UP] Faster pivot implementation for many distinct values with two phase aggregation
    
    ## What changes were proposed in this pull request?
    
    This is a follow-up PR for #11583. It turns 3 lazy vals into plain vals and adds unit test coverage.
    
    ## How was this patch tested?
    
    Existing unit tests and additional unit tests.
    
    Author: Andrew Ray <ra...@gmail.com>
    
    Closes #12861 from aray/fast-pivot-follow-up.
    
    (cherry picked from commit d8f528ceb61e3c2ac7ac97cd8147dafbb625932f)
    Signed-off-by: Yin Huai <yh...@databricks.com>

commit 27efd92e3683f88233ebe755855dac337069246f
Author: Holden Karau <ho...@us.ibm.com>
Date:   2016-05-03T07:18:10Z

    [SPARK-6717][ML] Clear shuffle files after checkpointing in ALS
    
    ## What changes were proposed in this pull request?
    
    When ALS is run with a checkpoint interval, materialize the current state at each checkpoint and clean up the previous shuffles (non-blocking).
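
    For orientation, a hedged sketch of running ALS with checkpointing enabled (the directory path is made up):
    ```scala
    import org.apache.spark.ml.recommendation.ALS

    sc.setCheckpointDir("/tmp/als-checkpoints")
    val als = new ALS()
      .setMaxIter(20)
      .setCheckpointInterval(5)  // shuffles behind checkpointed state become cleanable
    ```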
    
    ## How was this patch tested?
    
    Existing ALS unit tests, new ALS checkpoint cleanup unit tests added & shuffle files checked after ALS w/checkpointing run.
    
    Author: Holden Karau <ho...@us.ibm.com>
    Author: Holden Karau <ho...@pigscanfly.ca>
    
    Closes #11919 from holdenk/SPARK-6717-clear-shuffle-files-after-checkpointing-in-ALS.

commit 07a02e8bb6a2a32508627d4a0cb487b38d595184
Author: Sandeep Singh <sa...@techaddict.me>
Date:   2016-05-03T11:38:21Z

    [MINOR][DOCS] Fix type Information in Quick Start and Programming Guide
    
    Author: Sandeep Singh <sa...@techaddict.me>
    
    Closes #12841 from techaddict/improve_docs_1.
    
    (cherry picked from commit dfd9723dd3b3ff5d47a7f04a4330bf33ffe353ac)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit 38f6e66afdc92865628238e53ccc156fef976770
Author: Dongjoon Hyun <do...@apache.org>
Date:   2016-05-03T11:39:37Z

    [SPARK-15053][BUILD] Fix Java Lint errors on Hive-Thriftserver module
    
    ## What changes were proposed in this pull request?
    
    This issue fixes or hides 181 Java linter errors introduced by SPARK-14987 which copied hive service code from Hive. We had better clean up these errors before releasing Spark 2.0.
    
    - Fix UnusedImports (15 lines), RedundantModifier (14 lines), SeparatorWrap (9 lines), MethodParamPad (6 lines), FileTabCharacter (5 lines), ArrayTypeStyle (3 lines), ModifierOrder (3 lines), RedundantImport (1 line), CommentsIndentation (1 line), UpperEll (1 line), FallThrough (1 line), OneStatementPerLine (1 line), NewlineAtEndOfFile (1 line) errors.
    - Ignore `LineLength` errors under `hive/service/*` (118 lines).
    - Ignore `MethodName` error in `PasswdAuthenticationProvider.java` (1 line).
    - Ignore `NoFinalizer` error in `ThreadWithGarbageCleanup.java` (1 line).
    
    ## How was this patch tested?
    
    After passing Jenkins building, run `dev/lint-java` manually.
    ```bash
    $ dev/lint-java
    Checkstyle checks passed.
    ```
    
    Author: Dongjoon Hyun <do...@apache.org>
    
    Closes #12831 from dongjoon-hyun/SPARK-15053.
    
    (cherry picked from commit a7444570764b0a08b7e908dc7931744f9dbdf3c6)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit b802979ad42fd58ed1d8c6e23629169bc2891cbe
Author: Reynold Xin <rx...@databricks.com>
Date:   2016-05-03T11:45:12Z

    [SPARK-15081] Move AccumulatorV2 and subclasses into util package
    
    ## What changes were proposed in this pull request?
    This patch moves AccumulatorV2 and subclasses into util package.
    
    ## How was this patch tested?
    Updated relevant tests.
    
    Author: Reynold Xin <rx...@databricks.com>
    
    Closes #12863 from rxin/SPARK-15081.
    
    (cherry picked from commit d557a5e01e8f819d3bd9e6e43d2df733f390d764)
    Signed-off-by: Wenchen Fan <we...@databricks.com>

commit f03bf7eacb834d2eaeba197ccf704bb721f0b4af
Author: Sean Owen <so...@cloudera.com>
Date:   2016-05-03T12:13:35Z

    [SPARK-14897][CORE] Upgrade Jetty to latest version of 8
    
    ## What changes were proposed in this pull request?
    
    Update Jetty 8.1 to the latest 2016/02 release, from a 2013/10 release, for security and bug fixes. This does not resolve the JIRA necessarily, as it's still worth considering an update to 9.3.
    
    ## How was this patch tested?
    
    Jenkins tests
    
    Author: Sean Owen <so...@cloudera.com>
    
    Closes #12842 from srowen/SPARK-14897.
    
    (cherry picked from commit 57ac7c182465e1653e74a8ad6c826b2cf56a0ad8)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit 932e1b5b2ad19153b5a5aa2255837569406486dd
Author: Yanbo Liang <yb...@gmail.com>
Date:   2016-05-03T14:46:13Z

    [SPARK-14971][ML][PYSPARK] PySpark ML Params setter code clean up
    
    ## What changes were proposed in this pull request?
    PySpark ML Params setter code clean up.
    For example,
    ```setInputCol``` can be simplified from
    ```
    self._set(inputCol=value)
    return self
    ```
    to:
    ```
    return self._set(inputCol=value)
    ```
    This is a pretty big sweep, and we cleaned up wherever possible.

    ## How was this patch tested?
    Existing unit tests.
    
    Author: Yanbo Liang <yb...@gmail.com>
    
    Closes #12749 from yanboliang/spark-14971.
    
    (cherry picked from commit d26f7cb0121767da678bbbbf3a0e31c63d5e3159)
    Signed-off-by: Nick Pentreath <ni...@za.ibm.com>

commit a373c39a98a395e78ac4c0116c47a9eec39ac3e6
Author: Sun Rui <su...@gmail.com>
Date:   2016-05-03T16:29:49Z

    [SPARK-15091][SPARKR] Fix warnings and a failure in SparkR test cases with testthat version 1.0.1
    
    ## What changes were proposed in this pull request?
    Fix warnings and a failure in SparkR test cases with testthat version 1.0.1
    
    ## How was this patch tested?
    SparkR unit test cases.
    
    Author: Sun Rui <su...@gmail.com>
    
    Closes #12867 from sun-rui/SPARK-15091.
    
    (cherry picked from commit 8b6491fc0b49b4e363887ae4b452ba69fe0290d5)
    Signed-off-by: Shivaram Venkataraman <sh...@cs.berkeley.edu>

commit 17996e7d02b6566d21c352c37ea0ed3e543ded59
Author: Reynold Xin <rx...@databricks.com>
Date:   2016-05-03T16:43:47Z

    [SPARK-15088] [SQL] Remove SparkSqlSerializer
    
    ## What changes were proposed in this pull request?
    This patch removes SparkSqlSerializer. I believe this is now dead code.
    
    ## How was this patch tested?
    Removed a test case related to it.
    
    Author: Reynold Xin <rx...@databricks.com>
    
    Closes #12864 from rxin/SPARK-15088.
    
    (cherry picked from commit 5503e453ba00676925531f91f66c0108ac6b1fca)
    Signed-off-by: Davies Liu <da...@gmail.com>

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Branch 2.0

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/13089




[GitHub] spark pull request: Branch 2.0

Posted by zhengruifeng <gi...@git.apache.org>.
Github user zhengruifeng commented on the pull request:

    https://github.com/apache/spark/pull/13089#issuecomment-218940318
  
    @ahnqirage please close it




[GitHub] spark pull request: Branch 2.0

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/13089#issuecomment-220425293
  
    @ahnqirage please close this PR. It seems to have been opened by mistake.




[GitHub] spark pull request: Branch 2.0

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/13089#issuecomment-219859010
  
    Is this PR a mistake (it has no title, linked JIRA, or description)? Can you please close it?




[GitHub] spark pull request: Branch 2.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13089#issuecomment-218933841
  
    Can one of the admins verify this patch?

