Posted to reviews@spark.apache.org by ptkool <gi...@git.apache.org> on 2017/10/02 21:18:42 UTC

[GitHub] spark pull request #19414: Udf nullablity fixes

GitHub user ptkool opened a pull request:

    https://github.com/apache/spark/pull/19414

    Udf nullablity fixes

    ## What changes were proposed in this pull request?
    
    (Please fill in changes proposed in this fix)
    
    ## How was this patch tested?
    
    (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
    (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Shopify/spark udf_nullablity_fixes

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19414.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19414
    
----
commit 5a0a8b0396df2feadb8333876cc08edf219fa177
Author: Sean Owen <so...@cloudera.com>
Date:   2017-05-02T00:01:05Z

    [SPARK-20459][SQL] JdbcUtils throws IllegalStateException: Cause already initialized after getting SQLException
    
    ## What changes were proposed in this pull request?
    
    Avoid failing to initCause on JDBC exception with cause initialized to null
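
    A minimal sketch of the failure mode and guard, using a hypothetical helper rather than the actual `JdbcUtils` code: `Throwable.initCause` throws `IllegalStateException` when the cause was already initialized, including when it was explicitly initialized to `null` through a `(message, cause)` constructor, so the chaining has to be guarded.
    
    ```
    import java.sql.SQLException
    
    // Hypothetical helper (not Spark's actual code): chain a cause onto a
    // JDBC exception without letting initCause blow up.
    def chainCause(e: SQLException, cause: Exception): SQLException = {
      if (e.getCause == null) {
        try e.initCause(cause)
        catch { case _: IllegalStateException => () } // cause was already set (to null)
      }
      e
    }
    ```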
    
    ## How was this patch tested?
    
    Existing tests
    
    Author: Sean Owen <so...@cloudera.com>
    
    Closes #17800 from srowen/SPARK-20459.
    
    (cherry picked from commit af726cd6117de05c6e3b9616b8699d884a53651b)
    Signed-off-by: Xiao Li <ga...@gmail.com>

commit b7c1c2f973635a2ec05aedd89456765d830dfdce
Author: Felix Cheung <fe...@hotmail.com>
Date:   2017-05-02T04:03:48Z

    [SPARK-20192][SPARKR][DOC] SparkR migration guide to 2.2.0
    
    ## What changes were proposed in this pull request?
    
    Updating R Programming Guide
    
    ## How was this patch tested?
    
    manually
    
    Author: Felix Cheung <fe...@hotmail.com>
    
    Closes #17816 from felixcheung/r22relnote.
    
    (cherry picked from commit d20a976e8918ca8d607af452301e8014fe14e64a)
    Signed-off-by: Felix Cheung <fe...@apache.org>

commit b146481fff1ce529245f9c03b35c73ea604712d0
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Date:   2017-05-02T05:56:41Z

    [SPARK-20537][CORE] Fixing OffHeapColumnVector reallocation
    
    ## What changes were proposed in this pull request?
    
    As #17773 revealed, `OnHeapColumnVector` reallocation may copy only a part of the original storage.
    
    `OffHeapColumnVector` reallocation also copies data to the new storage only up to `elementsAppended`. This variable is only updated by the `ColumnVector.appendX` API, while `ColumnVector.putX` is more commonly used. This PR instead copies data to the new storage up to the previously-allocated size in `OffHeapColumnVector`.
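
    A minimal sketch of the bug with hypothetical names (not the actual `OffHeapColumnVector` code): `put` does not bump `elementsAppended`, so a reallocation that copies only `elementsAppended` elements silently drops values written via `put`; copying up to the old capacity avoids that.
    
    ```
    // Simplified single-type vector illustrating the reallocation fix.
    class IntVector(private var capacity: Int) {
      private var data = new Array[Int](capacity)
      private var elementsAppended = 0
    
      def put(i: Int, v: Int): Unit = data(i) = v   // does not touch elementsAppended
      def append(v: Int): Unit = { data(elementsAppended) = v; elementsAppended += 1 }
      def get(i: Int): Int = data(i)
    
      def reserve(newCapacity: Int): Unit = {       // assumes newCapacity >= capacity
        val newData = new Array[Int](newCapacity)
        // Fix: copy the previously-allocated size, not just elementsAppended.
        System.arraycopy(data, 0, newData, 0, capacity)
        data = newData
        capacity = newCapacity
      }
    }
    ```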
    
    ## How was this patch tested?
    
    Existing test suites
    
    Author: Kazuaki Ishizaki <is...@jp.ibm.com>
    
    Closes #17811 from kiszk/SPARK-20537.
    
    (cherry picked from commit afb21bf22a59c9416c04637412fb69d1442e6826)
    Signed-off-by: Wenchen Fan <we...@databricks.com>

commit ef5e2a0509801f6afced3bc80f8d700acf84e0dd
Author: Burak Yavuz <br...@gmail.com>
Date:   2017-05-02T06:08:16Z

    [SPARK-20549] java.io.CharConversionException: 'Invalid UTF-32' in JsonToStructs
    
    ## What changes were proposed in this pull request?
    
    A fix for the same problem was made in #17693 but ignored `JsonToStructs`. This PR uses the same fix for `JsonToStructs`.
    
    ## How was this patch tested?
    
    Regression test
    
    Author: Burak Yavuz <br...@gmail.com>
    
    Closes #17826 from brkyvz/SPARK-20549.
    
    (cherry picked from commit 86174ea89b39a300caaba6baffac70f3dc702788)
    Signed-off-by: Wenchen Fan <we...@databricks.com>

commit 01f3be7ddc72643fb18bad8304e6c8eebf04b3e6
Author: Nick Pentreath <ni...@za.ibm.com>
Date:   2017-05-02T08:49:13Z

    [SPARK-20300][ML][PYSPARK] Python API for ALSModel.recommendForAllUsers,Items
    
    Add Python API for `ALSModel` methods `recommendForAllUsers`, `recommendForAllItems`
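
    For reference, a usage sketch of the Scala API that the new Python wrappers mirror (the `spark` SparkSession and the column names are assumptions for illustration):
    
    ```
    import org.apache.spark.ml.recommendation.ALS
    import spark.implicits._
    
    val ratings = Seq((0, 10, 4.0f), (0, 11, 1.0f), (1, 10, 5.0f))
      .toDF("userId", "movieId", "rating")
    
    val model = new ALS()
      .setUserCol("userId").setItemCol("movieId").setRatingCol("rating")
      .fit(ratings)
    
    model.recommendForAllUsers(10).show()   // top 10 items for every user
    model.recommendForAllItems(10).show()   // top 10 users for every item
    ```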
    
    ## How was this patch tested?
    
    New doc tests.
    
    Author: Nick Pentreath <ni...@za.ibm.com>
    
    Closes #17622 from MLnick/SPARK-20300-pyspark-recall.
    
    (cherry picked from commit e300a5a145820ecd466885c73245d6684e8cb0aa)
    Signed-off-by: Nick Pentreath <ni...@za.ibm.com>

commit 4f4083bfaaaaca7a5da80d346652a5f831aba7e6
Author: Xiao Li <ga...@gmail.com>
Date:   2017-05-02T08:49:24Z

    [SPARK-19235][SQL][TEST][FOLLOW-UP] Enable Test Cases in DDLSuite with Hive Metastore
    
    ### What changes were proposed in this pull request?
    This is a follow-up of enabling test cases in DDLSuite with Hive Metastore. It consists of the following remaining tasks:
    - Run all the `alter table` and `drop table` DDL tests against data source tables when using Hive metastore.
    - Do not run any `alter table` and `drop table` DDL test against Hive serde tables when using InMemoryCatalog.
    - Reenable `alter table: set serde partition` and `alter table: set serde` tests for Hive serde tables.
    
    ### How was this patch tested?
    N/A
    
    Author: Xiao Li <ga...@gmail.com>
    
    Closes #17524 from gatorsmile/cleanupDDLSuite.
    
    (cherry picked from commit b1e639ab09d3a7a1545119e45a505c9a04308353)
    Signed-off-by: Wenchen Fan <we...@databricks.com>

commit 871b073b9983d1f04d71266de170be13e9fb8440
Author: Marcelo Vanzin <va...@cloudera.com>
Date:   2017-05-02T21:30:06Z

    [SPARK-20421][CORE] Add a missing deprecation tag.
    
    In the previous patch I deprecated StorageStatus, but not the
    method in SparkContext that exposes that class publicly. So deprecate
    the method too.
    
    Author: Marcelo Vanzin <va...@cloudera.com>
    
    Closes #17824 from vanzin/SPARK-20421.
    
    (cherry picked from commit ef3df9125a30f8fb817fe855b74d7130be45b0ee)
    Signed-off-by: Marcelo Vanzin <va...@cloudera.com>

commit c199764babc874be153dee4056d4eab755bb002c
Author: Wenchen Fan <we...@databricks.com>
Date:   2017-05-03T02:08:46Z

    [SPARK-20558][CORE] clear InheritableThreadLocal variables in SparkContext when stopping it
    
    ## What changes were proposed in this pull request?
    
    To better understand this problem, let's take a look at an example first:
    ```
    object Main {
      def main(args: Array[String]): Unit = {
        var t = new Test
        new Thread(new Runnable {
          override def run() = {}
        }).start()
        println("first thread finished")
    
        t.a = null
        t = new Test
        new Thread(new Runnable {
          override def run() = {}
        }).start()
      }
    
    }
    
    class Test {
      var a = new InheritableThreadLocal[String] {
        override protected def childValue(parent: String): String = {
          println("parent value is: " + parent)
          parent
        }
      }
      a.set("hello")
    }
    ```
    The result is:
    ```
    parent value is: hello
    first thread finished
    parent value is: hello
    parent value is: hello
    ```
    
    Once an `InheritableThreadLocal` has had its value set, child threads will inherit that value as long as the thread-local has not been GCed, so setting the variable which holds the `InheritableThreadLocal` to `null` doesn't work as we expected.
    
    In `SparkContext`, we have an `InheritableThreadLocal` for local properties; we should clear it when stopping `SparkContext`, or all future child threads will still inherit it, copy the properties, and waste memory.
    
    This is the root cause of https://issues.apache.org/jira/browse/SPARK-20548, which creates/stops `SparkContext` many times and finally has a lot of `InheritableThreadLocal` instances alive, causing OOM when starting new threads in the internal thread pools.
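
    A minimal sketch of the remedy, with an assumed field name rather than the actual `SparkContext` code: remove the thread-local's entry for the stopping thread so that future child threads no longer copy the properties.
    
    ```
    import java.util.Properties
    
    class SparkContextLike {
      protected[this] val localProperties = new InheritableThreadLocal[Properties] {
        override protected def childValue(parent: Properties): Properties = new Properties(parent)
        override protected def initialValue(): Properties = new Properties()
      }
    
      def stop(): Unit = {
        // ... existing shutdown work ...
        localProperties.remove()  // child threads started afterwards inherit nothing
      }
    }
    ```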
    
    ## How was this patch tested?
    
    N/A
    
    Author: Wenchen Fan <we...@databricks.com>
    
    Closes #17833 from cloud-fan/core.
    
    (cherry picked from commit b946f3160eb7953fb30edf1f097ea87be75b33e7)
    Signed-off-by: Wenchen Fan <we...@databricks.com>

commit c80242ab9c3dfab0341c499075bb302d590c9ed7
Author: Michael Armbrust <mi...@databricks.com>
Date:   2017-05-03T05:44:27Z

    [SPARK-20567] Lazily bind in GenerateExec
    
    It is not valid to eagerly bind with the child's output as this causes failures when we attempt to canonicalize the plan (replacing the attribute references with dummies).
    
    Author: Michael Armbrust <mi...@databricks.com>
    
    Closes #17838 from marmbrus/fixBindExplode.
    
    (cherry picked from commit 6235132a8ce64bb12d825d0a65e5dd052d1ee647)
    Signed-off-by: Herman van Hovell <hv...@databricks.com>

commit 4f647ab66353b136e4fdf02587ebbd88ce5c5b5f
Author: MechCoder <ma...@gmail.com>
Date:   2017-05-03T08:58:05Z

    [SPARK-6227][MLLIB][PYSPARK] Implement PySpark wrappers for SVD and PCA (v2)
    
    Add PCA and SVD to PySpark's wrappers for `RowMatrix` and `IndexedRowMatrix` (SVD only).
    
    Based on #7963, updated.
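
    For reference, a usage sketch of the Scala mllib API that these Python wrappers mirror (the `sc` SparkContext is assumed):
    
    ```
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.linalg.distributed.RowMatrix
    
    val rows = sc.parallelize(Seq(
      Vectors.dense(1.0, 2.0, 3.0),
      Vectors.dense(4.0, 5.0, 6.0),
      Vectors.dense(7.0, 8.0, 9.0)))
    val mat = new RowMatrix(rows)
    
    val svd = mat.computeSVD(2, computeU = true)  // top 2 singular values, with U
    val pc  = mat.computePrincipalComponents(2)   // top 2 principal components
    ```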
    
    ## How was this patch tested?
    
    New doc tests and unit tests. Ran all examples locally.
    
    Author: MechCoder <ma...@gmail.com>
    Author: Nick Pentreath <ni...@za.ibm.com>
    
    Closes #17621 from MLnick/SPARK-6227-pyspark-svd-pca.
    
    (cherry picked from commit db2fb84b4a3c45daa449cc9232340193ce8eb37d)
    Signed-off-by: Nick Pentreath <ni...@za.ibm.com>

commit b5947f5c33eb403d65b1c316d1781c3d7cebf01b
Author: Sean Owen <so...@cloudera.com>
Date:   2017-05-03T09:18:35Z

    [SPARK-20523][BUILD] Clean up build warnings for 2.2.0 release
    
    ## What changes were proposed in this pull request?
    
    Fix build warnings, primarily related to Breeze 0.13 operator changes and Java style problems
    
    ## How was this patch tested?
    
    Existing tests
    
    Author: Sean Owen <so...@cloudera.com>
    
    Closes #17803 from srowen/SPARK-20523.
    
    (cherry picked from commit 16fab6b0ef3dcb33f92df30e17680922ad5fb672)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit b1a732fead32a37afcb7cf7a35facc49a449b8e2
Author: Liwei Lin <lw...@gmail.com>
Date:   2017-05-03T15:55:02Z

    [SPARK-20441][SPARK-20432][SS] Within the same streaming query, one StreamingRelation should only be transformed to one StreamingExecutionRelation
    
    ## What changes were proposed in this pull request?
    
    Within the same streaming query, when one `StreamingRelation` is referred to multiple times (e.g. `df.union(df)`), we should transform it into only one `StreamingExecutionRelation`, instead of two or more different `StreamingExecutionRelation`s (each of which would have a separate set of sources, source logs, ...).
    
    ## How was this patch tested?
    
    Added two test cases, each of which would fail without this patch.
    
    Author: Liwei Lin <lw...@gmail.com>
    
    Closes #17735 from lw-lin/SPARK-20441.
    
    (cherry picked from commit 27f543b15f2f493f6f8373e46b4c9564b0a1bf81)
    Signed-off-by: Burak Yavuz <br...@gmail.com>

commit f0e80aa2ddee80819ef33ee24eb6a15a73bc02d5
Author: Reynold Xin <rx...@databricks.com>
Date:   2017-05-03T16:22:25Z

    [SPARK-20576][SQL] Support generic hint function in Dataset/DataFrame
    
    ## What changes were proposed in this pull request?
    We allow users to specify hints (currently only "broadcast" is supported) in SQL and DataFrame. However, while SQL has a standard hint format (`/*+ ... */`), DataFrame doesn't have one, and users are sometimes confused because they can't find how to apply a broadcast hint. This ticket adds a generic `hint` function to Dataset/DataFrame that allows using the same hints on DataFrames as in SQL.
    
    As an example, after this patch, the following will apply a broadcast hint on a DataFrame using the new hint function:
    
    ```
    df1.join(df2.hint("broadcast"))
    ```
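
    For comparison, the equivalent SQL-side hint (the table and column names are assumptions for illustration):
    
    ```
    spark.sql("SELECT /*+ BROADCAST(t2) */ * FROM t1 JOIN t2 ON t1.key = t2.key")
    ```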
    
    ## How was this patch tested?
    Added a test case in DataFrameJoinSuite.
    
    Author: Reynold Xin <rx...@databricks.com>
    
    Closes #17839 from rxin/SPARK-20576.
    
    (cherry picked from commit 527fc5d0c990daaacad4740f62cfe6736609b77b)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit 36d80790699c529b15e9c1a2cf2f9f636b1f24e6
Author: Liwei Lin <lw...@gmail.com>
Date:   2017-05-03T18:10:24Z

    [SPARK-19965][SS] DataFrame batch reader may fail to infer partitions when reading FileStreamSink's output
    
    ## The Problem
    
    Right now DataFrame batch reader may fail to infer partitions when reading FileStreamSink's output:
    
    ```
    [info] - partitioned writing and batch reading with 'basePath' *** FAILED *** (3 seconds, 928 milliseconds)
    [info]   java.lang.AssertionError: assertion failed: Conflicting directory structures detected. Suspicious paths:
    [info] 	***/stream.output-65e3fa45-595a-4d29-b3df-4c001e321637
    [info] 	***/stream.output-65e3fa45-595a-4d29-b3df-4c001e321637/_spark_metadata
    [info]
    [info] If provided paths are partition directories, please set "basePath" in the options of the data source to specify the root directory of the table. If there are multiple root directories, please load them separately and then union them.
    [info]   at scala.Predef$.assert(Predef.scala:170)
    [info]   at org.apache.spark.sql.execution.datasources.PartitioningUtils$.parsePartitions(PartitioningUtils.scala:133)
    [info]   at org.apache.spark.sql.execution.datasources.PartitioningUtils$.parsePartitions(PartitioningUtils.scala:98)
    [info]   at org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.inferPartitioning(PartitioningAwareFileIndex.scala:156)
    [info]   at org.apache.spark.sql.execution.datasources.InMemoryFileIndex.partitionSpec(InMemoryFileIndex.scala:54)
    [info]   at org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.partitionSchema(PartitioningAwareFileIndex.scala:55)
    [info]   at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:133)
    [info]   at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:361)
    [info]   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:160)
    [info]   at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:536)
    [info]   at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:520)
    [info]   at org.apache.spark.sql.streaming.FileStreamSinkSuite$$anonfun$8.apply$mcV$sp(FileStreamSinkSuite.scala:292)
    [info]   at org.apache.spark.sql.streaming.FileStreamSinkSuite$$anonfun$8.apply(FileStreamSinkSuite.scala:268)
    [info]   at org.apache.spark.sql.streaming.FileStreamSinkSuite$$anonfun$8.apply(FileStreamSinkSuite.scala:268)
    ```
    
    ## What changes were proposed in this pull request?
    
    This patch alters `InMemoryFileIndex` to filter out those `basePath`s whose ancestor is the streaming metadata dir (`_spark_metadata`). E.g., the following and other similar dirs or files will be filtered out (a sketch of the filtering follows the list):
    - (introduced by globbing `basePath/*`)
       - `basePath/_spark_metadata`
    - (introduced by globbing `basePath/*/*`)
       - `basePath/_spark_metadata/0`
       - `basePath/_spark_metadata/1`
       - ...
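
    A minimal sketch of the filtering idea, using a hypothetical helper rather than the actual `InMemoryFileIndex` code:
    
    ```
    import org.apache.hadoop.fs.Path
    
    // A path is dropped when it, or any ancestor, is named _spark_metadata.
    def isInMetadataDir(p: Path): Boolean =
      Iterator.iterate(p)(_.getParent).takeWhile(_ != null)
        .exists(_.getName == "_spark_metadata")
    
    val candidates = Seq(new Path("/out/_spark_metadata/0"), new Path("/out/part=1"))
    val kept = candidates.filterNot(isInMetadataDir)  // keeps only /out/part=1
    ```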
    
    ## How was this patch tested?
    
    Added unit tests
    
    Author: Liwei Lin <lw...@gmail.com>
    
    Closes #17346 from lw-lin/filter-metadata.
    
    (cherry picked from commit 6b9e49d12fc4c9b29d497122daa4cc9bf4540b16)
    Signed-off-by: Shixiong Zhu <sh...@databricks.com>

commit 2629e7c7a1dacfb267d866cf825fa8a078612462
Author: hyukjinkwon <gu...@gmail.com>
Date:   2017-05-03T20:08:25Z

    [MINOR][SQL] Fix the test title from =!= to <=>, remove a duplicated test and add a test for =!=
    
    ## What changes were proposed in this pull request?
    
    This PR proposes three things as below:
    
    - The first test does not appear to test `<=>` and is identical to the `===` test above it, so this PR removes it.
    
      ```diff
      -   test("<=>") {
      -     checkAnswer(
      -      testData2.filter($"a" === 1),
      -      testData2.collect().toSeq.filter(r => r.getInt(0) == 1))
      -
      -    checkAnswer(
      -      testData2.filter($"a" === $"b"),
      -      testData2.collect().toSeq.filter(r => r.getInt(0) == r.getInt(1)))
      -   }
      ```
    
    - Rename the test title from `=!=` to `<=>`, since the test appears to actually be testing `<=>`.
    
      ```diff
      +  private lazy val nullData = Seq(
      +    (Some(1), Some(1)), (Some(1), Some(2)), (Some(1), None), (None, None)).toDF("a", "b")
      +
        ...
      -  test("=!=") {
      +  test("<=>") {
      -    val nullData = spark.createDataFrame(sparkContext.parallelize(
      -      Row(1, 1) ::
      -      Row(1, 2) ::
      -      Row(1, null) ::
      -      Row(null, null) :: Nil),
      -      StructType(Seq(StructField("a", IntegerType), StructField("b", IntegerType))))
      -
             checkAnswer(
               nullData.filter($"b" <=> 1),
        ...
      ```
    
    - Add tests for `=!=`, which appear to be missing.
    
      ```diff
      +  test("=!=") {
      +    checkAnswer(
      +      nullData.filter($"b" =!= 1),
      +      Row(1, 2) :: Nil)
      +
      +    checkAnswer(nullData.filter($"b" =!= null), Nil)
      +
      +    checkAnswer(
      +      nullData.filter($"a" =!= $"b"),
      +      Row(1, 2) :: Nil)
      +  }
      ```
    
    ## How was this patch tested?
    
    Manually running the tests.
    
    Author: hyukjinkwon <gu...@gmail.com>
    
    Closes #17842 from HyukjinKwon/minor-test-fix.
    
    (cherry picked from commit 13eb37c860c8f672d0e9d9065d0333f981db71e3)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit 1d4017b44d5e6ad156abeaae6371747f111dd1f9
Author: Patrick Wendell <pw...@gmail.com>
Date:   2017-05-03T23:50:08Z

    Preparing Spark release v2.2.0-rc2

commit a3a5fcfefcc25e03496d097b63cd268f61d24c09
Author: Patrick Wendell <pw...@gmail.com>
Date:   2017-05-03T23:50:12Z

    Preparing development version 2.2.1-SNAPSHOT

commit d8bd213f13279664d50ffa57c1814d0b16fc5d23
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-05-04T02:15:28Z

    [SPARK-20584][PYSPARK][SQL] Python generic hint support
    
    ## What changes were proposed in this pull request?
    
    Adds `hint` method to PySpark `DataFrame`.
    
    ## How was this patch tested?
    
    Unit tests, doctests.
    
    Author: zero323 <ze...@users.noreply.github.com>
    
    Closes #17850 from zero323/SPARK-20584.
    
    (cherry picked from commit 02bbe73118a39e2fb378aa2002449367a92f6d67)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit 5fe9313d7c81679981000b8aea5ea4668a0a0bc8
Author: Felix Cheung <fe...@hotmail.com>
Date:   2017-05-04T04:40:18Z

    [SPARK-20544][SPARKR] skip tests when running on CRAN
    
    General rule on whether to skip:
    skip if
    - RDD tests
    - tests that could run long or are complicated (streaming, hivecontext)
    - tests on error conditions
    - tests that are unlikely to change/break
    
    unit tests, `R CMD check --as-cran`, `R CMD check`
    
    Author: Felix Cheung <fe...@hotmail.com>
    
    Closes #17817 from felixcheung/rskiptest.
    
    (cherry picked from commit fc472bddd1d9c6a28e57e31496c0166777af597e)
    Signed-off-by: Felix Cheung <fe...@apache.org>

commit 6c5c594b77fb36d531cdaba5a34abe85b138d0a6
Author: Felix Cheung <fe...@hotmail.com>
Date:   2017-05-04T07:27:10Z

    [SPARK-20015][SPARKR][SS][DOC][EXAMPLE] Document R Structured Streaming (experimental) in R vignettes and R & SS programming guide, R example
    
    Add
    - R vignettes
    - R programming guide
    - SS programming guide
    - R example
    
    Also disable spark.als in vignettes for now since it's failing (SPARK-20402)
    
    manually
    
    Author: Felix Cheung <fe...@hotmail.com>
    
    Closes #17814 from felixcheung/rdocss.
    
    (cherry picked from commit b8302ccd02265f9d7a7895c7b033441fa2d8ffd1)
    Signed-off-by: Felix Cheung <fe...@apache.org>

commit 3f5c548128c17d058b5ab2142938f6d03b38e0b1
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-05-04T08:41:36Z

    [SPARK-20585][SPARKR] R generic hint support
    
    Adds support for generic hints on `SparkDataFrame`
    
    Unit tests, `check-cran.sh`
    
    Author: zero323 <ze...@users.noreply.github.com>
    
    Closes #17851 from zero323/SPARK-20585.
    
    (cherry picked from commit 9c36aa27919fb7625e388f5c3c90af62ef902b24)
    Signed-off-by: Felix Cheung <fe...@apache.org>

commit b6727795fea67264608a72febfc32f913cdb9d7c
Author: Felix Cheung <fe...@hotmail.com>
Date:   2017-05-04T08:54:59Z

    [SPARK-20571][SPARKR][SS] Flaky Structured Streaming tests
    
    ## What changes were proposed in this pull request?
    
    Make the tests more reliable by having them wait until the data is processed. Increasing the timeout value might help, but ultimately the flakiness from processing delay when Jenkins is loaded is hard to account for. This isn't an actual supported public API.
    
    ## How was this patch tested?
    unit tests
    
    Author: Felix Cheung <fe...@hotmail.com>
    
    Closes #17857 from felixcheung/rsstestrelia.
    
    (cherry picked from commit 57b64703e66ec8490d8d9dbf6beebc160a61ec29)
    Signed-off-by: Felix Cheung <fe...@apache.org>

commit 425ed26d2a0f6d3308bdb4fcbf0cedc6ef12612e
Author: Yanbo Liang <yb...@gmail.com>
Date:   2017-05-04T09:56:43Z

    [SPARK-20047][FOLLOWUP][ML] Constrained Logistic Regression follow up
    
    ## What changes were proposed in this pull request?
    Address some minor comments for #17715:
    * Put bound-constrained optimization params under expertParams.
    * Update some docs.
    
    ## How was this patch tested?
    Existing tests.
    
    Author: Yanbo Liang <yb...@gmail.com>
    
    Closes #17829 from yanboliang/spark-20047-followup.
    
    (cherry picked from commit c5dceb8c65545169bc96628140b5acdaa85dd226)
    Signed-off-by: Yanbo Liang <yb...@gmail.com>

commit c8756288de12cfd9528d8d3ff73ff600909d657a
Author: Wayne Zhang <ac...@uber.com>
Date:   2017-05-05T02:23:58Z

    [SPARK-20574][ML] Allow Bucketizer to handle non-Double numeric column
    
    ## What changes were proposed in this pull request?
    Bucketizer currently requires the input column to be Double, but the logic should work on any numeric data type. Many practical problems have integer/float data types, and it can get very tedious to manually cast them to Double before calling Bucketizer. This PR extends Bucketizer to handle all numeric types.
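
    A usage sketch (the `spark` SparkSession and the column name are assumptions for illustration): after this change an integer input column works without a manual cast to Double.
    
    ```
    import org.apache.spark.ml.feature.Bucketizer
    import spark.implicits._
    
    val people = Seq(12, 30, 70).toDF("age")   // integer column, no cast needed
    
    val bucketizer = new Bucketizer()
      .setInputCol("age")
      .setOutputCol("ageBucket")
      .setSplits(Array(Double.NegativeInfinity, 18.0, 65.0, Double.PositiveInfinity))
    
    bucketizer.transform(people).show()
    ```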
    
    ## How was this patch tested?
    New test.
    
    Author: Wayne Zhang <ac...@uber.com>
    
    Closes #17840 from actuaryzhang/bucketizer.
    
    (cherry picked from commit 0d16faab90e4cd1f73c5b749dbda7bc2a400b26f)
    Signed-off-by: Yanbo Liang <yb...@gmail.com>

commit 7cb566abc27d41d5816dee16c6ecb749da2adf46
Author: Yuming Wang <wg...@gmail.com>
Date:   2017-05-05T10:31:59Z

    [SPARK-19660][SQL] Replace the deprecated property name fs.default.name to fs.defaultFS that newly introduced
    
    ## What changes were proposed in this pull request?
    
    Replace the deprecated property name `fs.default.name` with the newly introduced `fs.defaultFS`.
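
    For illustration (the namenode URI is an assumption), the renamed Hadoop property:
    
    ```
    import org.apache.hadoop.conf.Configuration
    
    val conf = new Configuration()
    conf.set("fs.defaultFS", "hdfs://namenode:8020")  // replaces deprecated fs.default.name
    ```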
    
    ## How was this patch tested?
    
    Existing tests
    
    Author: Yuming Wang <wg...@gmail.com>
    
    Closes #17856 from wangyum/SPARK-19660.
    
    (cherry picked from commit 37cdf077cd3f436f777562df311e3827b0727ce7)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit dbb54a7b39568cc9e8046a86113b98c3c69b7d11
Author: jyu00 <je...@us.ibm.com>
Date:   2017-05-05T10:36:51Z

    [SPARK-20546][DEPLOY] spark-class gets syntax error in posix mode
    
    ## What changes were proposed in this pull request?
    
    Updated spark-class to turn off posix mode so the process substitution doesn't cause a syntax error.
    
    ## How was this patch tested?
    
    Existing unit tests, manual spark-shell testing with posix mode on
    
    Author: jyu00 <je...@us.ibm.com>
    
    Closes #17852 from jyu00/master.
    
    (cherry picked from commit 5773ab121d5d7cbefeef17ff4ac6f8af36cc1251)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit 1fa3c86a740e072957a2104dbd02ca3c158c508d
Author: Jarrett Meyer <ja...@gmail.com>
Date:   2017-05-05T15:30:42Z

    [SPARK-20613] Remove excess quotes in Windows executable
    
    ## What changes were proposed in this pull request?
    
    Quotes are already added to the RUNNER variable on line 54. There is no need to put quotes on line 67. If you do, you will get an error when launching Spark.
    
    '""C:\Program' is not recognized as an internal or external command, operable program or batch file.
    
    ## How was this patch tested?
    
    Tested manually on Windows 10.
    
    Author: Jarrett Meyer <ja...@gmail.com>
    
    Closes #17861 from jarrettmeyer/fix-windows-cmd.
    
    (cherry picked from commit b9ad2d1916af5091c8585d06ccad8219e437e2bc)
    Signed-off-by: Felix Cheung <fe...@apache.org>

commit f71aea6a0be6eda24623d8563d971687ecd04caf
Author: Yucai <yu...@intel.com>
Date:   2017-05-05T16:51:57Z

    [SPARK-20381][SQL] Add SQL metrics of numOutputRows for ObjectHashAggregateExec
    
    ## What changes were proposed in this pull request?
    
    ObjectHashAggregateExec is missing numOutputRows; this PR adds that metric for it.
    
    ## How was this patch tested?
    
    Added unit tests for the new metrics.
    
    Author: Yucai <yu...@intel.com>
    
    Closes #17678 from yucai/objectAgg_numOutputRows.
    
    (cherry picked from commit 41439fd52dd263b9f7d92e608f027f193f461777)
    Signed-off-by: Xiao Li <ga...@gmail.com>

commit 24fffacad709c553e0f24ae12a8cca3ab980af3c
Author: Shixiong Zhu <sh...@databricks.com>
Date:   2017-05-05T18:08:26Z

    [SPARK-20603][SS][TEST] Set default number of topic partitions to 1 to reduce the load
    
    ## What changes were proposed in this pull request?
    
    I checked the logs of https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-2.2-test-maven-hadoop-2.7/47/ and found it took several seconds to create the Kafka internal topic `__consumer_offsets`. As Kafka creates this topic lazily, the topic creation happens in the first test, `deserialization of initial offset with Spark 2.1.0`, and causes it to time out.
    
    This PR changes `offsets.topic.num.partitions` from the default value 50 to 1 to make creating `__consumer_offsets` (50 partitions -> 1 partition) much faster.
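
    A sketch of the corresponding broker-property override in a test harness (the wiring around it is assumed):
    
    ```
    import java.util.Properties
    
    val brokerProps = new Properties()
    // Make the lazily created __consumer_offsets topic cheap to create in tests.
    brokerProps.put("offsets.topic.num.partitions", "1")
    ```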
    
    ## How was this patch tested?
    
    Jenkins
    
    Author: Shixiong Zhu <sh...@databricks.com>
    
    Closes #17863 from zsxwing/fix-kafka-flaky-test.
    
    (cherry picked from commit bd5788287957d8610a6d19c273b75bd4cdd2d166)
    Signed-off-by: Shixiong Zhu <sh...@databricks.com>

commit f59c74a9460b0db4e6c3ecbe872e2eaadc43e2cc
Author: Michael Patterson <ma...@gmail.com>
Date:   2017-04-23T02:58:54Z

    [SPARK-20132][DOCS] Add documentation for column string functions
    
    ## What changes were proposed in this pull request?
    Add docstrings to column.py for the Column functions `rlike`, `like`, `startswith`, and `endswith`. Pass these docstrings through `_bin_op`
    
    There may be a better place to put the docstrings. I put them immediately above the Column class.
    
    ## How was this patch tested?
    
    I ran `make html` on my local computer to rebuild the documentation and verified that the html pages displayed the docstrings correctly. I tried running `dev-tests`, and the formatting tests passed. However, my mvn build didn't work, I think due to issues on my computer.
    
    These docstrings are my original work, and I license them to the project under its open source license.
    
    davies has done the most recent work reorganizing `_bin_op`.
    
    Author: Michael Patterson <ma...@gmail.com>
    
    Closes #17469 from map222/patterson-documentation.

----


---



[GitHub] spark pull request #19414: Udf nullablity fixes

Posted by ptkool <gi...@git.apache.org>.
Github user ptkool closed the pull request at:

    https://github.com/apache/spark/pull/19414


---
