Posted to reviews@spark.apache.org by klion26 <gi...@git.apache.org> on 2017/03/08 08:34:26 UTC

[GitHub] spark pull request #17206: Branch 1.6

GitHub user klion26 opened a pull request:

    https://github.com/apache/spark/pull/17206

    Branch 1.6

    ## What changes were proposed in this pull request?
    
    (Please fill in changes proposed in this fix)
    
    ## How was this patch tested?
    
    (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
    (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/klion26/spark branch-1.6

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17206.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17206
    
----
commit bd8efba8f2131d951829020b4c68309a174859cf
Author: Michael Armbrust <mi...@databricks.com>
Date:   2016-02-02T08:51:07Z

    [SPARK-13087][SQL] Fix group by function for sort based aggregation
    
    It is not valid to call `toAttribute` on a `NamedExpression` unless we know for sure that the child produced that `NamedExpression`.  The current code worked fine when the grouping expressions were simple, but when they were a derived value this blew up at execution time.
    
    Author: Michael Armbrust <mi...@databricks.com>
    
    Closes #11011 from marmbrus/groupByFunction.

commit 99594b213c941cd3ffa3a034f007e44efebdb545
Author: Michael Armbrust <mi...@databricks.com>
Date:   2016-02-02T18:15:40Z

    [SPARK-13094][SQL] Add encoders for seq/array of primitives
    
    Author: Michael Armbrust <mi...@databricks.com>
    
    Closes #11014 from marmbrus/seqEncoders.
    
    (cherry picked from commit 29d92181d0c49988c387d34e4a71b1afe02c29e2)
    Signed-off-by: Michael Armbrust <mi...@databricks.com>

commit 9a3d1bd09cdf4a7c2992525c203d4dac764fddb8
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-02-02T18:21:21Z

    [SPARK-12780][ML][PYTHON][BACKPORT] Inconsistency returning value of ML python models' properties
    
    Backport of [SPARK-12780] for branch-1.6
    
    Original PR for master: https://github.com/apache/spark/pull/10724
    
    This fixes StringIndexerModel.labels in pyspark.
    
    Author: Xusen Yin <yi...@gmail.com>
    
    Closes #10950 from jkbradley/yinxusen-spark-12780-backport.

commit 53f518a6e2791cc4967793b6cc0d4a68d579cb33
Author: Narine Kokhlikyan <na...@gmail.com>
Date:   2016-01-22T18:35:02Z

    [SPARK-12629][SPARKR] Fixes for DataFrame saveAsTable method
    
    I've tried to solve some of the issues mentioned in: https://issues.apache.org/jira/browse/SPARK-12629
    Please let me know what you think.
    Thanks!
    
    Author: Narine Kokhlikyan <na...@gmail.com>
    
    Closes #10580 from NarineK/sparkrSavaAsRable.
    
    (cherry picked from commit 8a88e121283472c26e70563a4e04c109e9b183b3)
    Signed-off-by: Shivaram Venkataraman <sh...@cs.berkeley.edu>

commit 4c28b4c8f342fde937ff77ab30f898dfe3186c03
Author: Gabriele Nizzoli <ma...@nizzoli.net>
Date:   2016-02-02T18:57:18Z

    [SPARK-13121][STREAMING] java mapWithState mishandles scala Option
    
    The Java mapWithState overload that takes a Function3 performs the wrong conversion of a Scala `Option` to a Java `Optional`; the fixed code uses the same conversion as the mapWithState overload that takes a Function4 as input. `Optional.fromNullable(v.get)` fails if v is `None`; it is better to use `JavaUtils.optionToOptional(v)` instead.
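    The failure mode can be sketched in plain Python (a hypothetical stand-in, using a zero-or-one-element list in place of a Scala Option; none of these names are Spark's):

```python
# Hypothetical sketch of the two conversion patterns described above.

def from_nullable_of_get(opt):
    # Mirrors Optional.fromNullable(v.get): unwrapping an empty Option
    # raises before fromNullable ever sees the value.
    return opt[0]

def option_to_optional(opt):
    # Mirrors JavaUtils.optionToOptional(v): the empty case is handled
    # by the conversion itself.
    return opt[0] if opt else None
```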
    
    Author: Gabriele Nizzoli <ma...@nizzoli.net>
    
    Closes #11007 from gabrielenizzoli/branch-1.6.

commit 9c0cf22f7681ae05d894ae05f6a91a9467787519
Author: Grzegorz Chilkiewicz <gr...@codilime.com>
Date:   2016-02-02T19:16:24Z

    [SPARK-12711][ML] ML StopWordsRemover does not protect itself from column name duplication
    
    Fixes the problem and verifies the fix with a test suite.
    Also adds an optional `nullable` (Boolean) parameter to SchemaUtils.appendColumn
    and deduplicates the SchemaUtils.appendColumn functions.
    
    Author: Grzegorz Chilkiewicz <gr...@codilime.com>
    
    Closes #10741 from grzegorz-chilkiewicz/master.
    
    (cherry picked from commit b1835d727234fdff42aa8cadd17ddcf43b0bed15)
    Signed-off-by: Joseph K. Bradley <jo...@databricks.com>

commit 3c92333ee78f249dae37070d3b6558b9c92ec7f4
Author: Daoyuan Wang <da...@intel.com>
Date:   2016-02-02T19:09:40Z

    [SPARK-13056][SQL] map column would throw NPE if value is null
    
    Jira: https://issues.apache.org/jira/browse/SPARK-13056
    
    Create a map like `{ "a": "somestring", "b": null }` and query it with `SELECT col["b"] FROM t1;`, and an NPE would be thrown.
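    The safe behavior can be sketched in plain Python (the dict and helper names below are illustrative, not Spark internals):

```python
row = {"a": "somestring", "b": None}

def value_length_buggy(m, key):
    # Dereferences the stored value unconditionally; a null value blows up
    # (Python's TypeError playing the role of the NPE).
    return len(m[key])

def value_length_safe(m, key):
    # Propagate null instead of dereferencing it.
    v = m.get(key)
    return len(v) if v is not None else None
```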
    
    Author: Daoyuan Wang <da...@intel.com>
    
    Closes #10964 from adrian-wang/npewriter.
    
    (cherry picked from commit 358300c795025735c3b2f96c5447b1b227d4abc1)
    Signed-off-by: Michael Armbrust <mi...@databricks.com>
    
    Conflicts:
    	sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala

commit e81333be05cc5e2a41e5eb1a630c5af59a47dd23
Author: Kevin (Sangwoo) Kim <sa...@gmail.com>
Date:   2016-02-02T21:24:09Z

    [DOCS] Update StructType.scala
    
    The example will throw an error like:
    <console>:20: error: not found: value StructType
    
    This line needs to be added:
    import org.apache.spark.sql.types._
    
    Author: Kevin (Sangwoo) Kim <sa...@gmail.com>
    
    Closes #10141 from swkimme/patch-1.
    
    (cherry picked from commit b377b03531d21b1d02a8f58b3791348962e1f31b)
    Signed-off-by: Michael Armbrust <mi...@databricks.com>

commit 2f8abb4afc08aa8dc4ed763bcb93ff6b1d6f0d78
Author: Adam Budde <bu...@amazon.com>
Date:   2016-02-03T03:35:33Z

    [SPARK-13122] Fix race condition in MemoryStore.unrollSafely()
    
    https://issues.apache.org/jira/browse/SPARK-13122
    
    A race condition can occur in MemoryStore's unrollSafely() method if two threads that
    return the same value for currentTaskAttemptId() execute this method concurrently. This
    change makes the operation of reading the initial amount of unroll memory used, performing
    the unroll, and updating the associated memory maps atomic in order to avoid this race
    condition.
    
    The initial proposed fix wraps all of unrollSafely() in a memoryManager.synchronized { } block. A cleaner approach might be to introduce a mechanism that synchronizes based on task attempt ID. An alternative option might be to track unroll/pending unroll memory by block ID rather than task attempt ID.
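    The atomic read-modify-write pattern of the fix can be sketched in plain Python (class and method names below are hypothetical, not Spark's):

```python
import threading

class UnrollTracker:
    """Illustrative sketch (not Spark's code): the per-task unroll-memory
    map must be read and updated atomically, or two threads sharing a task
    attempt ID can interleave and lose an update."""

    def __init__(self):
        self._lock = threading.Lock()
        self._unroll_memory = {}  # task attempt id -> bytes reserved

    def reserve(self, task_id, nbytes):
        # Read the current amount and update it under one lock, mirroring
        # the memoryManager.synchronized { } wrapper in the fix.
        with self._lock:
            self._unroll_memory[task_id] = self._unroll_memory.get(task_id, 0) + nbytes
            return self._unroll_memory[task_id]
```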
    
    Author: Adam Budde <bu...@amazon.com>
    
    Closes #11012 from budde/master.
    
    (cherry picked from commit ff71261b651a7b289ea2312abd6075da8b838ed9)
    Signed-off-by: Andrew Or <an...@databricks.com>
    
    Conflicts:
    	core/src/main/scala/org/apache/spark/storage/MemoryStore.scala

commit 5fe8796c2fa859e30cf5ba293bee8957e23163bc
Author: Mario Briggs <ma...@in.ibm.com>
Date:   2016-02-03T17:50:28Z

    [SPARK-12739][STREAMING] Details of batch in Streaming tab uses two Duration columns
    
    I have prefixed the two 'Duration' columns in the 'Details of Batch' Streaming tab as 'Output Op Duration' and 'Job Duration' to make them clearly distinct.
    
    Author: Mario Briggs <ma...@in.ibm.com>
    Author: mariobriggs <ma...@in.ibm.com>
    
    Closes #11022 from mariobriggs/spark-12739.
    
    (cherry picked from commit e9eb248edfa81d75f99c9afc2063e6b3d9ee7392)
    Signed-off-by: Shixiong Zhu <sh...@databricks.com>

commit cdfb2a1410aa799596c8b751187dbac28b2cc678
Author: Wenchen Fan <we...@databricks.com>
Date:   2016-02-04T00:13:23Z

    [SPARK-13101][SQL][BRANCH-1.6] nullability of array type element should not fail analysis of encoder
    
    Nullability should be considered only an optimization rather than part of the type system, so instead of failing analysis on mismatched nullability, we should pass analysis and add a runtime null check.
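    The pass-analysis-then-check-at-runtime idea can be sketched in plain Python (the function and its names are illustrative, not the encoder's actual code):

```python
def decode_elements(values, target_nullable):
    # Sketch: accept a source whose nullability doesn't match the target
    # (no analysis-time failure) and check for nulls at runtime instead.
    out = []
    for v in values:
        if v is None and not target_nullable:
            raise ValueError("null value appeared in non-nullable element")
        out.append(v)
    return out
```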
    
    backport https://github.com/apache/spark/pull/11035 to 1.6
    
    Author: Wenchen Fan <we...@databricks.com>
    
    Closes #11042 from cloud-fan/branch-1.6.

commit 2f390d3066297466d98e17a78c5433f37f70cc95
Author: Yuhao Yang <hh...@gmail.com>
Date:   2016-02-04T05:19:44Z

    [ML][DOC] fix wrong api link in ml onevsrest
    
    Minor fix for the API link in the ml onevsrest guide.
    
    Author: Yuhao Yang <hh...@gmail.com>
    
    Closes #11068 from hhbyyh/onevsrestDoc.
    
    (cherry picked from commit c2c956bcd1a75fd01868ee9ad2939a6d3de52bc2)
    Signed-off-by: Xiangrui Meng <me...@databricks.com>

commit a907c7c64887833770cd593eecccf53620de59b7
Author: Shixiong Zhu <sh...@databricks.com>
Date:   2016-02-04T20:43:16Z

    [SPARK-13195][STREAMING] Fix NoSuchElementException when a state is not set but timeoutThreshold is defined
    
    Check the state's existence before calling `get`.
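    The guard pattern can be sketched in plain Python (a toy State stand-in; the names are hypothetical, not the streaming API):

```python
class StateSketch:
    """Minimal stand-in for a streaming State object: get() on an unset
    state raises, so callers must guard it with exists()."""

    def __init__(self):
        self._defined = False
        self._value = None

    def exists(self):
        return self._defined

    def get(self):
        if not self._defined:
            raise LookupError("NoSuchElementException: state not set")
        return self._value

    def update(self, value):
        self._defined = True
        self._value = value

def on_timeout(state):
    # Fixed pattern: check existence before calling get().
    return state.get() if state.exists() else None
```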
    
    Author: Shixiong Zhu <sh...@databricks.com>
    
    Closes #11081 from zsxwing/SPARK-13195.
    
    (cherry picked from commit 8e2f296306131e6c7c2f06d6672995d3ff8ab021)
    Signed-off-by: Shixiong Zhu <sh...@databricks.com>

commit 3ca5dc3072d0d96ba07d102e9104cbbb177c352b
Author: Bill Chambers <bi...@databricks.com>
Date:   2016-02-05T22:35:39Z

    [SPARK-13214][DOCS] update dynamicAllocation documentation
    
    Author: Bill Chambers <bi...@databricks.com>
    
    Closes #11094 from anabranch/dynamic-docs.
    
    (cherry picked from commit 66e1383de2650a0f06929db8109a02e32c5eaf6b)
    Signed-off-by: Andrew Or <an...@databricks.com>

commit 9b30096227263f77fc67ed8f12fb2911c3256774
Author: Davies Liu <da...@databricks.com>
Date:   2016-02-08T20:08:58Z

    [SPARK-13210][SQL] catch OOM when allocate memory and expand array
    
    There is a bug when we try to grow the buffer: an OOM is wrongly ignored (the assert is also skipped by the JVM), and when we then try to grow the array again, it triggers spilling that frees the current page, invalidating the record we just inserted.
    
    The root cause is that the JVM has less free memory than the MemoryManager thought, so allocating a page can OOM without triggering spilling. We should catch the OOM and acquire memory again to trigger spilling.
    
    Also, we should not grow the array in `insertRecord` of `InMemorySorter` (it was there just for easy testing).
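    The catch-and-retry pattern can be sketched in plain Python (the allocator and spill callbacks are hypothetical stand-ins for the memory manager):

```python
def allocate_with_spill(allocate, spill, nbytes):
    """Sketch of the fix's retry pattern: treat an allocation failure as a
    signal to spill (freeing the current page), then allocate again."""
    try:
        return allocate(nbytes)
    except MemoryError:
        spill()                   # free memory held by the current page
        return allocate(nbytes)   # retry; a second failure propagates
```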
    
    Author: Davies Liu <da...@databricks.com>
    
    Closes #11095 from davies/fix_expand.

commit 82fa86470682cb4fcd4b3d5351167e4a936b8494
Author: Steve Loughran <st...@hortonworks.com>
Date:   2016-02-09T19:01:47Z

    [SPARK-12807][YARN] Spark External Shuffle not working in Hadoop clusters with Jackson 2.2.3
    
    Patch to
    
    1. Shade jackson 2.x in spark-yarn-shuffle JAR: core, databind, annotation
    2. Use maven antrun to verify the JAR has the renamed classes
    
    Because this is Maven-based, I don't know if the verification phase kicks in on an SBT/Jenkins build. It will on a `mvn install`.
    
    Author: Steve Loughran <st...@hortonworks.com>
    
    Closes #10780 from steveloughran/stevel/patches/SPARK-12807-master-shuffle.
    
    (cherry picked from commit 34d0b70b309f16af263eb4e6d7c36e2ea170bc67)
    Signed-off-by: Marcelo Vanzin <va...@cloudera.com>

commit 89818cbf808137201d2558eaab312264d852cf00
Author: Liang-Chi Hsieh <vi...@gmail.com>
Date:   2016-02-10T01:10:55Z

    [SPARK-10524][ML] Use the soft prediction to order categories' bins
    
    JIRA: https://issues.apache.org/jira/browse/SPARK-10524
    
    Currently we use the hard prediction (`ImpurityCalculator.predict`) to order categories' bins. But we should use the soft prediction.
    
    Author: Liang-Chi Hsieh <vi...@gmail.com>
    Author: Liang-Chi Hsieh <vi...@appier.com>
    Author: Joseph K. Bradley <jo...@databricks.com>
    
    Closes #8734 from viirya/dt-soft-centroids.
    
    (cherry picked from commit 9267bc68fab65c6a798e065a1dbe0f5171df3077)
    Signed-off-by: Joseph K. Bradley <jo...@databricks.com>

commit 93f1d91755475a242456fe06e57bfca10f4d722f
Author: Josh Rosen <jo...@databricks.com>
Date:   2016-02-10T19:02:41Z

    [SPARK-12921] Fix another non-reflective TaskAttemptContext access in SpecificParquetRecordReaderBase
    
    This is a minor followup to #10843 to fix one remaining place where we forgot to use reflective access of TaskAttemptContext methods.
    
    Author: Josh Rosen <jo...@databricks.com>
    
    Closes #11131 from JoshRosen/SPARK-12921-take-2.

commit b57fac576f0033e8b43a89b4ada29901199aa29b
Author: raela <ra...@databricks.com>
Date:   2016-02-11T01:00:54Z

    [SPARK-13274] Fix Aggregator Links on GroupedDataset Scala API
    
    Update Aggregator links to point to #org.apache.spark.sql.expressions.Aggregator
    
    Author: raela <ra...@databricks.com>
    
    Closes #11158 from raelawang/master.
    
    (cherry picked from commit 719973b05ef6d6b9fbb83d76aebac6454ae84fad)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit 91a5ca5e84497c37de98c194566a568117332710
Author: Yu ISHIKAWA <yu...@gmail.com>
Date:   2016-02-11T23:00:23Z

    [SPARK-13265][ML] Refactoring of basic ML import/export for other file system besides HDFS
    
    jkbradley I tried to improve the function that exports a model. Under Spark 1.6 a model could not be exported to S3, so export should support S3 in addition to HDFS. Can you review it when you have time? Thanks!
    
    Author: Yu ISHIKAWA <yu...@gmail.com>
    
    Closes #11151 from yu-iskw/SPARK-13265.
    
    (cherry picked from commit efb65e09bcfa4542348f5cd37fe5c14047b862e5)
    Signed-off-by: Xiangrui Meng <me...@databricks.com>

commit 9d45ec466a4067bb2d0b59ff1174bec630daa7b1
Author: sethah <se...@gmail.com>
Date:   2016-02-12T00:42:44Z

    [SPARK-13047][PYSPARK][ML] Pyspark Params.hasParam should not throw an error
    
    Pyspark Params class has a method `hasParam(paramName)` which returns `True` if the class has a parameter by that name, but throws an `AttributeError` otherwise. There is not currently a way of getting a Boolean to indicate if a class has a parameter. With Spark 2.0 we could modify the existing behavior of `hasParam` or add an additional method with this functionality.
    
    In Python:
    ```python
    from pyspark.ml.classification import NaiveBayes
    nb = NaiveBayes()
    print nb.hasParam("smoothing")
    print nb.hasParam("notAParam")
    ```
    produces:
    > True
    > AttributeError: 'NaiveBayes' object has no attribute 'notAParam'
    
    However, in Scala:
    ```scala
    import org.apache.spark.ml.classification.NaiveBayes
    val nb  = new NaiveBayes()
    nb.hasParam("smoothing")
    nb.hasParam("notAParam")
    ```
    produces:
    > true
    > false
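    A minimal sketch of the desired boolean behavior, in plain Python (a toy class, not pyspark's actual Params implementation):

```python
class ParamsSketch:
    """Toy model of the fixed behavior: hasParam returns a boolean for any
    name instead of raising AttributeError for unknown ones."""

    def __init__(self, *param_names):
        self._params = set(param_names)

    def hasParam(self, name):
        # Membership test, so unknown names yield False rather than an error.
        return name in self._params

nb = ParamsSketch("smoothing", "modelType")
```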
    
    cc holdenk
    
    Author: sethah <se...@gmail.com>
    
    Closes #10962 from sethah/SPARK-13047.
    
    (cherry picked from commit b35467388612167f0bc3d17142c21a406f6c620d)
    Signed-off-by: Xiangrui Meng <me...@databricks.com>

commit 18661a2bb527adbd01e98158696a16f6d8162411
Author: Tommy YU <tu...@163.com>
Date:   2016-02-12T02:38:49Z

    [SPARK-13153][PYSPARK] ML persistence failed when handle no default value parameter
    
    Fix this defect by checking whether a default value exists.
    
    yanboliang Please help to review.
    
    Author: Tommy YU <tu...@163.com>
    
    Closes #11043 from Wenpei/spark-13153-handle-param-withnodefaultvalue.
    
    (cherry picked from commit d3e2e202994e063856c192e9fdd0541777b88e0e)
    Signed-off-by: Xiangrui Meng <me...@databricks.com>

commit 93a55f3df3c9527ecf4143cb40ac7212bc3a975a
Author: markpavey <ma...@thefilter.com>
Date:   2016-02-13T08:39:43Z

    [SPARK-13142][WEB UI] Problem accessing Web UI /logPage/ on Microsoft Windows
    
    Because I am on a Windows platform, I have been unable to run the tests as described in the "Contributing to Spark" instructions. As the change is only two lines of code in the Web UI, which I have manually built and tested, I am submitting this pull request anyway. I hope this is OK.
    
    Is it worth considering also including this fix in any future 1.5.x releases (if any)?
    
    I confirm this is my own original work and license it to the Spark project under its open source license.
    
    Author: markpavey <ma...@thefilter.com>
    
    Closes #11135 from markpavey/JIRA_SPARK-13142_WindowsWebUILogFix.
    
    (cherry picked from commit 374c4b2869fc50570a68819cf0ece9b43ddeb34b)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit 107290c94312524bfc4560ebe0de268be4ca56af
Author: Liang-Chi Hsieh <vi...@gmail.com>
Date:   2016-02-13T23:56:20Z

    [SPARK-12363][MLLIB] Remove setRun and fix PowerIterationClustering failed test
    
    JIRA: https://issues.apache.org/jira/browse/SPARK-12363
    
    This issue was pointed out by yanboliang. When `setRuns` is removed from PowerIterationClustering, one of the tests fails. I found that some `dstAttr`s of the normalized graph are 0.0 instead of the correct values. Setting `TripletFields.All` in `mapTriplets` makes it work.
    
    Author: Liang-Chi Hsieh <vi...@gmail.com>
    Author: Xiangrui Meng <me...@databricks.com>
    
    Closes #10539 from viirya/fix-poweriter.
    
    (cherry picked from commit e3441e3f68923224d5b576e6112917cf1fe1f89a)
    Signed-off-by: Xiangrui Meng <me...@databricks.com>

commit ec40c5a59fe45e49496db6e0082ddc65c937a857
Author: Amit Dev <am...@gmail.com>
Date:   2016-02-14T11:41:27Z

    [SPARK-13300][DOCUMENTATION] Added pygments.rb dependency
    
    It looks like the pygments.rb gem is also required for the Jekyll build to work. At least on Ubuntu/RHEL I could not build without this dependency, so I added it to the steps.
    
    Author: Amit Dev <am...@gmail.com>
    
    Closes #11180 from amitdev/master.
    
    (cherry picked from commit 331293c30242dc43e54a25171ca51a1c9330ae44)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit 71f53edc0e39bc907755153b9603be8c6fcc1d93
Author: JeremyNixon <jn...@gmail.com>
Date:   2016-02-15T09:25:13Z

    [SPARK-13312][MLLIB] Update java train-validation-split example in ml-guide
    
    Response to JIRA https://issues.apache.org/jira/browse/SPARK-13312.
    
    This contribution is my original work and I license the work to this project.
    
    Author: JeremyNixon <jn...@gmail.com>
    
    Closes #11199 from JeremyNixon/update_train_val_split_example.
    
    (cherry picked from commit adb548365012552e991d51740bfd3c25abf0adec)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit d95089190d714e3e95579ada84ac42d463f824b5
Author: Miles Yucht <mi...@databricks.com>
Date:   2016-02-16T13:01:21Z

    Correct SparseVector.parse documentation
    
    There's a small typo in the SparseVector.parse docstring: it says that the method returns a DenseVector rather than a SparseVector, which is incorrect.
    
    Author: Miles Yucht <mi...@databricks.com>
    
    Closes #11213 from mgyucht/fix-sparsevector-docs.
    
    (cherry picked from commit 827ed1c06785692d14857bd41f1fd94a0853874a)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit 98354cae984e3719a49050e7a6aa75dae78b12bb
Author: Sital Kedia <sk...@fb.com>
Date:   2016-02-17T06:27:34Z

    [SPARK-13279] Remove O(n^2) operation from scheduler.
    
    This commit removes an unnecessary duplicate check in addPendingTask that meant
    that scheduling a task set took time proportional to (# tasks)^2.
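    The actual fix simply removes the duplicate check (it was unnecessary there); the general pattern can be sketched in plain Python with hypothetical helpers:

```python
def add_pending_quadratic(tasks):
    pending = []
    for t in tasks:
        if t not in pending:      # O(n) list scan per task -> O(n^2) total
            pending.append(t)
    return pending

def add_pending_linear(tasks):
    pending, seen = [], set()
    for t in tasks:
        if t not in seen:         # O(1) set membership keeps this linear
            seen.add(t)
            pending.append(t)
    return pending
```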
    
    Author: Sital Kedia <sk...@fb.com>
    
    Closes #11175 from sitalkedia/fix_stuck_driver.
    
    (cherry picked from commit 1e1e31e03df14f2e7a9654e640fb2796cf059fe0)
    Signed-off-by: Kay Ousterhout <ka...@gmail.com>

commit 66106a660149607348b8e51994eb2ce29d67abc0
Author: Christopher C. Aycock <ch...@chrisaycock.com>
Date:   2016-02-17T19:24:18Z

    [SPARK-13350][DOCS] Config doc updated to state that PYSPARK_PYTHON's default is "python2.7"
    
    Author: Christopher C. Aycock <ch...@chrisaycock.com>
    
    Closes #11239 from chrisaycock/master.
    
    (cherry picked from commit a7c74d7563926573c01baf613708a0f105a03e57)
    Signed-off-by: Josh Rosen <jo...@databricks.com>

commit 16f35c4c6e7e56bdb1402eab0877da6e8497cb3f
Author: Sean Owen <so...@cloudera.com>
Date:   2016-02-18T20:14:30Z

    [SPARK-13371][CORE][STRING] TaskSetManager.dequeueSpeculativeTask compares Option and String directly.
    
    ## What changes were proposed in this pull request?
    
    Fix some comparisons between unequal types that cause IJ warnings and in at least one case a likely bug (TaskSetManager)
    
    ## How was this patch tested?
    
    Running Jenkins tests
    
    Author: Sean Owen <so...@cloudera.com>
    
    Closes #11253 from srowen/SPARK-13371.
    
    (cherry picked from commit 78562535feb6e214520b29e0bbdd4b1302f01e93)
    Signed-off-by: Andrew Or <an...@databricks.com>

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17206: Branch 1.6

Posted by klion26 <gi...@git.apache.org>.
Github user klion26 closed the pull request at:

    https://github.com/apache/spark/pull/17206


