Posted to reviews@spark.apache.org by GuoNing89 <gi...@git.apache.org> on 2016/05/14 10:21:06 UTC

[GitHub] spark pull request: Branch 1.4

GitHub user GuoNing89 opened a pull request:

    https://github.com/apache/spark/pull/13114

    Branch 1.4

    ## What changes were proposed in this pull request?
    
    (Please fill in changes proposed in this fix)
    
    
    ## How was this patch tested?
    
    (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
    
    
    (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark branch-1.4

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13114.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13114
    
----
commit 4634be5a7db4f2fd82cfb5c602b79129d1d9e246
Author: Josh Rosen <jo...@databricks.com>
Date:   2015-06-14T16:34:35Z

    [SPARK-8354] [SQL] Fix off-by-factor-of-8 error when allocating scratch space in UnsafeFixedWidthAggregationMap
    
    UnsafeFixedWidthAggregationMap contains an off-by-factor-of-8 error when allocating row conversion scratch space: we take a size requirement, measured in bytes, then allocate a long array of that size.  This means that we end up allocating 8x too much conversion space.
    
    This patch fixes this by allocating a `byte[]` array instead.  This doesn't impose any new limitations on the maximum sizes of UnsafeRows, since UnsafeRowConverter already used integers when calculating the size requirements for rows.
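
    A minimal sketch of the allocation mix-up described above (illustrative only, not the actual UnsafeFixedWidthAggregationMap code):

    ```scala
    // The size requirement is measured in bytes, so backing it with a long[] of
    // that length reserves 8x the intended scratch space.
    val sizeRequirementInBytes = 1024

    val buggy = new Array[Long](sizeRequirementInBytes) // occupies 1024 * 8 = 8192 bytes
    val fixed = new Array[Byte](sizeRequirementInBytes) // occupies 1024 bytes, as intended
    ```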
    
    Author: Josh Rosen <jo...@databricks.com>
    
    Closes #6809 from JoshRosen/sql-bytes-vs-words-fix and squashes the following commits:
    
    6520339 [Josh Rosen] Updates to reflect fact that UnsafeRow max size is constrained by max byte[] size
    
    (cherry picked from commit ea7fd2ff6454e8d819a39bf49901074e49b5714e)
    Signed-off-by: Josh Rosen <jo...@databricks.com>

commit 2805d145e30e4cabd11a7d33c4f80edbc54cc54a
Author: Michael Armbrust <mi...@databricks.com>
Date:   2015-06-14T18:21:42Z

    [SPARK-8358] [SQL] Wait for child resolution when resolving generators
    
    Author: Michael Armbrust <mi...@databricks.com>
    
    Closes #6811 from marmbrus/aliasExplodeStar and squashes the following commits:
    
    fbd2065 [Michael Armbrust] more style
    806a373 [Michael Armbrust] fix style
    7cbb530 [Michael Armbrust] [SPARK-8358][SQL] Wait for child resolution when resolving generatorsa
    
    (cherry picked from commit 9073a426e444e4bc6efa8608e54e0a986f38a270)
    Signed-off-by: Michael Armbrust <mi...@databricks.com>

commit 0ffbf085190b9d4dc13a8b6545e4e1022083bd35
Author: Peter Hoffmann <ph...@peter-hoffmann.com>
Date:   2015-06-14T18:41:16Z

    fix read/write mixup
    
    Author: Peter Hoffmann <ph...@peter-hoffmann.com>
    
    Closes #6815 from hoffmann/patch-1 and squashes the following commits:
    
    2abb6da [Peter Hoffmann] fix read/write mixup
    
    (cherry picked from commit f3f2a4397da164f0ddfa5d60bf441099296c4346)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit fff8d7ee6c7e88ed96c29260480e8228e7fb1435
Author: tedyu <yu...@gmail.com>
Date:   2015-06-16T00:00:38Z

    SPARK-8336 Fix NullPointerException with functions.rand()
    
    This PR fixes the problem reported by Justin Yip in the thread 'NullPointerException with functions.rand()'
    
    Tested using spark-shell and verified that the following works:
    sqlContext.createDataFrame(Seq((1,2), (3, 100))).withColumn("index", rand(30)).show()
    
    Author: tedyu <yu...@gmail.com>
    
    Closes #6793 from tedyu/master and squashes the following commits:
    
    62fd97b [tedyu] Create RandomSuite
    750f92c [tedyu] Add test for Rand() with seed
    a1d66c5 [tedyu] Fix NullPointerException with functions.rand()
    
    (cherry picked from commit 1a62d61696a0481508d83a07d19ab3701245ac20)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit f287f7ea141fa7a3e9f8b7d3a2180b63cd77088d
Author: huangzhaowei <ca...@gmail.com>
Date:   2015-06-16T06:16:09Z

    [SPARK-8367] [STREAMING] Add a limit for `spark.streaming.blockInterval` to avoid a data loss bug.

    The bug was reported in JIRA [SPARK-8367](https://issues.apache.org/jira/browse/SPARK-8367).
    The resolution is to limit the configuration `spark.streaming.blockInterval` to a positive number.
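
    A minimal sketch of the validation being added, assuming a SparkConf lookup (names here are illustrative, not the exact BlockGenerator code):

    ```scala
    import org.apache.spark.SparkConf

    val conf = new SparkConf()
    // Read the interval and fail fast if it is not positive, instead of
    // silently generating empty blocks and losing data later.
    val blockIntervalMs = conf.getTimeAsMs("spark.streaming.blockInterval", "200ms")
    require(blockIntervalMs > 0,
      s"'spark.streaming.blockInterval' should be a positive value, but got $blockIntervalMs ms")
    ```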
    
    Author: huangzhaowei <ca...@gmail.com>
    Author: huangzhaowei <Sa...@users.noreply.github.com>
    
    Closes #6818 from SaintBacchus/SPARK-8367 and squashes the following commits:
    
    c9d1927 [huangzhaowei] Update BlockGenerator.scala
    bd3f71a [huangzhaowei] Use requre instead of if
    3d17796 [huangzhaowei] [SPARK_8367][Streaming]Add a limit for 'spark.streaming.blockInterval' since a data loss bug.
    
    (cherry picked from commit ccf010f27bc62f7e7f409c6eef7488ab476de609)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit 1378bdc4a9a974b40c7c509f4af7f07bdc892e14
Author: Moussa Taifi <mo...@gmail.com>
Date:   2015-06-16T19:59:22Z

    [SPARK-DOCS] [SPARK-SQL] Update sql-programming-guide.md
    
    Typo in thriftserver section
    
    Author: Moussa Taifi <mo...@gmail.com>
    
    Closes #6847 from moutai/patch-1 and squashes the following commits:
    
    1bd29df [Moussa Taifi] Update sql-programming-guide.md
    
    (cherry picked from commit dc455b88330f79b1181a585277ea9ed3e0763703)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit 4da068650800bdf1fa488790049993896d0edc32
Author: Radek Ostrowski <de...@gmail.com>
Date:   2015-06-16T20:04:26Z

    [SQL] [DOC] improved a comment
    
    [SQL][DOC] I found it a bit confusing when I came across it for the first time in the docs
    
    Author: Radek Ostrowski <de...@gmail.com>
    Author: radek <ra...@radeks-MacBook-Pro-2.local>
    
    Closes #6332 from radek1st/master and squashes the following commits:
    
    dae3347 [Radek Ostrowski] fixed typo
    c76bb3a [radek] improved a comment
    
    (cherry picked from commit 4bd10fd5090fb5f4f139267b82e9f2fc15659796)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit b9e5d3cadd0f07c211623b045466220c39abdc56
Author: Marcelo Vanzin <va...@cloudera.com>
Date:   2015-06-16T20:10:18Z

    [SPARK-8126] [BUILD] Make sure temp dir exists when running tests.
    
    If you ran "clean" at the top-level sbt project, the temp dir would
    go away, so running "test" without restarting sbt would fail. This
    fixes that by making sure the temp dir exists before running tests.
    
    Author: Marcelo Vanzin <va...@cloudera.com>
    
    Closes #6805 from vanzin/SPARK-8126-fix and squashes the following commits:
    
    12d7768 [Marcelo Vanzin] [SPARK-8126] [build] Make sure temp dir exists when running tests.
    
    (cherry picked from commit cebf2411847706a98dc8df9c754ef53d6d12a87c)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit 15d973f2d9c2512dd5a882b6b65fb494de526643
Author: Yanbo Liang <yb...@gmail.com>
Date:   2015-06-16T21:30:30Z

    [SPARK-7916] [MLLIB] MLlib Python doc parity check for classification and regression
    
    Check and update the MLlib Python classification and regression docs so they are as complete as the Scala docs.
    
    Author: Yanbo Liang <yb...@gmail.com>
    
    Closes #6460 from yanboliang/spark-7916 and squashes the following commits:
    
    f8deda4 [Yanbo Liang] trigger jenkins
    6dc4d99 [Yanbo Liang] address comments
    ce2a43e [Yanbo Liang] truncate too long line and remove extra sparse
    3eaf6ad [Yanbo Liang] MLlib Python doc parity check for classification and regression
    
    (cherry picked from commit ca998757e8ff2bdca2c7e88055c389161521d604)
    Signed-off-by: Joseph K. Bradley <jo...@databricks.com>

commit 877deb046862bff8200c517674f9e1100ab09b9a
Author: Punya Biswal <pb...@palantir.com>
Date:   2015-06-17T05:31:49Z

    Fix break introduced by backport
    
    rxin this is the fix you requested for the break introduced by backporting #6793
    
    Author: Punya Biswal <pb...@palantir.com>
    
    Closes #6850 from punya/feature/fix-backport-break and squashes the following commits:
    
    fdc3693 [Punya Biswal] Fix break introduced by backport

commit a5f602efcffea3da03f0cf828045b4e1b862fde8
Author: Vyacheslav Baranov <sl...@gmail.com>
Date:   2015-06-17T08:42:29Z

    [SPARK-8309] [CORE] Support for more than 12M items in OpenHashMap
    
    The problem occurs because the position mask `0xEFFFFFF` is incorrect: its 25th bit is zero, so when the capacity grows beyond 2^24, `OpenHashMap` calculates an incorrect index into the `_values` array.
    
    I've also added a size check in `rehash()`, so that it fails instead of reporting invalid item indices.
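
    A small illustration of the masking problem (constants chosen for the example, not copied from the OpenHashSet source):

    ```scala
    // 0xEFFFFFF has bit 24 (the 25th bit) cleared, so any position with that
    // bit set is silently mapped back into the lower 2^24 slots.
    val badMask = 0xEFFFFFF
    val pos = 1 << 24                  // a position only reachable once capacity > 2^24

    println(pos & badMask)             // 0 -- wrong index
    println(pos & ((1 << 25) - 1))     // 16777216 -- a mask that keeps bit 24 preserves it
    ```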
    
    Author: Vyacheslav Baranov <sl...@gmail.com>
    
    Closes #6763 from SlavikBaranov/SPARK-8309 and squashes the following commits:
    
    8557445 [Vyacheslav Baranov] Resolved review comments
    4d5b954 [Vyacheslav Baranov] Resolved review comments
    eaf1e68 [Vyacheslav Baranov] Fixed failing test
    f9284fd [Vyacheslav Baranov] Resolved review comments
    3920656 [Vyacheslav Baranov] SPARK-8309: Support for more than 12M items in OpenHashMap
    
    (cherry picked from commit c13da20a55b80b8632d547240d2c8f97539969a1)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit 320c4420b9cf5d1a4669dc3bb63c63f43dcd9079
Author: Sean Owen <so...@cloudera.com>
Date:   2015-06-17T20:31:10Z

    [SPARK-8395] [DOCS] start-slave.sh docs incorrect
    
    start-slave.sh no longer takes a worker # param in 1.4+
    
    Author: Sean Owen <so...@cloudera.com>
    
    Closes #6855 from srowen/SPARK-8395 and squashes the following commits:
    
    300278e [Sean Owen] start-slave.sh no longer takes a worker # param in 1.4+
    
    (cherry picked from commit f005be02730db315e2a6d4dbecedfd2562b9ef1f)
    Signed-off-by: Andrew Or <an...@databricks.com>

commit a7f6979d0fecec948c25427bdeb01b4fe296ca41
Author: Punya Biswal <pb...@palantir.com>
Date:   2015-06-17T20:37:20Z

    [SPARK-7515] [DOC] Update documentation for PySpark on YARN with cluster mode
    
    Now that PySpark on YARN with cluster mode is supported, let's update the doc.

    Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
    
    Closes #6040 from sarutak/update-doc-for-pyspark-on-yarn and squashes the following commits:
    
    ad9f88c [Kousuke Saruta] Brushed up sentences
    469fd2e [Kousuke Saruta] Merge branch 'master' of https://github.com/apache/spark into update-doc-for-pyspark-on-yarn
    fcfdb92 [Kousuke Saruta] Updated doc for PySpark on YARN with cluster mode
    
    Author: Punya Biswal <pb...@palantir.com>
    Author: Kousuke Saruta <sa...@oss.nttdata.co.jp>
    
    Closes #6842 from punya/feature/SPARK-7515 and squashes the following commits:
    
    0b83648 [Punya Biswal] Merge remote-tracking branch 'origin/branch-1.4' into feature/SPARK-7515
    de025cd [Kousuke Saruta] [SPARK-7515] [DOC] Update documentation for PySpark on YARN with cluster mode

commit d75c53d88d4d8d176975e499788a43dda2a62476
Author: Mingfei <mi...@intel.com>
Date:   2015-06-17T20:40:07Z

    [SPARK-8161] Set externalBlockStoreInitialized to be true, after ExternalBlockStore is initialized
    
    externalBlockStoreInitialized is never set to true, which means the blocks stored in ExternalBlockStore can never be removed.
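
    A hedged sketch of the pattern being fixed (field and method names are illustrative, not the actual BlockManager code):

    ```scala
    // Removal is guarded by the initialization flag, so forgetting to set the
    // flag after a successful init turns every removal into a silent no-op.
    object ExternalStoreSketch {
      private var externalBlockStoreInitialized = false

      def init(): Unit = {
        // ... set up the external block store ...
        externalBlockStoreInitialized = true // the assignment this commit adds
      }

      def removeBlock(blockId: String): Unit = {
        if (externalBlockStoreInitialized) {
          // ... actually remove the block from the external store ...
        }
      }
    }
    ```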
    
    Author: Mingfei <mi...@intel.com>
    
    Closes #6702 from shimingfei/SetTrue and squashes the following commits:
    
    add61d8 [Mingfei] Set externalBlockStoreInitialized to be true, after ExternalBlockStore is initialized
    
    (cherry picked from commit 7ad8c5d869555b1bf4b50eafdf80e057a0175941)
    Signed-off-by: Andrew Or <an...@databricks.com>

commit f0513733d4f6fc34f86feffd3062600cbbd56a28
Author: Carson Wang <ca...@intel.com>
Date:   2015-06-17T20:41:36Z

    [SPARK-8372] History server shows incorrect information for application not started
    
    The history server may show an incorrect App ID for an incomplete application like <App ID>.inprogress. This app info will never disappear even after the app is completed.
    ![incorrectappinfo](https://cloud.githubusercontent.com/assets/9278199/8156147/2a10fdbe-137d-11e5-9620-c5b61d93e3c1.png)
    
    The cause of the issue is that a log path name is used as the app ID when the app ID cannot be obtained during replay.
    
    Author: Carson Wang <ca...@intel.com>
    
    Closes #6827 from carsonwang/SPARK-8372 and squashes the following commits:
    
    cdbb089 [Carson Wang] Fix code style
    3e46b35 [Carson Wang] Update code style
    90f5dde [Carson Wang] Add a unit test
    d8c9cd0 [Carson Wang] Replaying events only return information when app is started
    
    (cherry picked from commit 2837e067099921dd4ab6639ac5f6e89f789d4ff4)
    Signed-off-by: Andrew Or <an...@databricks.com>

commit 5e7973df0ec21c4fd8ae0a26290088def231d26c
Author: zsxwing <zs...@gmail.com>
Date:   2015-06-17T20:59:39Z

    [SPARK-8373] [PYSPARK] Add emptyRDD to pyspark and fix the issue when calling sum on an empty RDD
    
    This PR fixes the sum issue and also adds `emptyRDD` so that it's easy to create a test case.
    
    Author: zsxwing <zs...@gmail.com>
    
    Closes #6826 from zsxwing/python-emptyRDD and squashes the following commits:
    
    b36993f [zsxwing] Update the return type to JavaRDD[T]
    71df047 [zsxwing] Add emptyRDD to pyspark and fix the issue when calling sum on an empty RDD
    
    (cherry picked from commit 0fc4b96f3e3bf81724ac133a6acc97c1b77271b4)
    Signed-off-by: Andrew Or <an...@databricks.com>

commit 5aedfa2ceb5f9a9d22994a5709f663ee6d9a607e
Author: zsxwing <zs...@gmail.com>
Date:   2015-06-17T22:00:03Z

    [SPARK-8404] [STREAMING] [TESTS] Use thread-safe collections to make the tests more reliable
    
    KafkaStreamSuite, DirectKafkaStreamSuite, JavaKafkaStreamSuite and JavaDirectKafkaStreamSuite use non-thread-safe collections to collect data in one thread and check it in another thread. It may fail the tests.
    
    This PR changes them to thread-safe collections.
    
    Note: I cannot reproduce the test failures in my environment. But at least, this PR should make the tests more reliable.
    
    Author: zsxwing <zs...@gmail.com>
    
    Closes #6852 from zsxwing/fix-KafkaStreamSuite and squashes the following commits:
    
    d464211 [zsxwing] Use thread-safe collections to make the tests more reliable
    
    (cherry picked from commit a06d9c8e76bb904d48764802aa3affff93b00baa)
    Signed-off-by: Tathagata Das <ta...@gmail.com>

commit 73cf5def0687bbe556542646e2b1bd569c59cd59
Author: Yin Huai <yh...@databricks.com>
Date:   2015-06-17T21:52:43Z

    [SPARK-8306] [SQL] AddJar command needs to set the new class loader to the HiveConf inside executionHive.state.
    
    https://issues.apache.org/jira/browse/SPARK-8306
    
    I will try to add a test later.
    
    marmbrus aarondav
    
    Author: Yin Huai <yh...@databricks.com>
    
    Closes #6758 from yhuai/SPARK-8306 and squashes the following commits:
    
    1292346 [Yin Huai] [SPARK-8306] AddJar command needs to set the new class loader to the HiveConf inside executionHive.state.
    
    (cherry picked from commit 302556ff999ba9a1960281de6932e0d904197204)
    Signed-off-by: Michael Armbrust <mi...@databricks.com>
    
    Conflicts:
    	sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala

commit 67ad12d793a8f0f8137d0a2e0c0d80bd1b5284f2
Author: xutingjun <xu...@huawei.com>
Date:   2015-06-18T05:31:01Z

    [SPARK-8392] RDDOperationGraph: getting cached nodes is slow
    
    ```scala
    def getAllNodes: Seq[RDDOperationNode] = {
      _childNodes ++ _childClusters.flatMap(_.childNodes)
    }
    ```

    When `_childClusters` contains many nodes, this call becomes very slow and the process appears to hang. I think we can improve the efficiency here.
    
    Author: xutingjun <xu...@huawei.com>
    
    Closes #6839 from XuTingjun/DAGImprove and squashes the following commits:
    
    53b03ea [xutingjun] change code to more concise and easier to read
    f98728b [xutingjun] fix words: node -> nodes
    f87c663 [xutingjun] put the filter inside
    81f9fd2 [xutingjun] put the filter inside
    
    (cherry picked from commit e2cdb0568b14df29bbdb1ee9a13ee361c9ddad9c)
    Signed-off-by: Andrew Or <an...@databricks.com>

commit 9dabc129368aba7c1255328974bf849b4c3340c2
Author: Burak Yavuz <br...@gmail.com>
Date:   2015-06-18T05:33:37Z

    [SPARK-8095] Resolve dependencies of --packages in local ivy cache
    
    Dependencies of artifacts in the local ivy cache were not being resolved properly. The dependencies were not being picked up. Now they should be.
    
    cc andrewor14
    
    Author: Burak Yavuz <br...@gmail.com>
    
    Closes #6788 from brkyvz/local-ivy-fix and squashes the following commits:
    
    2875bf4 [Burak Yavuz] fix temp dir bug
    48cc648 [Burak Yavuz] improve deletion
    a69e3e6 [Burak Yavuz] delete cache before test as well
    0037197 [Burak Yavuz] fix merge conflicts
    f60772c [Burak Yavuz] use different folder for m2 cache during testing
    b6ef038 [Burak Yavuz] [SPARK-8095] Resolve dependencies of Spark Packages in local ivy cache
    
    Conflicts:
    	core/src/test/scala/org/apache/spark/deploy/SparkSubmitUtilsSuite.scala

commit ca23c3b0147de9bcc22e3b9c7b74d20df6402137
Author: Davies Liu <da...@databricks.com>
Date:   2015-06-18T20:45:58Z

    [SPARK-8202] [PYSPARK] fix infinite loop during external sort in PySpark
    
    The batch size during external sort will grow up to a maximum of 10000, then shrink down to zero, causing an infinite loop.
    Given the assumption that the items usually have a similar size, we don't need to adjust the batch size after the first spill.
    
    cc JoshRosen rxin angelini
    
    Author: Davies Liu <da...@databricks.com>
    
    Closes #6714 from davies/batch_size and squashes the following commits:
    
    b170dfb [Davies Liu] update test
    b9be832 [Davies Liu] Merge branch 'batch_size' of github.com:davies/spark into batch_size
    6ade745 [Davies Liu] update test
    5c21777 [Davies Liu] Update shuffle.py
    e746aec [Davies Liu] fix batch size during sort

commit c1da5cf02983d04257f3a3b666a7755de1f79b36
Author: Josh Rosen <jo...@databricks.com>
Date:   2015-06-18T22:10:09Z

    [SPARK-8353] [DOCS] Show anchor links when hovering over documentation headers
    
    This patch uses [AnchorJS](https://bryanbraun.github.io/anchorjs/) to show deep anchor links when hovering over headers in the Spark documentation. For example:
    
    ![image](https://cloud.githubusercontent.com/assets/50748/8240800/1502f85c-15ba-11e5-819a-97b231370a39.png)
    
    This makes it easier for users to link to specific sections of the documentation.
    
    I also removed some dead Javascript which isn't used in our current docs (it was introduced for the old AMPCamp training, but isn't used anymore).
    
    Author: Josh Rosen <jo...@databricks.com>
    
    Closes #6808 from JoshRosen/SPARK-8353 and squashes the following commits:
    
    e59d8a7 [Josh Rosen] Suppress underline on hover
    f518b6a [Josh Rosen] Turn on for all headers, since we use H1s in a bunch of places
    a9fec01 [Josh Rosen] Add anchor links when hovering over headers; remove some dead JS code
    
    (cherry picked from commit 44c931f006194a833f09517c9e35fb3cdf5852b1)
    Signed-off-by: Josh Rosen <jo...@databricks.com>

commit 9f293a9eb69d4dac13683edcbd7286a56696cbbb
Author: zsxwing <zs...@gmail.com>
Date:   2015-06-18T23:00:27Z

    [SPARK-8376] [DOCS] Add common lang3 to the Spark Flume Sink doc
    
    Commons Lang 3 has been added as one of the dependencies of Spark Flume Sink since #5703. This PR updates the doc for it.
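
    For context, the doc change amounts to listing Commons Lang 3 next to the sink artifact; a hedged sbt sketch (version numbers assumed for the example, not taken from the doc):

    ```scala
    // build.sbt: an application embedding the Spark Flume Sink now also needs
    // Commons Lang 3 on its classpath.
    libraryDependencies ++= Seq(
      "org.apache.spark"   %% "spark-streaming-flume-sink" % "1.4.1",
      "org.apache.commons"  % "commons-lang3"              % "3.3.2"
    )
    ```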
    
    Author: zsxwing <zs...@gmail.com>
    
    Closes #6829 from zsxwing/flume-sink-dep and squashes the following commits:
    
    f8617f0 [zsxwing] Add common lang3 to the Spark Flume Sink doc
    
    (cherry picked from commit 24e53793b4b100317d59ea16acb42f55d10a9575)
    Signed-off-by: Tathagata Das <ta...@gmail.com>

commit 152f4465d38b3076ffccab662a8fa0a75ed513e8
Author: Josh Rosen <jo...@databricks.com>
Date:   2015-06-18T23:45:14Z

    [SPARK-8446] [SQL] Add helper functions for testing SparkPlan physical operators
    
    This patch introduces `SparkPlanTest`, a base class for unit tests of SparkPlan physical operators.  This is analogous to Spark SQL's existing `QueryTest`, which does something similar for end-to-end tests with actual queries.
    
    These helper methods provide nicer error output when tests fail and help developers to avoid writing lots of boilerplate in order to execute manually constructed physical plans.
    
    Author: Josh Rosen <jo...@databricks.com>
    Author: Josh Rosen <ro...@gmail.com>
    Author: Michael Armbrust <mi...@databricks.com>
    
    Closes #6885 from JoshRosen/spark-plan-test and squashes the following commits:
    
    f8ce275 [Josh Rosen] Fix some IntelliJ inspections and delete some dead code
    84214be [Josh Rosen] Add an extra column which isn't part of the sort
    ae1896b [Josh Rosen] Provide implicits automatically
    a80f9b0 [Josh Rosen] Merge pull request #4 from marmbrus/pr/6885
    d9ab1e4 [Michael Armbrust] Add simple resolver
    c60a44d [Josh Rosen] Manually bind references
    996332a [Josh Rosen] Add types so that tests compile
    a46144a [Josh Rosen] WIP
    
    (cherry picked from commit 207a98ca59757d9cdd033d0f72863ad9ffb4e4b9)
    Signed-off-by: Michael Armbrust <mi...@databricks.com>

commit bd9bbd61197ae7164ed93e70f00e82d832902404
Author: Lars Francke <la...@gmail.com>
Date:   2015-06-19T02:40:32Z

    [SPARK-8462] [DOCS] Documentation fixes for Spark SQL
    
    This fixes various minor documentation issues on the Spark SQL page
    
    Author: Lars Francke <la...@gmail.com>
    
    Closes #6890 from lfrancke/SPARK-8462 and squashes the following commits:
    
    dd7e302 [Lars Francke] Merge branch 'master' into SPARK-8462
    34eff2c [Lars Francke] Minor documentation fixes
    
    (cherry picked from commit 4ce3bab89f6bdf6208fdad2fbfaba0b53d1954e3)
    Signed-off-by: Josh Rosen <jo...@databricks.com>

commit b55e4b9a5254c1e25075f382f1d337f0c1ba8554
Author: Dibyendu Bhattacharya <di...@pearson.com>
Date:   2015-06-19T02:58:47Z

    [SPARK-8080] [STREAMING] Receiver.store with Iterator does not give correct count at Spark UI
    
    tdas  zsxwing this is the new PR for Spark-8080
    
    I have merged https://github.com/apache/spark/pull/6659
    
    Also worth mentioning: for MEMORY_ONLY storage, when a block cannot be unrolled safely to memory because there is not enough space, BlockManager won't try to put the block, and ReceivedBlockHandler will throw a SparkException since it cannot find the block ID in PutResult. Thus the number of records in the block won't be counted if the block failed to unroll in memory, which is fine.

    For MEMORY_AND_DISK storage, if BlockManager is not able to unroll the block to memory, the block will still be written to disk. The same applies to the WAL-based store. So for those cases (storage level = memory + disk) the number of records will be counted even though the block could not be unrolled to memory.

    Thus I added isFullyConsumed to the CountingIterator but have not used it, since it will never happen that a block is not fully consumed and ReceivedBlockHandler still gets the block ID.

    I have also added a few test cases to cover those block unrolling scenarios.
    
    Author: Dibyendu Bhattacharya <di...@pearson.com>
    Author: U-PEROOT\UBHATD1 <UB...@PIN-L-PI046.PEROOT.com>
    
    Closes #6707 from dibbhatt/master and squashes the following commits:
    
    f6cb6b5 [Dibyendu Bhattacharya] [SPARK-8080][STREAMING] Receiver.store with Iterator does not give correct count at Spark UI
    f37cfd8 [Dibyendu Bhattacharya] [SPARK-8080][STREAMING] Receiver.store with Iterator does not give correct count at Spark UI
    5a8344a [Dibyendu Bhattacharya] [SPARK-8080][STREAMING] Receiver.store with Iterator does not give correct count at Spark UI Count ByteBufferBlock as 1 count
    fceac72 [Dibyendu Bhattacharya] [SPARK-8080][STREAMING] Receiver.store with Iterator does not give correct count at Spark UI
    0153e7e [Dibyendu Bhattacharya] [SPARK-8080][STREAMING] Receiver.store with Iterator does not give correct count at Spark UI Fixed comments given by @zsxwing
    4c5931d [Dibyendu Bhattacharya] [SPARK-8080][STREAMING] Receiver.store with Iterator does not give correct count at Spark UI
    01e6dc8 [U-PEROOT\UBHATD1] A

commit f48f3a2e2fc1ceb4b6672bc4122e783abb626b6e
Author: Cheng Lian <li...@databricks.com>
Date:   2015-06-19T05:01:52Z

    [SPARK-8458] [SQL] Don't strip scheme part of output path when writing ORC files
    
    `Path.toUri.getPath` strips the scheme part of the output path (from `file:///foo` to `/foo`), which causes the ORC data source to write only to the file system configured in the Hadoop configuration. We should use `Path.toString` instead.
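
    A quick illustration of the difference, assuming a local file: URI (not the ORC writer itself):

    ```scala
    import org.apache.hadoop.fs.Path

    val path = new Path("file:///foo")
    println(path.toUri.getPath)  // "/foo" -- the file: scheme is stripped
    println(path.toString)       // keeps the file: scheme, so the intended file system is preserved
    ```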
    
    Author: Cheng Lian <li...@databricks.com>
    
    Closes #6892 from liancheng/spark-8458 and squashes the following commits:
    
    87f8199 [Cheng Lian] Don't strip scheme of output path when writing ORC files
    
    (cherry picked from commit a71cbbdea581573192a59bf8472861c463c40fcb)
    Signed-off-by: Cheng Lian <li...@databricks.com>

commit 164b9d32e764b2a67b372a3d685b57c4bbeccbfa
Author: Kevin Conor <ke...@discoverybayconsulting.com>
Date:   2015-06-19T07:12:20Z

    [SPARK-8339] [PYSPARK] integer division for python 3
    
    Itertools islice requires an integer for the stop argument.  Switching to integer division here prevents a ValueError when vs is evaluated above.
    
    davies
    
    This is my original work, and I license it to the project.
    
    Author: Kevin Conor <ke...@discoverybayconsulting.com>
    
    Closes #6794 from kconor/kconor-patch-1 and squashes the following commits:
    
    da5e700 [Kevin Conor] Integer division for batch size
    
    (cherry picked from commit fdf63f12490c674cc1877ddf7b70343c4fd6f4f1)
    Signed-off-by: Davies Liu <da...@databricks.com>

commit 1f2dafb77f9af52602885cd5767032a20b486b98
Author: Xiangrui Meng <me...@databricks.com>
Date:   2015-06-19T16:46:51Z

    [SPARK-8151] [MLLIB] pipeline components should correctly implement copy
    
    Otherwise, extra params get ignored in `PipelineModel.transform`. jkbradley
    
    Author: Xiangrui Meng <me...@databricks.com>
    
    Closes #6622 from mengxr/SPARK-8087 and squashes the following commits:
    
    0e4c8c4 [Xiangrui Meng] fix merge issues
    26fc1f0 [Xiangrui Meng] address comments
    e607a04 [Xiangrui Meng] merge master
    b85b57e [Xiangrui Meng] fix examples/compile
    d6f7891 [Xiangrui Meng] rename defaultCopyWithParams to defaultCopy
    84ec278 [Xiangrui Meng] remove setter checks due to generics
    2cf2ed0 [Xiangrui Meng] snapshot
    291814f [Xiangrui Meng] OneVsRest.copy
    1dfe3bd [Xiangrui Meng] PipelineModel.copy should copy stages
    
    (cherry picked from commit 43c7ec6384e51105dedf3a53354b6a3732cc27b2)
    Signed-off-by: Xiangrui Meng <me...@databricks.com>

commit 6f2e41108437b23e4b8cdcfc500f8fb2babf92c6
Author: Lianhui Wang <li...@gmail.com>
Date:   2015-06-19T17:47:07Z

    [SPARK-8430] ExternalShuffleBlockResolver of shuffle service should support UnsafeShuffleManager
    
    andrewor14 can you take a look? Thanks.
    
    Author: Lianhui Wang <li...@gmail.com>
    
    Closes #6873 from lianhuiwang/SPARK-8430 and squashes the following commits:
    
    51c47ca [Lianhui Wang] update andrewor's comments
    2b27b19 [Lianhui Wang] support UnsafeShuffleManager
    
    (cherry picked from commit 9baf093014a48c5ec49f747773f4500dafdfa4ec)
    Signed-off-by: Andrew Or <an...@databricks.com>

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13114: Branch 1.4

Posted by nchammas <gi...@git.apache.org>.
Github user nchammas commented on the issue:

    https://github.com/apache/spark/pull/13114
  
    @srowen @vanzin - Shouldn't some automated process be picking up your comments ("close this PR") and closing this PR? I thought we had something like that.


[GitHub] spark issue #13114: Branch 1.4

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/13114
  
    @nchammas Yeah the script or process for that disappeared .. but all it would do is push an empty commit with "closes xxx" in its message. I can do that manually without much work. One moment ...


[GitHub] spark pull request: Branch 1.4

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13114#issuecomment-219212714
  
    Can one of the admins verify this patch?


[GitHub] spark pull request: Branch 1.4

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/13114#issuecomment-219212846
  
    Close this PR @GuoNing89 


[GitHub] spark issue #13114: Branch 1.4

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/13114
  
    @GuoNing89  please close this PR.


[GitHub] spark pull request #13114: Branch 1.4

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/13114

