Posted to reviews@spark.apache.org by liumingning <gi...@git.apache.org> on 2016/02/22 14:27:18 UTC

[GitHub] spark pull request: Branch 1.4

GitHub user liumingning opened a pull request:

    https://github.com/apache/spark/pull/11302

    Branch 1.4

    ## What changes were proposed in this pull request?
    
    (Please fill in changes proposed in this fix)
    
    
    ## How was this patch tested?
    
    (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
    
    
    (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark branch-1.4

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11302.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11302
    
----
commit 3a62569afb8fcd3d1610b4ede0f2c5e595acb9b9
Author: Shivaram Venkataraman <sh...@cs.berkeley.edu>
Date:   2015-06-11T20:22:08Z

    [SPARK-8310] [EC2] Update spark-ec2 branch to 1.4
    
    cc pwendell -- We should probably update our release guidelines to change this when we cut a release branch?
    
    Author: Shivaram Venkataraman <sh...@cs.berkeley.edu>
    
    Closes #6765 from shivaram/SPARK-8310-14 and squashes the following commits:
    
    066e44e [Shivaram Venkataraman] Update spark-ec2 branch to 1.4

commit 8b25f62bf19b02042675aa1d4e4b58cc4deb3e26
Author: Marcelo Vanzin <va...@cloudera.com>
Date:   2015-06-11T22:29:03Z

    [SPARK-6511] [docs] Fix example command in hadoop-provided docs.
    
    Author: Marcelo Vanzin <va...@cloudera.com>
    
    Closes #6766 from vanzin/SPARK-6511 and squashes the following commits:
    
    49f0f67 [Marcelo Vanzin] [SPARK-6511] [docs] Fix example command in hadoop-provided docs.
    
    (cherry picked from commit 9cbdf31ec1399d4d43a1863c15688ce78b6dfd92)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit 141eab71ee3aa05da899ecfc6bae40b3798a4665
Author: Mark Smith <ma...@bronto.com>
Date:   2015-06-12T17:28:30Z

    [SPARK-8322] [EC2] Added spark 1.4.0 into the VALID_SPARK_VERSIONS and SPARK_TACHYON_MAP
    
    Author: Mark Smith <ma...@bronto.com>
    
    Closes #6777 from markmsmith/branch-1.4 and squashes the following commits:
    
    a218cfa [Mark Smith] [SPARK-8322][EC2] Fixed tachyon map entry to point to 0.6.4
    90d1655 [Mark Smith] [SPARK-8322][EC2] Added spark 1.4.0 into the VALID_SPARK_VERSIONS and SPARK_TACHYON_MAP

commit 76083734196a7571de314df79e88759b650ed1f3
Author: Andrew Or <an...@databricks.com>
Date:   2015-06-12T18:14:55Z

    [SPARK-8330] DAG visualization: trim whitespace from input
    
    Safeguard against DOM rewriting.
    
    Author: Andrew Or <an...@databricks.com>
    
    Closes #6787 from andrewor14/dag-viz-trim and squashes the following commits:
    
    0fb4afe [Andrew Or] Trim input metadata from DOM
    
    (cherry picked from commit 88604051511c788d7abb41a49e3eb3a8330c09a9)
    Signed-off-by: Andrew Or <an...@databricks.com>

commit 7c11ccf3913ac6a5d178994704d8b0983829b43b
Author: Tathagata Das <ta...@gmail.com>
Date:   2015-06-12T22:22:59Z

    [SPARK-7284] [STREAMING] Updated streaming documentation
    
    - Kinesis API updated
    - Kafka version updated, and Python API for Direct Kafka added
    - Added SQLContext.getOrCreate()
    - Added information on how to get partitionId in foreachRDD
    
    Author: Tathagata Das <ta...@gmail.com>
    
    Closes #6781 from tdas/SPARK-7284 and squashes the following commits:
    
    aac7be0 [Tathagata Das] Added information on how to get partition id
    a66ec22 [Tathagata Das] Completed the incomplete line
    a92ca39 [Tathagata Das] Updated streaming documentation
    
    (cherry picked from commit e9471d3414d327c7d0853e18f1844ab1bd09c8ed)
    Signed-off-by: Tathagata Das <ta...@gmail.com>

commit 1ca431e83f070f9737b4cc3b7918188ad5dd3d36
Author: Michael Armbrust <mi...@databricks.com>
Date:   2015-06-13T06:11:16Z

    [SPARK-8329][SQL] Allow _ in DataSource options
    
    Author: Michael Armbrust <mi...@databricks.com>
    
    Closes #6786 from marmbrus/optionsParser and squashes the following commits:
    
    e7d18ef [Michael Armbrust] add dots
    99a3452 [Michael Armbrust] [SPARK-8329][SQL] Allow _ in DataSource options
    
    (cherry picked from commit 4aed66f299a67f5a594da9316b6bf4c345838216)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit 187a3d5385e778c188d0c1c2adc755ac2d25e8e8
Author: Mike Dusenberry <du...@gmail.com>
Date:   2015-06-14T04:22:46Z

    [Spark-8343] [Streaming] [Docs] Improve Spark Streaming Guides.
    
    This improves the Spark Streaming Guides by fixing broken links, rewording confusing sections, fixing typos, adding missing words, etc.
    
    Author: Mike Dusenberry <du...@gmail.com>
    
    Closes #6801 from dusenberrymw/SPARK-8343_Improve_Spark_Streaming_Guides_MERGED and squashes the following commits:
    
    6688090 [Mike Dusenberry] Improvements to the Spark Streaming Custom Receiver Guide, including slight rewording of confusing sections, and fixing typos & missing words.
    436fbd8 [Mike Dusenberry] Bunch of improvements to the Spark Streaming Guide, including fixing broken links, slight rewording of confusing sections, fixing typos & missing words, etc.
    
    (cherry picked from commit 35d1267cf8e918032c92a206b22bb301bf0c806e)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit 4634be5a7db4f2fd82cfb5c602b79129d1d9e246
Author: Josh Rosen <jo...@databricks.com>
Date:   2015-06-14T16:34:35Z

    [SPARK-8354] [SQL] Fix off-by-factor-of-8 error when allocating scratch space in UnsafeFixedWidthAggregationMap
    
    UnsafeFixedWidthAggregationMap contains an off-by-factor-of-8 error when allocating row conversion scratch space: we take a size requirement, measured in bytes, then allocate a long array of that size.  This means that we end up allocating 8x too much conversion space.
    
    This patch fixes this by allocating a `byte[]` array instead.  This doesn't impose any new limitations on the maximum sizes of UnsafeRows, since UnsafeRowConverter already used integers when calculating the size requirements for rows.
    
    Author: Josh Rosen <jo...@databricks.com>
    
    Closes #6809 from JoshRosen/sql-bytes-vs-words-fix and squashes the following commits:
    
    6520339 [Josh Rosen] Updates to reflect fact that UnsafeRow max size is constrained by max byte[] size
    
    (cherry picked from commit ea7fd2ff6454e8d819a39bf49901074e49b5714e)
    Signed-off-by: Josh Rosen <jo...@databricks.com>

commit 2805d145e30e4cabd11a7d33c4f80edbc54cc54a
Author: Michael Armbrust <mi...@databricks.com>
Date:   2015-06-14T18:21:42Z

    [SPARK-8358] [SQL] Wait for child resolution when resolving generators
    
    Author: Michael Armbrust <mi...@databricks.com>
    
    Closes #6811 from marmbrus/aliasExplodeStar and squashes the following commits:
    
    fbd2065 [Michael Armbrust] more style
    806a373 [Michael Armbrust] fix style
    7cbb530 [Michael Armbrust] [SPARK-8358][SQL] Wait for child resolution when resolving generators
    
    (cherry picked from commit 9073a426e444e4bc6efa8608e54e0a986f38a270)
    Signed-off-by: Michael Armbrust <mi...@databricks.com>

commit 0ffbf085190b9d4dc13a8b6545e4e1022083bd35
Author: Peter Hoffmann <ph...@peter-hoffmann.com>
Date:   2015-06-14T18:41:16Z

    fix read/write mixup
    
    Author: Peter Hoffmann <ph...@peter-hoffmann.com>
    
    Closes #6815 from hoffmann/patch-1 and squashes the following commits:
    
    2abb6da [Peter Hoffmann] fix read/write mixup
    
    (cherry picked from commit f3f2a4397da164f0ddfa5d60bf441099296c4346)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit fff8d7ee6c7e88ed96c29260480e8228e7fb1435
Author: tedyu <yu...@gmail.com>
Date:   2015-06-16T00:00:38Z

    SPARK-8336 Fix NullPointerException with functions.rand()
    
    This PR fixes the problem reported by Justin Yip in the thread 'NullPointerException with functions.rand()'
    
    Tested using spark-shell and verified that the following works:

    ```scala
    sqlContext.createDataFrame(Seq((1, 2), (3, 100))).withColumn("index", rand(30)).show()
    ```
    
    Author: tedyu <yu...@gmail.com>
    
    Closes #6793 from tedyu/master and squashes the following commits:
    
    62fd97b [tedyu] Create RandomSuite
    750f92c [tedyu] Add test for Rand() with seed
    a1d66c5 [tedyu] Fix NullPointerException with functions.rand()
    
    (cherry picked from commit 1a62d61696a0481508d83a07d19ab3701245ac20)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit f287f7ea141fa7a3e9f8b7d3a2180b63cd77088d
Author: huangzhaowei <ca...@gmail.com>
Date:   2015-06-16T06:16:09Z

    [SPARK-8367] [STREAMING] Add a limit for 'spark.streaming.blockInterval` since a data loss bug.
    
    The bug was reported in the JIRA ticket [SPARK-8367](https://issues.apache.org/jira/browse/SPARK-8367).
    The resolution is limiting the configuration `spark.streaming.blockInterval` to a positive number.
    
    Author: huangzhaowei <ca...@gmail.com>
    Author: huangzhaowei <Sa...@users.noreply.github.com>
    
    Closes #6818 from SaintBacchus/SPARK-8367 and squashes the following commits:
    
    c9d1927 [huangzhaowei] Update BlockGenerator.scala
    bd3f71a [huangzhaowei] Use require instead of if
    3d17796 [huangzhaowei] [SPARK_8367][Streaming]Add a limit for 'spark.streaming.blockInterval' since a data loss bug.
    
    (cherry picked from commit ccf010f27bc62f7e7f409c6eef7488ab476de609)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit 1378bdc4a9a974b40c7c509f4af7f07bdc892e14
Author: Moussa Taifi <mo...@gmail.com>
Date:   2015-06-16T19:59:22Z

    [SPARK-DOCS] [SPARK-SQL] Update sql-programming-guide.md
    
    Typo in thriftserver section
    
    Author: Moussa Taifi <mo...@gmail.com>
    
    Closes #6847 from moutai/patch-1 and squashes the following commits:
    
    1bd29df [Moussa Taifi] Update sql-programming-guide.md
    
    (cherry picked from commit dc455b88330f79b1181a585277ea9ed3e0763703)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit 4da068650800bdf1fa488790049993896d0edc32
Author: Radek Ostrowski <de...@gmail.com>
Date:   2015-06-16T20:04:26Z

    [SQL] [DOC] improved a comment
    
    [SQL][DOC] I found it a bit confusing when I came across it for the first time in the docs
    
    Author: Radek Ostrowski <de...@gmail.com>
    Author: radek <ra...@radeks-MacBook-Pro-2.local>
    
    Closes #6332 from radek1st/master and squashes the following commits:
    
    dae3347 [Radek Ostrowski] fixed typo
    c76bb3a [radek] improved a comment
    
    (cherry picked from commit 4bd10fd5090fb5f4f139267b82e9f2fc15659796)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit b9e5d3cadd0f07c211623b045466220c39abdc56
Author: Marcelo Vanzin <va...@cloudera.com>
Date:   2015-06-16T20:10:18Z

    [SPARK-8126] [BUILD] Make sure temp dir exists when running tests.
    
    If you ran "clean" at the top-level sbt project, the temp dir would
    go away, so running "test" without restarting sbt would fail. This
    fixes that by making sure the temp dir exists before running tests.
    
    Author: Marcelo Vanzin <va...@cloudera.com>
    
    Closes #6805 from vanzin/SPARK-8126-fix and squashes the following commits:
    
    12d7768 [Marcelo Vanzin] [SPARK-8126] [build] Make sure temp dir exists when running tests.
    
    (cherry picked from commit cebf2411847706a98dc8df9c754ef53d6d12a87c)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit 15d973f2d9c2512dd5a882b6b65fb494de526643
Author: Yanbo Liang <yb...@gmail.com>
Date:   2015-06-16T21:30:30Z

    [SPARK-7916] [MLLIB] MLlib Python doc parity check for classification and regression
    
    Check and update the MLlib Python classification and regression docs to be as complete as the Scala docs.
    
    Author: Yanbo Liang <yb...@gmail.com>
    
    Closes #6460 from yanboliang/spark-7916 and squashes the following commits:
    
    f8deda4 [Yanbo Liang] trigger jenkins
    6dc4d99 [Yanbo Liang] address comments
    ce2a43e [Yanbo Liang] truncate too long line and remove extra sparse
    3eaf6ad [Yanbo Liang] MLlib Python doc parity check for classification and regression
    
    (cherry picked from commit ca998757e8ff2bdca2c7e88055c389161521d604)
    Signed-off-by: Joseph K. Bradley <jo...@databricks.com>

commit 877deb046862bff8200c517674f9e1100ab09b9a
Author: Punya Biswal <pb...@palantir.com>
Date:   2015-06-17T05:31:49Z

    Fix break introduced by backport
    
    rxin this is the fix you requested for the break introduced by backporting #6793
    
    Author: Punya Biswal <pb...@palantir.com>
    
    Closes #6850 from punya/feature/fix-backport-break and squashes the following commits:
    
    fdc3693 [Punya Biswal] Fix break introduced by backport

commit a5f602efcffea3da03f0cf828045b4e1b862fde8
Author: Vyacheslav Baranov <sl...@gmail.com>
Date:   2015-06-17T08:42:29Z

    [SPARK-8309] [CORE] Support for more than 12M items in OpenHashMap
    
    The problem occurs because the position mask `0xEFFFFFF` is incorrect: its 25th bit is zero, so when the capacity grows beyond 2^24, `OpenHashMap` calculates an incorrect index into the `_values` array.
    
    I've also added a size check in `rehash()`, so that it fails instead of reporting invalid item indices.
    
    Author: Vyacheslav Baranov <sl...@gmail.com>
    
    Closes #6763 from SlavikBaranov/SPARK-8309 and squashes the following commits:
    
    8557445 [Vyacheslav Baranov] Resolved review comments
    4d5b954 [Vyacheslav Baranov] Resolved review comments
    eaf1e68 [Vyacheslav Baranov] Fixed failing test
    f9284fd [Vyacheslav Baranov] Resolved review comments
    3920656 [Vyacheslav Baranov] SPARK-8309: Support for more than 12M items in OpenHashMap
    
    (cherry picked from commit c13da20a55b80b8632d547240d2c8f97539969a1)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit 320c4420b9cf5d1a4669dc3bb63c63f43dcd9079
Author: Sean Owen <so...@cloudera.com>
Date:   2015-06-17T20:31:10Z

    [SPARK-8395] [DOCS] start-slave.sh docs incorrect
    
    start-slave.sh no longer takes a worker # param in 1.4+
    
    Author: Sean Owen <so...@cloudera.com>
    
    Closes #6855 from srowen/SPARK-8395 and squashes the following commits:
    
    300278e [Sean Owen] start-slave.sh no longer takes a worker # param in 1.4+
    
    (cherry picked from commit f005be02730db315e2a6d4dbecedfd2562b9ef1f)
    Signed-off-by: Andrew Or <an...@databricks.com>

commit a7f6979d0fecec948c25427bdeb01b4fe296ca41
Author: Punya Biswal <pb...@palantir.com>
Date:   2015-06-17T20:37:20Z

    [SPARK-7515] [DOC] Update documentation for PySpark on YARN with cluster mode
    
    PySpark on YARN with cluster mode is now supported, so let's update the docs.
    
    Author: Kousuke Saruta <sarutakoss.nttdata.co.jp>
    
    Closes #6040 from sarutak/update-doc-for-pyspark-on-yarn and squashes the following commits:
    
    ad9f88c [Kousuke Saruta] Brushed up sentences
    469fd2e [Kousuke Saruta] Merge branch 'master' of https://github.com/apache/spark into update-doc-for-pyspark-on-yarn
    fcfdb92 [Kousuke Saruta] Updated doc for PySpark on YARN with cluster mode
    
    Author: Punya Biswal <pb...@palantir.com>
    Author: Kousuke Saruta <sa...@oss.nttdata.co.jp>
    
    Closes #6842 from punya/feature/SPARK-7515 and squashes the following commits:
    
    0b83648 [Punya Biswal] Merge remote-tracking branch 'origin/branch-1.4' into feature/SPARK-7515
    de025cd [Kousuke Saruta] [SPARK-7515] [DOC] Update documentation for PySpark on YARN with cluster mode

commit d75c53d88d4d8d176975e499788a43dda2a62476
Author: Mingfei <mi...@intel.com>
Date:   2015-06-17T20:40:07Z

    [SPARK-8161] Set externalBlockStoreInitialized to be true, after ExternalBlockStore is initialized
    
    externalBlockStoreInitialized is never set to true, which means blocks stored in ExternalBlockStore cannot be removed.
    
    Author: Mingfei <mi...@intel.com>
    
    Closes #6702 from shimingfei/SetTrue and squashes the following commits:
    
    add61d8 [Mingfei] Set externalBlockStoreInitialized to be true, after ExternalBlockStore is initialized
    
    (cherry picked from commit 7ad8c5d869555b1bf4b50eafdf80e057a0175941)
    Signed-off-by: Andrew Or <an...@databricks.com>

commit f0513733d4f6fc34f86feffd3062600cbbd56a28
Author: Carson Wang <ca...@intel.com>
Date:   2015-06-17T20:41:36Z

    [SPARK-8372] History server shows incorrect information for application not started
    
    The history server may show an incorrect App ID for an incomplete application like <App ID>.inprogress. This app info never disappears, even after the app completes.
    ![incorrectappinfo](https://cloud.githubusercontent.com/assets/9278199/8156147/2a10fdbe-137d-11e5-9620-c5b61d93e3c1.png)
    
    The cause of the issue is that a log path name is used as the app ID when the app ID cannot be obtained during replay.
    
    Author: Carson Wang <ca...@intel.com>
    
    Closes #6827 from carsonwang/SPARK-8372 and squashes the following commits:
    
    cdbb089 [Carson Wang] Fix code style
    3e46b35 [Carson Wang] Update code style
    90f5dde [Carson Wang] Add a unit test
    d8c9cd0 [Carson Wang] Replaying events only return information when app is started
    
    (cherry picked from commit 2837e067099921dd4ab6639ac5f6e89f789d4ff4)
    Signed-off-by: Andrew Or <an...@databricks.com>

commit 5e7973df0ec21c4fd8ae0a26290088def231d26c
Author: zsxwing <zs...@gmail.com>
Date:   2015-06-17T20:59:39Z

    [SPARK-8373] [PYSPARK] Add emptyRDD to pyspark and fix the issue when calling sum on an empty RDD
    
    This PR fixes the sum issue and also adds `emptyRDD` so that it's easy to create a test case.
    
    Author: zsxwing <zs...@gmail.com>
    
    Closes #6826 from zsxwing/python-emptyRDD and squashes the following commits:
    
    b36993f [zsxwing] Update the return type to JavaRDD[T]
    71df047 [zsxwing] Add emptyRDD to pyspark and fix the issue when calling sum on an empty RDD
    
    (cherry picked from commit 0fc4b96f3e3bf81724ac133a6acc97c1b77271b4)
    Signed-off-by: Andrew Or <an...@databricks.com>

commit 5aedfa2ceb5f9a9d22994a5709f663ee6d9a607e
Author: zsxwing <zs...@gmail.com>
Date:   2015-06-17T22:00:03Z

    [SPARK-8404] [STREAMING] [TESTS] Use thread-safe collections to make the tests more reliable
    
    KafkaStreamSuite, DirectKafkaStreamSuite, JavaKafkaStreamSuite and JavaDirectKafkaStreamSuite use non-thread-safe collections to collect data in one thread and check it in another thread, which can cause the tests to fail.
    
    This PR changes them to thread-safe collections.
    
    Note: I cannot reproduce the test failures in my environment, but this PR should at least make the tests more reliable.
    
    Author: zsxwing <zs...@gmail.com>
    
    Closes #6852 from zsxwing/fix-KafkaStreamSuite and squashes the following commits:
    
    d464211 [zsxwing] Use thread-safe collections to make the tests more reliable
    
    (cherry picked from commit a06d9c8e76bb904d48764802aa3affff93b00baa)
    Signed-off-by: Tathagata Das <ta...@gmail.com>

commit 73cf5def0687bbe556542646e2b1bd569c59cd59
Author: Yin Huai <yh...@databricks.com>
Date:   2015-06-17T21:52:43Z

    [SPARK-8306] [SQL] AddJar command needs to set the new class loader to the HiveConf inside executionHive.state.
    
    https://issues.apache.org/jira/browse/SPARK-8306
    
    I will try to add a test later.
    
    marmbrus aarondav
    
    Author: Yin Huai <yh...@databricks.com>
    
    Closes #6758 from yhuai/SPARK-8306 and squashes the following commits:
    
    1292346 [Yin Huai] [SPARK-8306] AddJar command needs to set the new class loader to the HiveConf inside executionHive.state.
    
    (cherry picked from commit 302556ff999ba9a1960281de6932e0d904197204)
    Signed-off-by: Michael Armbrust <mi...@databricks.com>
    
    Conflicts:
    	sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala

commit 67ad12d793a8f0f8137d0a2e0c0d80bd1b5284f2
Author: xutingjun <xu...@huawei.com>
Date:   2015-06-18T05:31:01Z

    [SPARK-8392] RDDOperationGraph: getting cached nodes is slow
    
    ```scala
    def getAllNodes: Seq[RDDOperationNode] =
      { _childNodes ++ _childClusters.flatMap(_.childNodes) }
    ```

    When `_childClusters` has many nodes, this process hangs. I think we can improve the efficiency here.
    
    Author: xutingjun <xu...@huawei.com>
    
    Closes #6839 from XuTingjun/DAGImprove and squashes the following commits:
    
    53b03ea [xutingjun] change code to more concise and easier to read
    f98728b [xutingjun] fix words: node -> nodes
    f87c663 [xutingjun] put the filter inside
    81f9fd2 [xutingjun] put the filter inside
    
    (cherry picked from commit e2cdb0568b14df29bbdb1ee9a13ee361c9ddad9c)
    Signed-off-by: Andrew Or <an...@databricks.com>

commit 9dabc129368aba7c1255328974bf849b4c3340c2
Author: Burak Yavuz <br...@gmail.com>
Date:   2015-06-18T05:33:37Z

    [SPARK-8095] Resolve dependencies of --packages in local ivy cache
    
    Dependencies of artifacts in the local ivy cache were not being resolved properly and were not being picked up. Now they should be.
    
    cc andrewor14
    
    Author: Burak Yavuz <br...@gmail.com>
    
    Closes #6788 from brkyvz/local-ivy-fix and squashes the following commits:
    
    2875bf4 [Burak Yavuz] fix temp dir bug
    48cc648 [Burak Yavuz] improve deletion
    a69e3e6 [Burak Yavuz] delete cache before test as well
    0037197 [Burak Yavuz] fix merge conflicts
    f60772c [Burak Yavuz] use different folder for m2 cache during testing
    b6ef038 [Burak Yavuz] [SPARK-8095] Resolve dependencies of Spark Packages in local ivy cache
    
    Conflicts:
    	core/src/test/scala/org/apache/spark/deploy/SparkSubmitUtilsSuite.scala

commit ca23c3b0147de9bcc22e3b9c7b74d20df6402137
Author: Davies Liu <da...@databricks.com>
Date:   2015-06-18T20:45:58Z

    [SPARK-8202] [PYSPARK] fix infinite loop during external sort in PySpark
    
    The batch size during external sort grows up to a maximum of 10000, then shrinks down to zero, causing an infinite loop.
    Given the assumption that the items usually have a similar size, we don't need to adjust the batch size after the first spill.
    
    cc JoshRosen rxin angelini
    
    Author: Davies Liu <da...@databricks.com>
    
    Closes #6714 from davies/batch_size and squashes the following commits:
    
    b170dfb [Davies Liu] update test
    b9be832 [Davies Liu] Merge branch 'batch_size' of github.com:davies/spark into batch_size
    6ade745 [Davies Liu] update test
    5c21777 [Davies Liu] Update shuffle.py
    e746aec [Davies Liu] fix batch size during sort

commit c1da5cf02983d04257f3a3b666a7755de1f79b36
Author: Josh Rosen <jo...@databricks.com>
Date:   2015-06-18T22:10:09Z

    [SPARK-8353] [DOCS] Show anchor links when hovering over documentation headers
    
    This patch uses [AnchorJS](https://bryanbraun.github.io/anchorjs/) to show deep anchor links when hovering over headers in the Spark documentation. For example:
    
    ![image](https://cloud.githubusercontent.com/assets/50748/8240800/1502f85c-15ba-11e5-819a-97b231370a39.png)
    
    This makes it easier for users to link to specific sections of the documentation.
    
    I also removed some dead Javascript which isn't used in our current docs (it was introduced for the old AMPCamp training, but isn't used anymore).
    
    Author: Josh Rosen <jo...@databricks.com>
    
    Closes #6808 from JoshRosen/SPARK-8353 and squashes the following commits:
    
    e59d8a7 [Josh Rosen] Suppress underline on hover
    f518b6a [Josh Rosen] Turn on for all headers, since we use H1s in a bunch of places
    a9fec01 [Josh Rosen] Add anchor links when hovering over headers; remove some dead JS code
    
    (cherry picked from commit 44c931f006194a833f09517c9e35fb3cdf5852b1)
    Signed-off-by: Josh Rosen <jo...@databricks.com>

commit 9f293a9eb69d4dac13683edcbd7286a56696cbbb
Author: zsxwing <zs...@gmail.com>
Date:   2015-06-18T23:00:27Z

    [SPARK-8376] [DOCS] Add common lang3 to the Spark Flume Sink doc
    
    Commons Lang 3 has been a dependency of the Spark Flume Sink since #5703. This PR updates the doc accordingly.
    
    Author: zsxwing <zs...@gmail.com>
    
    Closes #6829 from zsxwing/flume-sink-dep and squashes the following commits:
    
    f8617f0 [zsxwing] Add common lang3 to the Spark Flume Sink doc
    
    (cherry picked from commit 24e53793b4b100317d59ea16acb42f55d10a9575)
    Signed-off-by: Tathagata Das <ta...@gmail.com>

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Branch 1.4

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11302#issuecomment-187395048
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: Branch 1.4

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/11302#issuecomment-190355445
  
    @liumingning Please close this PR.




[GitHub] spark pull request: Branch 1.4

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/11302




[GitHub] spark pull request: Branch 1.4

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the pull request:

    https://github.com/apache/spark/pull/11302#issuecomment-187174791
  
    @liumingning could you please close this

