Posted to reviews@spark.apache.org by lisendong <gi...@git.apache.org> on 2015/04/08 13:33:50 UTC

[GitHub] spark pull request: Lr graphx sgd

GitHub user lisendong opened a pull request:

    https://github.com/apache/spark/pull/5420

    Lr graphx sgd

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/witgo/spark lrGraphxSGD

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5420.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5420
    
----
commit 791df93cd27e80324886279ea456318cd3b3443e
Author: Kay Ousterhout <ka...@gmail.com>
Date:   2015-02-25T22:55:24Z

    [SPARK-5982] Remove incorrect Local Read Time Metric
    
    This metric is incomplete: because the files are memory-mapped, much of the disk read happens later, when tasks actually read the file's data.
    
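    A minimal sketch of the undercount, assuming a plain JVM memory-mapped read (illustrative only, not Spark's shuffle code; the file path is hypothetical): the map() call returns almost immediately, and the real disk I/O happens later, when the buffer is first touched.
    
    ```
    import java.io.RandomAccessFile
    import java.nio.channels.FileChannel
    
    val file = new RandomAccessFile("/tmp/shuffle.data", "r") // hypothetical file
    val t0 = System.nanoTime()
    val buf = file.getChannel.map(FileChannel.MapMode.READ_ONLY, 0, file.length())
    val mapNanos = System.nanoTime() - t0 // tiny: no pages have been read yet
    var checksum = 0L
    for (i <- 0 until buf.limit()) checksum += buf.get(i) // disk reads happen here
    ```
    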
    This should be merged into 1.3, so that we never expose this incorrect metric to users.
    
    CC pwendell ksakellis sryza
    
    Author: Kay Ousterhout <ka...@gmail.com>
    
    Closes #4749 from kayousterhout/SPARK-5982 and squashes the following commits:
    
    9737b5e [Kay Ousterhout] More fixes
    a1eb300 [Kay Ousterhout] Removed one more use of local read time
    cf13497 [Kay Ousterhout] [SPARK-5982] Remove incorrect Local Read Time Metric
    
    (cherry picked from commit 838a48036c050cef03b8c3620e16b5495cd7beab)
    Signed-off-by: Kay Ousterhout <ka...@gmail.com>

commit 9aca3c688ad9a51bc8e14053ed4daac168028913
Author: Davies Liu <da...@databricks.com>
Date:   2015-02-25T23:13:34Z

    [SPARK-5944] [PySpark] fix version in Python API docs
    
    use RELEASE_VERSION when building the Python API docs
    
    Author: Davies Liu <da...@databricks.com>
    
    Closes #4731 from davies/api_version and squashes the following commits:
    
    c9744c9 [Davies Liu] Update create-release.sh
    08cbc3f [Davies Liu] fix python docs
    
    (cherry picked from commit f3f4c87b3d944c10d1200dfe49091ebb2a149be6)
    Signed-off-by: Michael Armbrust <mi...@databricks.com>

commit 016f1f81cb14ea54e0968a2fa4a3ea6df6d3a926
Author: Cheng Lian <li...@databricks.com>
Date:   2015-02-25T23:15:22Z

    [SPARK-6010] [SQL] Merging compatible Parquet schemas before computing splits
    
    `ReadContext.init` calls `InitContext.getMergedKeyValueMetadata`, which doesn't know how to merge conflicting user-defined key-value metadata and throws an exception. In our case, when dealing with different but compatible schemas, we have different Spark SQL schema JSON strings in different Parquet part-files, which causes this problem. Reading similar Parquet files generated by Hive doesn't suffer from this issue.
    
    In this PR, we manually merge the schemas before passing them to `ReadContext` to avoid the exception.
    
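    A hypothetical illustration of the kind of merge involved (the helper below is invented, not the PR's actual code): union the fields of two compatible schemas so a single merged schema reaches the read path.
    
    ```
    import org.apache.spark.sql.types.StructType
    
    // keep all left fields and append right-only fields; assumes the shared
    // fields carry no type conflicts, i.e. the schemas are compatible
    def mergeCompatible(left: StructType, right: StructType): StructType = {
      val known = left.fieldNames.toSet
      StructType(left.fields ++ right.fields.filterNot(f => known(f.name)))
    }
    ```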
    
    Author: Cheng Lian <li...@databricks.com>
    
    Closes #4768 from liancheng/spark-6010 and squashes the following commits:
    
    9002f0a [Cheng Lian] Fixes SPARK-6010
    
    (cherry picked from commit e0fdd467e277867d6bec5c6605cc1cabce70ac89)
    Signed-off-by: Michael Armbrust <mi...@databricks.com>

commit 6fff9b8723085033b88103646ae28cafbecbf62f
Author: Liang-Chi Hsieh <vi...@gmail.com>
Date:   2015-02-25T23:22:33Z

    [SPARK-5999][SQL] Remove duplicate Literal matching block
    
    Author: Liang-Chi Hsieh <vi...@gmail.com>
    
    Closes #4760 from viirya/dup_literal and squashes the following commits:
    
    06e7516 [Liang-Chi Hsieh] Remove duplicate Literal matching block.
    
    (cherry picked from commit 12dbf98c5d270e3846e946592666160b1541d9dc)
    Signed-off-by: Michael Armbrust <mi...@databricks.com>

commit 5bd4b499a51b825419d9b0e61bba3050f53e8ab0
Author: Yanbo Liang <yb...@gmail.com>
Date:   2015-02-25T23:37:13Z

    [SPARK-5926] [SQL] make DataFrame.explain leverage queryExecution.logical
    
    DataFrame.explain returns the wrong result when the query is a DDL command.
    
    For example, the following two queries should print out the same execution plan, but they do not:
    
        sql("create table tb as select * from src where key > 490").explain(true)
        sql("explain extended create table tb as select * from src where key > 490")
    
    This is because DataFrame.explain uses logicalPlan, which has already been forcibly executed; we should use the unexecuted plan, queryExecution.logical, instead.
    
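    In sketch form (simplified, not the exact patch; `df` stands for any DataFrame): the lazy plan is reachable through queryExecution, and it is what explain should format.
    
    ```
    val df = sql("select * from src where key > 490")
    val lazyPlan = df.queryExecution.logical // analyzed but not yet executed
    df.explain(true)                         // after the fix, formats the lazy plan
    ```
    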
    Author: Yanbo Liang <yb...@gmail.com>
    
    Closes #4707 from yanboliang/spark-5926 and squashes the following commits:
    
    fa6db63 [Yanbo Liang] logicalPlan is not lazy
    0e40a1b [Yanbo Liang] make DataFrame.explain leverage queryExecution.logical
    
    (cherry picked from commit 41e2e5acb749c25641f1f8dea5a2e1d8af319486)
    Signed-off-by: Michael Armbrust <mi...@databricks.com>

commit a1b4856e5ade8797c5ff616d1dae183cce9ee32f
Author: Joseph K. Bradley <jo...@databricks.com>
Date:   2015-02-26T00:13:17Z

    [SPARK-5974] [SPARK-5980] [mllib] [python] [docs] Update ML guide with save/load, Python GBT
    
    * Add GradientBoostedTrees Python examples to ML guide
      * I ran these in the pyspark shell, and they worked.
    * Add save/load to examples in ML guide
    * Added a note to the Python docs about predict/transform not working within RDD actions/transformations in some cases (see SPARK-5981)
    
    CC: mengxr
    
    Author: Joseph K. Bradley <jo...@databricks.com>
    
    Closes #4750 from jkbradley/SPARK-5974 and squashes the following commits:
    
    c410e38 [Joseph K. Bradley] Added note to LabeledPoint about attributes
    bcae18b [Joseph K. Bradley] Added import of models for save/load examples in ml guide.  Fixed line length for tree.py, feature.py (but not other ML Pyspark files yet).
    6d81c3e [Joseph K. Bradley] completed python GBT examples
    9903309 [Joseph K. Bradley] Added note to python docs about predict,transform not working within RDD actions,transformations in some cases
    c7dfad8 [Joseph K. Bradley] Added model save/load to ML guide.  Added GBT examples to ML guide
    
    (cherry picked from commit d20559b157743981b9c09e286f2aaff8cbefab59)
    Signed-off-by: Xiangrui Meng <me...@databricks.com>

commit b32a6531765ab309f5be9541da140f680c90d7ce
Author: CodingCat <zh...@gmail.com>
Date:   2015-02-23T11:29:25Z

    [SPARK-5724] fix the misconfiguration in AkkaUtils
    
    https://issues.apache.org/jira/browse/SPARK-5724
    
    In AkkaUtils, we set several failure-detector-related parameters as follows:
    
    ```
    val akkaConf = ConfigFactory.parseMap(conf.getAkkaConf.toMap[String, String])
          .withFallback(akkaSslConfig).withFallback(ConfigFactory.parseString(
          s"""
          |akka.daemonic = on
          |akka.loggers = [""akka.event.slf4j.Slf4jLogger""]
          |akka.stdout-loglevel = "ERROR"
          |akka.jvm-exit-on-fatal-error = off
          |akka.remote.require-cookie = "$requireCookie"
          |akka.remote.secure-cookie = "$secureCookie"
          |akka.remote.transport-failure-detector.heartbeat-interval = $akkaHeartBeatInterval s
          |akka.remote.transport-failure-detector.acceptable-heartbeat-pause = $akkaHeartBeatPauses s
          |akka.remote.transport-failure-detector.threshold = $akkaFailureDetector
          |akka.actor.provider = "akka.remote.RemoteActorRefProvider"
          |akka.remote.netty.tcp.transport-class = "akka.remote.transport.netty.NettyTransport"
          |akka.remote.netty.tcp.hostname = "$host"
          |akka.remote.netty.tcp.port = $port
          |akka.remote.netty.tcp.tcp-nodelay = on
          |akka.remote.netty.tcp.connection-timeout = $akkaTimeout s
          |akka.remote.netty.tcp.maximum-frame-size = ${akkaFrameSize}B
          |akka.remote.netty.tcp.execution-pool-size = $akkaThreads
          |akka.actor.default-dispatcher.throughput = $akkaBatchSize
          |akka.log-config-on-start = $logAkkaConfig
          |akka.remote.log-remote-lifecycle-events = $lifecycleEvents
          |akka.log-dead-letters = $lifecycleEvents
          |akka.log-dead-letters-during-shutdown = $lifecycleEvents
          """.stripMargin))
    
    ```
    
    Actually, there is no parameter named "akka.remote.transport-failure-detector.threshold"
    (see: http://doc.akka.io/docs/akka/2.3.4/general/configuration.html);
    what does exist is "akka.remote.watch-failure-detector.threshold".
    
    Author: CodingCat <zh...@gmail.com>
    
    Closes #4512 from CodingCat/SPARK-5724 and squashes the following commits:
    
    bafe56e [CodingCat] fix the grammar in configuration doc
    338296e [CodingCat] remove failure-detector related info
    8bfcfd4 [CodingCat] fix the misconfiguration in AkkaUtils
    
    (cherry picked from commit 242d49584c6aa21d928db2552033661950f760a5)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit 56fa38ae6c6dfc0b5783935793383c4093f8fa6b
Author: Brennon York <br...@capitalone.com>
Date:   2015-02-26T00:12:56Z

    [SPARK-1182][Docs] Sort the configuration parameters in configuration.md
    
    Sorts all configuration options present on the `configuration.md` page to ease readability.
    
    Author: Brennon York <br...@capitalone.com>
    
    Closes #3863 from brennonyork/SPARK-1182 and squashes the following commits:
    
    5696f21 [Brennon York] fixed merge conflict with port comments
    81a7b10 [Brennon York] capitalized A in Allocation
    e240486 [Brennon York] moved all spark.mesos properties into the running-on-mesos doc
    7de5f75 [Brennon York] moved serialization from application to compression and serialization section
    a16fec0 [Brennon York] moved shuffle settings from network to shuffle
    f8fa286 [Brennon York] sorted encryption category
    1023f15 [Brennon York] moved initialExecutors
    e9d62aa [Brennon York] fixed akka.heartbeat.interval
    25e6f6f [Brennon York] moved spark.executor.user*
    4625ade [Brennon York] added spark.executor.extra* items
    4ee5648 [Brennon York] fixed merge conflicts
    1b49234 [Brennon York] sorting mishap
    2b5758b [Brennon York] sorting mishap
    6fbdf42 [Brennon York] sorting mishap
    55dc6f8 [Brennon York] sorted security
    ec34294 [Brennon York] sorted dynamic allocation
    2a7c4a3 [Brennon York] sorted scheduling
    aa9acdc [Brennon York] sorted networking
    a4380b8 [Brennon York] sorted execution behavior
    27f3919 [Brennon York] sorted compression and serialization
    80a5bbb [Brennon York] sorted spark ui
    3f32e5b [Brennon York] sorted shuffle behavior
    6c51b38 [Brennon York] sorted runtime environment
    efe9d6f [Brennon York] sorted application properties
    
    (cherry picked from commit 46a044a36a2aff1306f7f677e952ce253ddbefac)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit a51d9dbeb67b5c265d233e4551c9d33b046f0b77
Author: Xiangrui Meng <me...@databricks.com>
Date:   2015-02-26T07:43:29Z

    [SPARK-5976][MLLIB] Add partitioner to factors returned by ALS
    
    The model trained by ALS requires partitioning information to do quick lookups of user/item factors when making recommendations for individual requests. In the new implementation, we didn't set partitioners in the factors returned by ALS, which would cause a performance regression.
    
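    Illustrative only (assuming a shell-style SparkContext `sc`; this is not the ALS code itself): a pair RDD with a known partitioner can serve lookup(key) by scanning just the partition that owns the key.
    
    ```
    import org.apache.spark.HashPartitioner
    
    val factors = sc.parallelize(Seq(1 -> Array(0.1f), 2 -> Array(0.2f)))
      .partitionBy(new HashPartitioner(4)).cache()
    val userVec = factors.lookup(1) // fast path: touches a single partition
    ```
    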
    srowen coderxiang
    
    Author: Xiangrui Meng <me...@databricks.com>
    
    Closes #4748 from mengxr/SPARK-5976 and squashes the following commits:
    
    9373a09 [Xiangrui Meng] add partitioner to factors returned by ALS
    260f183 [Xiangrui Meng] add a test for partitioner
    
    (cherry picked from commit e43139f40309995b1133c7ef2936ab858b7b44fc)
    Signed-off-by: Xiangrui Meng <me...@databricks.com>

commit e0f5fb0adda8a26cd6d32b2e838446d16980d8e2
Author: Yin Huai <yh...@databricks.com>
Date:   2015-02-26T14:39:49Z

    [SPARK-6023][SQL] ParquetConversions fails to replace the destination MetastoreRelation of an InsertIntoTable node to ParquetRelation2
    
    JIRA: https://issues.apache.org/jira/browse/SPARK-6023
    
    Author: Yin Huai <yh...@databricks.com>
    
    Closes #4782 from yhuai/parquetInsertInto and squashes the following commits:
    
    ae7e806 [Yin Huai] Convert MetastoreRelation in InsertIntoTable and InsertIntoHiveTable.
    ba543cd [Yin Huai] More tests.
    50b6d0f [Yin Huai] Update error messages.
    346780c [Yin Huai] Failed test.
    
    (cherry picked from commit f02394d06473889d0d7897c4583239e6e136ff46)
    Signed-off-by: Cheng Lian <li...@databricks.com>

commit b5c5e93d71741f3daa7bf9b3de838c36bd234511
Author: Yin Huai <yh...@databricks.com>
Date:   2015-02-26T17:01:32Z

    [SPARK-6016][SQL] Cannot read the parquet table after overwriting the existing table when spark.sql.parquet.cacheMetadata=true
    
    Please see JIRA (https://issues.apache.org/jira/browse/SPARK-6016) for details of the bug.
    
    Author: Yin Huai <yh...@databricks.com>
    
    Closes #4775 from yhuai/parquetFooterCache and squashes the following commits:
    
    78787b1 [Yin Huai] Remove footerCache in FilteringParquetRowInputFormat.
    dff6fba [Yin Huai] Failed unit test.
    
    (cherry picked from commit 192e42a2933eb283e12bfdfb46e2ef895228af4a)
    Signed-off-by: Cheng Lian <li...@databricks.com>

commit 7c779d8d52a445362fc57e740ce51bdc2e93ad7f
Author: Jacky Li <ja...@huawei.com>
Date:   2015-02-26T18:40:58Z

    [SPARK-6007][SQL] Add numRows param in DataFrame.show()
    
    It is useful to let the user decide the number of rows to show in DataFrame.show().
    
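    Usage, in sketch form (`df` stands for any DataFrame; 20 is the pre-existing default):
    
    ```
    df.show()  // prints the first 20 rows, as before
    df.show(5) // new: prints only the first 5 rows
    ```
    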
    Author: Jacky Li <ja...@huawei.com>
    
    Closes #4767 from jackylk/show and squashes the following commits:
    
    a0e0f4b [Jacky Li] fix testcase
    7cdbe91 [Jacky Li] modify according to comment
    bb54537 [Jacky Li] for Java compatibility
    d7acc18 [Jacky Li] modify according to comments
    981be52 [Jacky Li] add numRows param in DataFrame.show()
    
    (cherry picked from commit 2358657547016d647cdd2e2d363426fcd8d3e9ff)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit dafb3d210287b07175079ce5e6d9063fefc559ef
Author: Davies Liu <da...@databricks.com>
Date:   2015-02-26T18:45:29Z

    [SPARK-6015] fix links to source code in Python API docs
    
    Author: Davies Liu <da...@databricks.com>
    
    Closes #4772 from davies/source_link and squashes the following commits:
    
    389f0c6 [Davies Liu] fix link to source code in Python API docs
    
    (cherry picked from commit 015895ab508efde0702b51c5e537a5a6a191d209)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit 5d309ad6c085b6e193771d1ecb9c52f8a55b21ef
Author: Davies Liu <da...@databricks.com>
Date:   2015-02-26T19:54:17Z

    [SPARK-5363] Fix bug in PythonRDD: remove() inside iterator is not safe
    
    Removing elements from a mutable HashSet while iterating over it can cause the
    iteration to incorrectly skip over entries that were not removed. If this
    happened, PythonRDD would write fewer broadcast variables than the Python
    worker was expecting to read, which would cause the Python worker to hang
    indefinitely.
    
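    A minimal reproduction of the hazard, as a sketch (illustrative, not PythonRDD itself):
    
    ```
    import scala.collection.mutable
    
    val set = mutable.HashSet(1 to 100: _*)
    var visited = 0
    for (x <- set) {
      visited += 1
      if (x % 3 == 0) set.remove(x) // unsafe: mutating while iterating
    }
    // visited may end up < 100; iterating over a snapshot avoids the problem:
    for (x <- set.toArray) set.remove(x) // safe: snapshot decouples iteration
    ```
    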
    Author: Davies Liu <da...@databricks.com>
    
    Closes #4776 from davies/fix_hang and squashes the following commits:
    
    a4384a5 [Davies Liu] fix bug: remove() inside iterator is not safe
    
    (cherry picked from commit 7fa960e653a905fc48d4097b49ce560cff919fa2)
    Signed-off-by: Josh Rosen <jo...@databricks.com>

commit 62652dc5be7de4189a8fdd9d17b819c9589328da
Author: Li Zhihui <zh...@intel.com>
Date:   2015-02-26T21:07:07Z

    Modify default value description for spark.scheduler.minRegisteredResourcesRatio on docs.
    
    The configuration is not supported in Mesos mode at this time.
    See https://github.com/apache/spark/pull/1462
    
    Author: Li Zhihui <zh...@intel.com>
    
    Closes #4781 from li-zhihui/fixdocconf and squashes the following commits:
    
    63e7a44 [Li Zhihui] Modify default value description for spark.scheduler.minRegisteredResourcesRatio on docs.
    
    (cherry picked from commit 10094a523e3993b775111ae9b22ca31cc0d76e03)
    Signed-off-by: Andrew Or <an...@databricks.com>

commit 731a997db6482c96d2aef3a078cce89e078f3173
Author: Tathagata Das <ta...@gmail.com>
Date:   2015-02-26T21:46:07Z

    [SPARK-6027][SPARK-5546] Fixed --jar and --packages not working for KafkaUtils and improved error message
    
    The problem with SPARK-6027, in short, is that JARs like the kafka-assembly.jar do not work in Python because the added JAR is not visible to the classloader used by Py4J. Py4J uses Class.forName(), which does not use the system classloader, but the JARs are only visible in the thread's context classloader. So this fix uses the context classloader to create the KafkaUtils dstream object. This works both when the Kafka libraries are added with --jars spark-streaming-kafka-assembly.jar and with --packages spark-streaming-kafka
    
    Also improves the error message.
    
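    A sketch of the workaround (simplified; take the class name as illustrative rather than authoritative): resolve the helper through the thread's context classloader, which does see JARs added via --jars or --packages.
    
    ```
    val cl = Thread.currentThread().getContextClassLoader
    val helper = cl.loadClass(
      "org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper") // illustrative name
    ```
    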
    davies
    
    Author: Tathagata Das <ta...@gmail.com>
    
    Closes #4779 from tdas/kafka-python-fix and squashes the following commits:
    
    fb16b04 [Tathagata Das] Removed import
    c1fdf35 [Tathagata Das] Fixed long line and improved documentation
    7b88be8 [Tathagata Das] Fixed --jar not working for KafkaUtils and improved error message
    
    (cherry picked from commit aa63f633d39efa8c29095295f161eaad5495071d)
    Signed-off-by: Andrew Or <an...@databricks.com>

commit fe7967483d1b0871c128863f51826c43fd71a12e
Author: Cheolsoo Park <ch...@netflix.com>
Date:   2015-02-26T21:53:49Z

    [SPARK-6018] [YARN] NoSuchMethodError in Spark app is swallowed by YARN AM
    
    Author: Cheolsoo Park <ch...@netflix.com>
    
    Closes #4773 from piaozhexiu/SPARK-6018 and squashes the following commits:
    
    2a919d5 [Cheolsoo Park] Rename e with cause to avoid duplicate names
    1e71d2d [Cheolsoo Park] Replace placeholder with throwable
    eb5750d [Cheolsoo Park] NoSuchMethodError in Spark app is swallowed by YARN AM
    
    (cherry picked from commit 5f3238b3b0157091d28803aa3b1d248dfa6cdc59)
    Signed-off-by: Andrew Or <an...@databricks.com>

commit 297c3ef826933b4a71d5a54950cabdb1d7d54613
Author: moussa taifi <mo...@gmail.com>
Date:   2015-02-26T22:19:43Z

    Add a note for context termination for History server on Yarn
    
    The history server on YARN only shows completed jobs. This adds a note that the Spark context must be explicitly stopped at the end of a job, which is a best practice anyway.
    Related to SPARK-2972 and SPARK-3458
    
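    The practice the note recommends, in sketch form (`conf` is a SparkConf built elsewhere):
    
    ```
    val sc = new SparkContext(conf)
    try {
      // job logic here
    } finally {
      sc.stop() // lets YARN mark the app completed, so the History Server lists it
    }
    ```
    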
    Author: moussa taifi <mo...@gmail.com>
    
    Closes #4721 from moutai/add-history-server-note-for-closing-the-spark-context and squashes the following commits:
    
    9f5b6c3 [moussa taifi] Fix upper case typo for YARN
    3ad3db4 [moussa taifi] Add context termination for History server on Yarn
    
    (cherry picked from commit c871e2dae0182e914135560d14304242e1f97f7e)
    Signed-off-by: Andrew Or <an...@databricks.com>

commit 5b426cb1ff99ebe628e36a8e9b20fec7bc3ff1f3
Author: mohit.goyal <mo...@guavus.com>
Date:   2015-02-26T22:27:47Z

    [SPARK-5951][YARN] Remove unreachable driver memory properties in yarn client mode
    
    Remove unreachable driver memory properties in yarn client mode
    
    Author: mohit.goyal <mo...@guavus.com>
    
    Closes #4730 from zuxqoj/master and squashes the following commits:
    
    977dc96 [mohit.goyal] remove unreachable deprecated variables in yarn client mode
    
    (cherry picked from commit b38dec2ffdf724ff4e181cc8c7427d074b442670)
    Signed-off-by: Andrew Or <an...@databricks.com>

commit b83a93e08e0fecc40fee2e47e78f88a2792555de
Author: Sean Owen <so...@cloudera.com>
Date:   2015-02-27T01:35:09Z

    SPARK-4579 [WEBUI] Scheduling Delay appears negative
    
    Ensure scheduler delay handles the unfinished-task case, and ensure the delay is never negative, even due to rounding
    
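    The guard, in sketch form (the variable names here are illustrative, not the exact ones in the UI code): clamp at zero so rounding can never surface a negative delay.
    
    ```
    val schedulerDelay = math.max(0L, totalDuration -
      executorRunTime - deserializeTime - serializeTime - gettingResultTime)
    ```
    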
    Author: Sean Owen <so...@cloudera.com>
    
    Closes #4796 from srowen/SPARK-4579 and squashes the following commits:
    
    ad6713c [Sean Owen] Ensure scheduler delay handles unfinished task case, and ensure delay is never negative even due to rounding
    
    (cherry picked from commit fbc469473dd529eb72046186b85dd8fc2b7c5bb5)
    Signed-off-by: Andrew Or <an...@databricks.com>

commit 25a109e4228c125acf2ac25625e798f9d34947cf
Author: Liang-Chi Hsieh <vi...@gmail.com>
Date:   2015-02-27T03:06:47Z

    [SPARK-6037][SQL] Avoiding duplicate Parquet schema merging
    
    `FilteringParquetRowInputFormat` manually merges Parquet schemas before computing splits. However, this is redundant because the schemas have already been merged in `ParquetRelation2`. We don't need to re-merge them in the `InputFormat`.
    
    Author: Liang-Chi Hsieh <vi...@gmail.com>
    
    Closes #4786 from viirya/dup_parquet_schemas_merge and squashes the following commits:
    
    ef78a5a [Liang-Chi Hsieh] Avoiding duplicate Parquet schema merging.
    
    (cherry picked from commit 4ad5153f5449319a7e82c9013ccff4494ab58ef1)
    Signed-off-by: Cheng Lian <li...@databricks.com>

commit 6200f0709c5c8440decae8bf700d7859f32ac9d5
Author: Yin Huai <yh...@databricks.com>
Date:   2015-02-27T04:46:05Z

    [SPARK-6024][SQL] When a data source table has too many columns, its schema cannot be stored in the metastore.
    
    JIRA: https://issues.apache.org/jira/browse/SPARK-6024
    
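    A hypothetical sketch of the splitting approach noted in commit 12bacae below (the threshold and property names are invented for illustration): break the schema JSON into pieces small enough for individual metastore table properties, and record the piece count for reassembly.
    
    ```
    // schemaJson: the table schema serialized as a JSON string
    def schemaProperties(schemaJson: String): Seq[(String, String)] = {
      val threshold = 4000 // illustrative per-property size limit
      val parts = schemaJson.grouped(threshold).toSeq
      ("schema.numParts" -> parts.size.toString) +:
        parts.zipWithIndex.map { case (part, i) => s"schema.part.$i" -> part }
    }
    ```
    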
    Author: Yin Huai <yh...@databricks.com>
    
    Closes #4795 from yhuai/wideSchema and squashes the following commits:
    
    4882e6f [Yin Huai] Address comments.
    73e71b4 [Yin Huai] Address comments.
    143927a [Yin Huai] Simplify code.
    cc1d472 [Yin Huai] Make the schema wider.
    12bacae [Yin Huai] If the JSON string of a schema is too large, split it before storing it in metastore.
    e9b4f70 [Yin Huai] Failed test.
    
    (cherry picked from commit 5e5ad6558d60cfbf360708584e883e80d363e33e)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit 485b91934f66f4daa54fbe306d5281a96f949895
Author: Lukasz Jastrzebski <lu...@gmail.com>
Date:   2015-02-27T06:38:06Z

    SPARK-2168 [Spark core] Use relative URIs for the app links in the History Server.
    
    As agreed in PR #1160, this adds a test to verify that the history server generates relative links to applications.
    
    Author: Lukasz Jastrzebski <lu...@gmail.com>
    
    Closes #4778 from elyast/master and squashes the following commits:
    
    0c07fab [Lukasz Jastrzebski] Incorporating comments for SPARK-2168
    6d7866d [Lukasz Jastrzebski] Adjusting test for  SPARK-2168 for master branch
    d6f4fbe [Lukasz Jastrzebski] Added test for  SPARK-2168
    
    (cherry picked from commit 4a8a0a8ecd836bf7fe0f2e692cf20a62dda313c0)
    Signed-off-by: Andrew Or <an...@databricks.com>

commit b8db84c5b6a9d7d538d066f5a84a50ac8c8f9b41
Author: 许鹏 <pe...@fraudmetrix.cn>
Date:   2015-02-27T07:05:56Z

    fix spark-6033, clarify the spark.worker.cleanup behavior in standalone mode
    
    jira case spark-6033 https://issues.apache.org/jira/browse/SPARK-6033
    
    In standalone deploy mode, the cleanup only removes the stopped applications' directories.
    
    The original description of the cleanup behavior is incorrect.
    
    Author: 许鹏 <pe...@fraudmetrix.cn>
    
    Closes #4803 from hseagle/spark-6033 and squashes the following commits:
    
    927a6a0 [许鹏] fix the incorrect description about the spark.worker.cleanup in standalone mode
    
    (cherry picked from commit 0375a413b8a009f5820897691570a1273ee25b97)
    Signed-off-by: Andrew Or <an...@databricks.com>

commit bff80889407682fffdd04cf076ec8fcd80870e38
Author: zsxwing <zs...@gmail.com>
Date:   2015-02-27T13:31:46Z

    [SPARK-6058][Yarn] Log the user class exception in ApplicationMaster
    
    Because ApplicationMaster doesn't set SparkUncaughtExceptionHandler, exceptions thrown in the user class won't be logged. This PR adds a `logError` for them.
    
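    In sketch form (simplified; the surrounding names are illustrative, not the actual patch):
    
    ```
    try {
      mainMethod.invoke(null, userArgs) // run the user class's main()
    } catch {
      case e: Throwable =>
        logError("User class threw exception: " + e.getMessage, e)
        throw e
    }
    ```
    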
    Author: zsxwing <zs...@gmail.com>
    
    Closes #4813 from zsxwing/SPARK-6058 and squashes the following commits:
    
    806c932 [zsxwing] Log the user class exception
    
    (cherry picked from commit e747e98490f8ede23b0a9e0795e7445d0b597624)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit 117e10c1870ded483fc2f55f0bed3394797b8b4a
Author: Joseph K. Bradley <jo...@databricks.com>
Date:   2015-02-27T21:00:36Z

    [SPARK-4587] [mllib] [docs] Fixed save,load calls in ML guide examples
    
    Should pass the Spark context to save/load
    
    CC: mengxr
    
    Author: Joseph K. Bradley <jo...@databricks.com>
    
    Closes #4816 from jkbradley/ml-io-doc-fix and squashes the following commits:
    
    83d369d [Joseph K. Bradley] added comment to save,load parts of ML guide examples
    2841170 [Joseph K. Bradley] Fixed save,load calls in ML guide examples
    
    (cherry picked from commit d17cb2ba33b363dd346ac5a5681e1757decd0f4d)
    Signed-off-by: Xiangrui Meng <me...@databricks.com>

commit ceebe3c60cebe99738219bf5a8fcbb34cd0f9188
Author: Saisai Shao <sa...@intel.com>
Date:   2015-02-27T21:01:42Z

    [Streaming][Minor] Remove useless type signature of Java Kafka direct stream API
    
    cc tdas.
    
    Author: Saisai Shao <sa...@intel.com>
    
    Closes #4817 from jerryshao/signature-minor-fix and squashes the following commits:
    
    eebfaac [Saisai Shao] Remove useless type parameter
    
    (cherry picked from commit 5f7f3b938e1776168be866fc9ee87dc7494696cc)
    Signed-off-by: Tathagata Das <ta...@gmail.com>

commit 5d19cf083e1637bc63c385778d63db0ea240cb79
Author: Cheng Lian <li...@databricks.com>
Date:   2015-02-28T00:41:49Z

    [SPARK-5751] [SQL] Sets SPARK_HOME as SPARK_PID_DIR when running Thrift server test suites
    
    This is a follow-up of #4720. By default, `spark-daemon.sh` writes PID files under `/tmp`, which makes it impossible to start multiple server instances simultaneously. This PR sets `SPARK_PID_DIR` to the Spark home directory to work around this problem.
    
    Many thanks to chenghao-intel for pointing out this issue!
    
    
    Author: Cheng Lian <li...@databricks.com>
    
    Closes #4758 from liancheng/thriftserver-pid-dir and squashes the following commits:
    
    252fa0f [Cheng Lian] Uses temporary directory as Thrift server PID directory
    1b3d1e3 [Cheng Lian] Sets SPARK_HOME as SPARK_PID_DIR when running Thrift server test suites
    
    (cherry picked from commit 8c468a6600e0deb5464990df60148212e64fdecd)
    Signed-off-by: Cheng Lian <li...@databricks.com>

commit 49f2187a394e94376548bf80cb874f28ba73a435
Author: Davies Liu <da...@databricks.com>
Date:   2015-02-28T04:07:17Z

    [SPARK-6055] [PySpark] fix incorrect __eq__ of DataType
    
    The `__eq__` of DataType is not correct, and the class cache is not used correctly (a created class cannot be found by its dataType), so lots of classes get created (saved in _cached_cls) and never released.
    
    Also, all instances of the same DataType share one hash code, so a dict ends up with many objects under the same hash code, degenerating into hash collisions; accessing such a dict is very slow (depending on the CPython implementation).
    
    This PR also improves the performance of inferSchema (by avoiding unnecessary conversion of objects).
    
    cc pwendell  JoshRosen
    
    Author: Davies Liu <da...@databricks.com>
    
    Closes #4808 from davies/leak and squashes the following commits:
    
    6a322a4 [Davies Liu] tests refactor
    3da44fc [Davies Liu] fix __eq__ of Singleton
    534ac90 [Davies Liu] add more checks
    46999dc [Davies Liu] fix tests
    d9ae973 [Davies Liu] fix memory leak in sql
    
    (cherry picked from commit e0e64ba4b1b8eb72e856286f756c65fa22ab0a36)
    Signed-off-by: Josh Rosen <jo...@databricks.com>

commit 1747e0a68f6e0222e0b0725a9ec7657e0b2beebf
Author: Marcelo Vanzin <va...@cloudera.com>
Date:   2015-02-28T06:44:11Z

    [SPARK-6070] [yarn] Remove unneeded classes from shuffle service jar.
    
    These may conflict with the classes already in the NM. We shouldn't
    be repackaging them.
    
    Author: Marcelo Vanzin <va...@cloudera.com>
    
    Closes #4820 from vanzin/SPARK-6070 and squashes the following commits:
    
    871b566 [Marcelo Vanzin] The "d'oh how didn't I think of it before" solution.
    3cba946 [Marcelo Vanzin] Use profile instead, so that dependencies don't need to be explicitly listed.
    7a18a1b [Marcelo Vanzin] [SPARK-6070] [yarn] Remove unneeded classes from shuffle service jar.
    
    (cherry picked from commit dba08d1fc3bdb9245aefe695970354df088a93b6)
    Signed-off-by: Patrick Wendell <pa...@databricks.com>

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Lr graphx sgd

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5420#issuecomment-90887400
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: Lr graphx sgd

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/5420#issuecomment-90887293
  
    Looks like this was opened by mistake. Mind closing this PR?

