Posted to reviews@spark.apache.org by hyl713 <gi...@git.apache.org> on 2016/01/06 08:29:34 UTC

[GitHub] spark pull request: Branch 1.6

GitHub user hyl713 opened a pull request:

    https://github.com/apache/spark/pull/10616

    Branch 1.6

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark branch-1.6

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10616.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10616
    
----
commit 7b720bf1c4860fdb560540744c5fdfb333b752bc
Author: wangt <wa...@gmail.com>
Date:   2015-11-25T19:41:05Z

    [SPARK-11880][WINDOWS][SPARK SUBMIT] bin/load-spark-env.cmd loads spark-env.cmd from wrong directory
    
    * On Windows, `bin/load-spark-env.cmd` tries to load `spark-env.cmd` from `%~dp0..\..\conf`, but `%~dp0` points to `bin` and `conf` is only one level up.
    * Updated `bin/load-spark-env.cmd` to load `spark-env.cmd` from `%~dp0..\conf` instead of `%~dp0..\..\conf`.
    
    Author: wangt <wa...@gmail.com>
    
    Closes #9863 from toddwan/master.
    
    (cherry picked from commit 9f3e59a16822fb61d60cf103bd4f7823552939c6)
    Signed-off-by: Andrew Or <an...@databricks.com>

commit b4cf318ab9e75f9fb8e4e15f60d263889189b445
Author: jerryshao <ss...@hortonworks.com>
Date:   2015-11-25T19:42:53Z

    [SPARK-10558][CORE] Fix wrong executor state in Master
    
    `ExecutorAdded` should only be sent to `AppClient` when the worker reports back the executor state as `LOADING`; otherwise, because of a concurrency issue, `AppClient` may receive `ExecutorAdded` first and then `ExecutorStateUpdated` with the `LOADING` state.
    
    Also, Master currently changes the executor state from `LAUNCHING` to `RUNNING` (when `AppClient` reports the state as `RUNNING`) and then to `LOADING` (when the worker reports the state as `LOADING`); the correct order should be `LAUNCHING` -> `LOADING` -> `RUNNING`.
    
    The state is also shown incorrectly in the Master UI; the executor state should be `RUNNING` rather than `LOADING`:
    
    ![screen shot 2015-09-11 at 2 30 28 pm](https://cloud.githubusercontent.com/assets/850797/9809254/3155d840-5899-11e5-8cdf-ad06fef75762.png)
    
    Author: jerryshao <ss...@hortonworks.com>
    
    Closes #8714 from jerryshao/SPARK-10558.
    
    (cherry picked from commit 88875d9413ec7d64a88d40857ffcf97b5853a7f2)
    Signed-off-by: Andrew Or <an...@databricks.com>

commit 849ddb6ae69416434173824d50d59ebc8b4dbbf5
Author: Shixiong Zhu <sh...@databricks.com>
Date:   2015-11-25T19:47:21Z

    [SPARK-11935][PYSPARK] Send the Python exceptions in TransformFunction and TransformFunctionSerializer to Java
    
    The Python exception traceback in TransformFunction and TransformFunctionSerializer is not sent back to Java; Py4j just throws a very general exception, which is hard to debug.
    
    This PR adds a `getFailure` method to get the failure message on the Java side.
    
    Author: Shixiong Zhu <sh...@databricks.com>
    
    Closes #9922 from zsxwing/SPARK-11935.
    
    (cherry picked from commit d29e2ef4cf43c7f7c5aa40d305cf02be44ce19e0)
    Signed-off-by: Tathagata Das <ta...@gmail.com>

commit cd86d8c745b745b120ec00cd466ac6cdb05296a6
Author: Marcelo Vanzin <va...@cloudera.com>
Date:   2015-11-25T20:58:18Z

    [SPARK-11866][NETWORK][CORE] Make sure timed out RPCs are cleaned up.
    
    This change does a couple of different things to make sure that the RpcEnv-level
    code and the network library agree about the status of outstanding RPCs.
    
    For RPCs that do not expect a reply ("RpcEnv.send"), support for one way
    messages (hello CORBA!) was added to the network layer. This is a
    "fire and forget" message that does not require any state to be kept
    by the TransportClient; as a result, the RpcEnv 'Ack' message is not needed
    anymore.
    
    For RPCs that do expect a reply ("RpcEnv.ask"), the network library now
    returns the internal RPC id; if the RpcEnv layer decides to time out the
    RPC before the network layer does, it now asks the TransportClient to
    forget about the RPC, so that if the network-level timeout occurs, the
    client is not killed.
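    
    A minimal sketch of the two calling styles described above, using Spark's internal (`private[spark]`) `RpcEndpointRef` API; the message types here are hypothetical and only for illustration:
    
    ```scala
    import scala.concurrent.Future
    import org.apache.spark.rpc.RpcEndpointRef
    
    // Hypothetical message types, for illustration only.
    case class Notify(msg: String)
    case class Query(msg: String)
    
    def illustrate(endpointRef: RpcEndpointRef): Unit = {
      // "Fire and forget" (RpcEnv.send): no reply is expected, so the transport
      // layer keeps no per-RPC state for the message.
      endpointRef.send(Notify("executor started"))
    
      // "Ask" (RpcEnv.ask): a reply is expected and the call may time out. With
      // this change, an RpcEnv-level timeout asks the transport layer to forget
      // the outstanding RPC instead of killing the client connection.
      val reply: Future[String] = endpointRef.ask[String](Query("status"))
    }
    ```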
    
    As part of implementing the above, I cleaned up some of the code in the
    netty rpc backend, removing types that were not necessary and factoring
    out some common code. Of interest is a slight change in the exceptions
    when posting messages to a stopped RpcEnv; that's mostly to avoid nasty
    error messages from the local-cluster backend when shutting down, which
    pollutes the terminal output.
    
    Author: Marcelo Vanzin <va...@cloudera.com>
    
    Closes #9917 from vanzin/SPARK-11866.
    
    (cherry picked from commit 4e81783e92f464d479baaf93eccc3adb1496989a)
    Signed-off-by: Marcelo Vanzin <va...@cloudera.com>

commit 3997397024975b8affca0d609a231cde5e9959a5
Author: Reynold Xin <rx...@databricks.com>
Date:   2015-11-25T21:45:41Z

    Fix Aggregator documentation (rename present to finish).
    
    (cherry picked from commit ecac2835458bbf73fe59413d5bf921500c5b987d)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit d40bf9ad881f4ba6c550cd61acc2e8c29c9dc60f
Author: Davies Liu <da...@databricks.com>
Date:   2015-11-26T05:25:20Z

    [SPARK-12003] [SQL] remove the prefix for name after expanded star
    
    Right now, the expanded star includes the name of the expression as a prefix for each column; that is no better than not expanding at all, so we should not add the prefix.
    
    Author: Davies Liu <da...@databricks.com>
    
    Closes #9984 from davies/expand_star.
    
    (cherry picked from commit d1930ec01ab5a9d83f801f8ae8d4f15a38d98b76)
    Signed-off-by: Davies Liu <da...@gmail.com>

commit 7e7f2627f941585a6fb1e086e22d6d1d25b692ab
Author: gatorsmile <ga...@gmail.com>
Date:   2015-11-26T07:24:33Z

    [SPARK-11980][SPARK-10621][SQL] Fix json_tuple and add test cases for
    
    Added Python test cases for the functions `isnan`, `isnull`, `nanvl` and `json_tuple`.
    
    Fixed a bug in the function `json_tuple`.
    
    rxin, could you help me review my changes? Please let me know if anything is missing.
    
    Thank you! Have a good Thanksgiving day!
    
    Author: gatorsmile <ga...@gmail.com>
    
    Closes #9977 from gatorsmile/json_tuple.
    
    (cherry picked from commit 068b6438d6886ce5b4aa698383866f466d913d66)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit 0df6beccc84166a00c7c98929bf487d9cea68e1d
Author: Shixiong Zhu <sh...@databricks.com>
Date:   2015-11-26T07:31:21Z

    [SPARK-11999][CORE] Fix the issue that ThreadUtils.newDaemonCachedThreadPool doesn't cache any task
    
    In the previous code, `newDaemonCachedThreadPool` uses `SynchronousQueue`, which is wrong: `SynchronousQueue` has no capacity and cannot cache any task. This patch uses `LinkedBlockingQueue` instead, along with other fixes, to make sure `newDaemonCachedThreadPool` uses at most `maxThreadNumber` threads and, beyond that, queues tasks in the `LinkedBlockingQueue`.
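    
    For reference, a minimal sketch of the pattern (names and parameters here are illustrative, not the actual `ThreadUtils` code):
    
    ```scala
    import java.util.concurrent.{LinkedBlockingQueue, ThreadPoolExecutor, TimeUnit}
    import com.google.common.util.concurrent.ThreadFactoryBuilder
    
    // At most `maxThreadNumber` daemon threads; once they are all busy, further
    // tasks wait in the unbounded LinkedBlockingQueue instead of being rejected.
    def boundedCachedThreadPool(prefix: String, maxThreadNumber: Int): ThreadPoolExecutor = {
      val threadFactory = new ThreadFactoryBuilder()
        .setDaemon(true)
        .setNameFormat(prefix + "-%d")
        .build()
      val pool = new ThreadPoolExecutor(
        maxThreadNumber, // core pool size: threads are created up to the max before queueing
        maxThreadNumber, // maximum pool size
        60L, TimeUnit.SECONDS, // idle threads are reclaimed after 60 seconds
        new LinkedBlockingQueue[Runnable](),
        threadFactory)
      pool.allowCoreThreadTimeOut(true)
      pool
    }
    ```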
    
    Author: Shixiong Zhu <sh...@databricks.com>
    
    Closes #9978 from zsxwing/cached-threadpool.
    
    (cherry picked from commit d3ef693325f91a1ed340c9756c81244a80398eb2)
    Signed-off-by: Shixiong Zhu <sh...@databricks.com>

commit 24e62b318a3241bd750978a07fd505d605d2af76
Author: Davies Liu <da...@databricks.com>
Date:   2015-11-26T08:19:42Z

    [SPARK-11973] [SQL] push filter through aggregation with alias and literals
    
    Currently, a filter can't be pushed through an aggregation that contains aliases or literals; this patch fixes that.
    
    After this patch, the time for TPC-DS query 4 goes down from 141 seconds to 13 seconds (a 10x improvement).
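    
    For illustration, a hypothetical query shape that benefits (not taken from the patch): the filter refers to the grouping column only through an alias, and the aggregate output also contains a literal.
    
    ```scala
    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.{col, lit, sum}
    
    // Hypothetical example: the filter on the alias `k` can now be pushed below
    // the aggregation (and the shuffle), even though the aggregate output also
    // contains the literal `lit(1)`.
    def example(df: DataFrame): DataFrame =
      df.groupBy(col("key").as("k"))
        .agg(sum(col("value")).as("total"), lit(1).as("one"))
        .filter(col("k") === 1)
    ```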
    
    cc nongli  yhuai
    
    Author: Davies Liu <da...@databricks.com>
    
    Closes #9959 from davies/push_filter2.
    
    (cherry picked from commit 27d69a0573ed55e916a464e268dcfd5ecc6ed849)
    Signed-off-by: Davies Liu <da...@gmail.com>

commit 5e89f16453def1cc07cbcbffd0ab4bf429198cb0
Author: Marcelo Vanzin <va...@cloudera.com>
Date:   2015-11-26T09:15:05Z

    [SPARK-12005][SQL] Work around VerifyError in HyperLogLogPlusPlus.
    
    Just move the code around a bit; that seems to make the JVM happy.
    
    Author: Marcelo Vanzin <va...@cloudera.com>
    
    Closes #9985 from vanzin/SPARK-12005.
    
    (cherry picked from commit 001f0528a851ac314b390e65eb0583f89e69a949)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit ce90bbe427623c52d0c5380a979197be84218e45
Author: Dilip Biswal <db...@us.ibm.com>
Date:   2015-11-26T19:31:28Z

    [SPARK-11863][SQL] Unable to resolve order by if it contains mixture of aliases and real columns
    
    This is based on https://github.com/apache/spark/pull/9844, with some bug fixes and clean up.
    
    The problem is that a normal operator should be resolved based on its child, but the `Sort` operator can also be resolved based on its grandchild. So we have 3 rules that can resolve `Sort`: `ResolveReferences`, `ResolveSortReferences` (if the grandchild is a `Project`) and `ResolveAggregateFunctions` (if the grandchild is an `Aggregate`).
    For example, in `select c1 as a, c2 as b from tab group by c1, c2 order by a, c2`, we need to resolve `a` and `c2` for `Sort`. First, `a` is resolved in `ResolveReferences` based on the child; then, when we reach `ResolveAggregateFunctions`, we try to resolve both `a` and `c2` based on the grandchild, but fail because `a` is not a legal aggregate expression.
    
    Whoever merges this PR, please give the credit to dilipbiswal.
    
    Author: Dilip Biswal <db...@us.ibm.com>
    Author: Wenchen Fan <we...@databricks.com>
    
    Closes #9961 from cloud-fan/sort.
    
    (cherry picked from commit bc16a67562560c732833260cbc34825f7e9dcb8f)
    Signed-off-by: Michael Armbrust <mi...@databricks.com>

commit 64813b7e7ed5d0367d5e8c58831d64dd4b1007d2
Author: Yin Huai <yh...@databricks.com>
Date:   2015-11-27T00:20:08Z

    [SPARK-11998][SQL][TEST-HADOOP2.0] When downloading Hadoop artifacts from maven, we need to try to download the version that is used by Spark
    
    If we need to download Hive/Hadoop artifacts, try to download a Hadoop version that matches the one used by Spark. If the Hadoop artifact cannot be resolved (e.g. the Hadoop version is a vendor-specific version like 2.0.0-cdh4.1.1), we fall back to Hadoop 2.4.0 (the version we used to hard-code as the Hadoop to download from Maven) and we will not share Hadoop classes.
    
    I tested this on my laptop with the following configurations (these are the ones used by our builds). All tests pass.
    ```
    build/sbt -Phadoop-1 -Dhadoop.version=1.2.1 -Pkinesis-asl -Phive-thriftserver -Phive
    build/sbt -Phadoop-1 -Dhadoop.version=2.0.0-mr1-cdh4.1.1 -Pkinesis-asl -Phive-thriftserver -Phive
    build/sbt -Pyarn -Phadoop-2.2 -Pkinesis-asl -Phive-thriftserver -Phive
    build/sbt -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -Pkinesis-asl -Phive-thriftserver -Phive
    ```
    
    Author: Yin Huai <yh...@databricks.com>
    
    Closes #9979 from yhuai/versionsSuite.
    
    (cherry picked from commit ad76562390b81207f8f32491c0bd8ad0e020141f)
    Signed-off-by: Yin Huai <yh...@databricks.com>

commit f18de5a83e6f6e96215ec02a6765972f03e85c6b
Author: Reynold Xin <rx...@databricks.com>
Date:   2015-11-27T02:47:54Z

    [SPARK-11973][SQL] Improve optimizer code readability.
    
    This is a followup for https://github.com/apache/spark/pull/9959.
    
    I added more documentation and rewrote some monadic code into simpler ifs.
    
    Author: Reynold Xin <rx...@databricks.com>
    
    Closes #9995 from rxin/SPARK-11973.
    
    (cherry picked from commit de28e4d4deca385b7c40b3a6a1efcd6e2fec2f9b)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit fc35fb35143f887e010e7fc9f0f4903a6722ce6b
Author: muxator <mu...@users.noreply.github.com>
Date:   2015-11-27T02:52:20Z

    doc typo: "classificaion" -> "classification"
    
    Author: muxator <mu...@users.noreply.github.com>
    
    Closes #10008 from muxator/patch-1.
    
    (cherry picked from commit 4376b5bea8171e4e73b3dbabbfdf84fa1afd140b)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit c1bde2a92ebb4768dc6ede8ebc8dfaacc571bf44
Author: Shixiong Zhu <sh...@databricks.com>
Date:   2015-11-27T02:56:22Z

    [SPARK-11996][CORE] Make the executor thread dump work again
    
    In the previous implementation, the driver needs to know the executor listening address to send the thread dump request. However, in Netty RPC, the executor doesn't listen to any port, so the executor thread dump feature is broken.
    
    This patch makes the driver use the endpointRef stored in BlockManagerMasterEndpoint to send the thread dump request to fix it.
    
    Author: Shixiong Zhu <sh...@databricks.com>
    
    Closes #9976 from zsxwing/executor-thread-dump.
    
    (cherry picked from commit 0c1e72e7f79231e537299b57a1ab7cd843171923)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit 29f3a2fc83fb8f81accac3122d1064ed580ab3e8
Author: Yanbo Liang <yb...@gmail.com>
Date:   2015-11-27T03:00:36Z

    [SPARK-12011][SQL] Stddev/Variance etc should support columnName as arguments
    
    Spark SQL aggregate functions:
    ```Java
    stddev
    stddev_pop
    stddev_samp
    variance
    var_pop
    var_samp
    skewness
    kurtosis
    collect_list
    collect_set
    ```
    should support ```columnName``` as an argument, like other aggregate functions (max/min/count/sum).
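    
    A small usage sketch of what this enables (assuming a DataFrame `df` with a numeric column named "value"):
    
    ```scala
    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.{col, stddev, variance}
    
    def example(df: DataFrame): Unit = {
      df.agg(stddev("value"), variance("value")).show()           // column name as a String
      df.agg(stddev(col("value")), variance(col("value"))).show() // Column, as before
    }
    ```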
    
    Author: Yanbo Liang <yb...@gmail.com>
    
    Closes #9994 from yanboliang/SPARK-12011.
    
    (cherry picked from commit 6f6bb0e893c8370cbab4d63a56d74e00cb7f3cf6)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit 6c31c20ea941ef9efac87fd55b178b76baab9aa4
Author: mariusvniekerk <ma...@gmail.com>
Date:   2015-11-27T03:13:16Z

    [SPARK-11881][SQL] Fix for postgresql fetchsize > 0
    
    Reference: https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor
    In order for PostgreSQL to honor the fetchSize non-zero setting, its Connection.autoCommit needs to be set to false. Otherwise, it will just quietly ignore the fetchSize setting.
    
    This adds a new side-effecting, dialect-specific `beforeFetch` method that fires before a select query is run.
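    
    A usage sketch with hypothetical connection details (the option key follows the `fetchSize` name used above; with a non-zero value the PostgreSQL dialect's `beforeFetch` hook disables autocommit so rows are streamed through a cursor):
    
    ```scala
    import org.apache.spark.sql.SQLContext
    
    // Hypothetical URL and table name, for illustration only.
    def readLargeTable(sqlContext: SQLContext) =
      sqlContext.read
        .format("jdbc")
        .option("url", "jdbc:postgresql://db-host:5432/mydb")
        .option("dbtable", "big_table")
        .option("fetchSize", "1000")
        .load()
    ```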
    
    Author: mariusvniekerk <ma...@gmail.com>
    
    Closes #9861 from mariusvniekerk/SPARK-11881.
    
    (cherry picked from commit b63938a8b04a30feb6b2255c4d4e530a74855afc)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit 5c3a9dfa64ecec4775e1508c45cf46bf75a548bd
Author: Jeff Zhang <zj...@apache.org>
Date:   2015-11-27T03:15:22Z

    [SPARK-11917][PYSPARK] Add SQLContext#dropTempTable to PySpark
    
    Author: Jeff Zhang <zj...@apache.org>
    
    Closes #9903 from zjffdu/SPARK-11917.
    
    (cherry picked from commit d8220885c492141dfc61e8ffb92934f2339fe8d3)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit d2a5a4930fe802e6f51089064fd2de52f263e45e
Author: Huaxin Gao <hu...@oc0558782468.ibm.com>
Date:   2015-11-27T03:17:46Z

    [SPARK-11778][SQL] add regression test
    
    Fix regression test for SPARK-11778.
     marmbrus
    Could you please take a look?
    Thank you very much!!
    
    Author: Huaxin Gao <hu...@oc0558782468.ibm.com>
    
    Closes #9890 from huaxingao/spark-11778-regression-test.
    
    (cherry picked from commit 4d4cbc034bef559f47f8b74cecd8196dc8a85348)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit bb3fe0a646055fcc3a4fa14ff0df14dec5508393
Author: Dilip Biswal <db...@us.ibm.com>
Date:   2015-11-27T05:04:40Z

    [SPARK-11997] [SQL] NPE when save a DataFrame as parquet and partitioned by long column
    
    Check for partition column null-ability while building the partition spec.
    
    Author: Dilip Biswal <db...@us.ibm.com>
    
    Closes #10001 from dilipbiswal/spark-11997.
    
    (cherry picked from commit a374e20b5492c775f20d32e8fbddadbd8098a655)
    Signed-off-by: Davies Liu <da...@gmail.com>

commit dfc98fac9fa6d93bbb59a7f5b06aac73d15c1707
Author: Yanbo Liang <yb...@gmail.com>
Date:   2015-11-27T19:48:01Z

    [SPARK-12025][SPARKR] Rename some window rank function names for SparkR
    
    Change ```cumeDist -> cume_dist, denseRank -> dense_rank, percentRank -> percent_rank, rowNumber -> row_number``` on the SparkR side.
    There are two reasons that we should make this change:
    * We should follow the [naming convention rule of R](http://www.inside-r.org/node/230645)
    * Spark DataFrame has deprecated the old convention (such as ```cumeDist```) and will remove it in Spark 2.0.
    
    It's better to fix this issue before the 1.6 release; otherwise we will make a breaking API change.
    cc shivaram sun-rui
    
    Author: Yanbo Liang <yb...@gmail.com>
    
    Closes #10016 from yanboliang/SPARK-12025.
    
    (cherry picked from commit ba02f6cb5a40511cefa511d410be93c035d43f23)
    Signed-off-by: Shivaram Venkataraman <sh...@cs.berkeley.edu>

commit 9966357932a50aa22f94f39201559beb8c0c6efb
Author: Shixiong Zhu <sh...@databricks.com>
Date:   2015-11-27T19:50:18Z

    [SPARK-12021][STREAMING][TESTS] Fix the potential dead-lock in StreamingListenerSuite
    
    In StreamingListenerSuite's "don't call ssc.stop in listener" test, after the main thread calls `ssc.stop()`, `StreamingContextStoppingCollector` may call `ssc.stop()` in the listener bus thread, which is a deadlock. This PR updates `StreamingContextStoppingCollector` to only call `ssc.stop()` in the first batch to avoid the deadlock.
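    
    The general shape of the fix, as a minimal sketch (class and field names here are illustrative, not the actual test code):
    
    ```scala
    import java.util.concurrent.atomic.AtomicBoolean
    import org.apache.spark.streaming.StreamingContext
    import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}
    
    // Only attempt ssc.stop() once, from the first completed batch, so the
    // listener-bus thread never calls stop() after the main thread has already
    // started stopping the context.
    class StoppingCollector(ssc: StreamingContext) extends StreamingListener {
      private val attempted = new AtomicBoolean(false)
    
      override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit = {
        if (attempted.compareAndSet(false, true)) {
          ssc.stop(stopSparkContext = false, stopGracefully = false)
        }
      }
    }
    ```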
    
    Author: Shixiong Zhu <sh...@databricks.com>
    
    Closes #10011 from zsxwing/fix-test-deadlock.
    
    (cherry picked from commit f57e6c9effdb9e282fc8ae66dc30fe053fed5272)
    Signed-off-by: Shixiong Zhu <sh...@databricks.com>

commit 47c2c8ce3d631d83b25c8f0a05ccc6ba7bacf66d
Author: Yin Huai <yh...@databricks.com>
Date:   2015-11-27T23:11:13Z

    [SPARK-12020][TESTS][TEST-HADOOP2.0] PR builder cannot trigger hadoop 2.0 test
    
    https://issues.apache.org/jira/browse/SPARK-12020
    
    Author: Yin Huai <yh...@databricks.com>
    
    Closes #10010 from yhuai/SPARK-12020.
    
    (cherry picked from commit b9921524d970f9413039967c1f17ae2e736982f0)
    Signed-off-by: Yin Huai <yh...@databricks.com>

commit 2ecc0f2434336ab23c76d19d7543efc1c2b6e412
Author: gatorsmile <ga...@gmail.com>
Date:   2015-11-28T06:44:08Z

    [SPARK-12028] [SQL] get_json_object returns an incorrect result when the value is null literals
    
    When calling `get_json_object` for the following two cases, both results are `"null"`:
    
    ```scala
        val tuple: Seq[(String, String)] = ("5", """{"f1": null}""") :: Nil
        val df: DataFrame = tuple.toDF("key", "jstring")
        val res = df.select(functions.get_json_object($"jstring", "$.f1")).collect()
    ```
    ```scala
        val tuple2: Seq[(String, String)] = ("5", """{"f1": "null"}""") :: Nil
        val df2: DataFrame = tuple2.toDF("key", "jstring")
        val res3 = df2.select(functions.get_json_object($"jstring", "$.f1")).collect()
    ```
    
    Fixed the problem and also added a test case.
    
    Author: gatorsmile <ga...@gmail.com>
    
    Closes #10018 from gatorsmile/get_json_object.
    
    (cherry picked from commit 149cd692ee2e127d79386fd8e584f4f70a2906ba)
    Signed-off-by: Davies Liu <da...@gmail.com>

commit 2503a43505ef3ba5ad3aa484fea9b807dea198c0
Author: felixcheung <fe...@hotmail.com>
Date:   2015-11-29T05:02:05Z

    [SPARK-12029][SPARKR] Improve column functions signature, param check, tests, fix doc and add examples
    
    shivaram sun-rui
    
    Author: felixcheung <fe...@hotmail.com>
    
    Closes #10019 from felixcheung/rfunctionsdoc.
    
    (cherry picked from commit 28e46ab46368ea3833c8e805163893bbb6f2a265)
    Signed-off-by: Shivaram Venkataraman <sh...@cs.berkeley.edu>

commit 06da5fdde8c78653df93c303a78aa9290f5bb660
Author: felixcheung <fe...@hotmail.com>
Date:   2015-11-29T05:16:21Z

    [SPARK-9319][SPARKR] Add support for setting column names, types
    
    Add support for colnames, colnames<-, coltypes<-.
    Also added tests for names and names<-, which had no tests previously.
    
    I merged with PR 8984 (coltypes), clicked the wrong thing and screwed up the PR. Recreated it here. Was #9218.
    
    shivaram sun-rui
    
    Author: felixcheung <fe...@hotmail.com>
    
    Closes #9654 from felixcheung/colnamescoltypes.
    
    (cherry picked from commit c793d2d9a1ccc203fc103eb0636958fe8d71f471)
    Signed-off-by: Shivaram Venkataraman <sh...@cs.berkeley.edu>

commit 5601d8fd0a89fb40369350db655f83d36f9d5b44
Author: Sun Rui <ru...@intel.com>
Date:   2015-11-29T19:08:26Z

    [SPARK-11781][SPARKR] SparkR has problem in inferring type of raw type.
    
    Author: Sun Rui <ru...@intel.com>
    
    Closes #9769 from sun-rui/SPARK-11781.
    
    (cherry picked from commit cc7a1bc9370b163f51230e5ca4be612d133a5086)
    Signed-off-by: Shivaram Venkataraman <sh...@cs.berkeley.edu>

commit abd31515dc15f6f28fc8d7f9d538a351d65a74b5
Author: Herman van Hovell <hv...@questtec.nl>
Date:   2015-11-29T22:13:11Z

    [SPARK-12024][SQL] More efficient multi-column counting.
    
    In https://github.com/apache/spark/pull/9409 we enabled multi-column counting. The approach taken in that PR introduces a bit of overhead by first creating a row only to check if all of the columns are non-null.
    
    This PR fixes that technical debt. Count now takes multiple columns as its input. In order to make this work I have also added support for multiple columns in the single distinct code path.
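    
    For illustration, a hypothetical query (the table name `t` is an assumption, not from the patch):
    
    ```scala
    import org.apache.spark.sql.{DataFrame, SQLContext}
    
    // COUNT(a, b) counts only rows where both columns are non-null, without first
    // materialising a row just to run the null check; COUNT(DISTINCT a, b) uses the
    // single-distinct code path that this change extends to multiple columns.
    def example(sqlContext: SQLContext): DataFrame =
      sqlContext.sql("SELECT COUNT(a, b), COUNT(DISTINCT a, b) FROM t")
    ```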
    
    cc yhuai
    
    Author: Herman van Hovell <hv...@questtec.nl>
    
    Closes #10015 from hvanhovell/SPARK-12024.
    
    (cherry picked from commit 3d28081e53698ed77e93c04299957c02bcaba9bf)
    Signed-off-by: Yin Huai <yh...@databricks.com>

commit a4a2a7deb2a6136550af03bb1129854b289cf417
Author: Yin Huai <yh...@databricks.com>
Date:   2015-11-30T03:02:15Z

    [SPARK-12039] [SQL] Ignore HiveSparkSubmitSuite's "SPARK-9757 Persist Parquet relation with decimal column".
    
    https://issues.apache.org/jira/browse/SPARK-12039
    
    Since it is pretty flaky in hadoop 1 tests, we can disable it while we are investigating the cause.
    
    Author: Yin Huai <yh...@databricks.com>
    
    Closes #10035 from yhuai/SPARK-12039-ignore.
    
    (cherry picked from commit 0ddfe7868948e302858a2b03b50762eaefbeb53e)
    Signed-off-by: Yin Huai <yh...@databricks.com>

commit 12d97b0c5213e04453f81156f04ed95d877f199c
Author: toddwan <ta...@outlook.com>
Date:   2015-11-30T09:26:29Z

    [SPARK-11859][MESOS] SparkContext accepts invalid Master URLs in the form zk://host:port for a multi-master Mesos cluster using ZooKeeper
    
    * According to the doc below and the validation logic in [SparkSubmit.scala](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L231), the master URL for a Mesos cluster should always start with `mesos://`.
    
    http://spark.apache.org/docs/latest/running-on-mesos.html
    `The Master URLs for Mesos are in the form mesos://host:5050 for a single-master Mesos cluster, or mesos://zk://host:2181 for a multi-master Mesos cluster using ZooKeeper.`
    
    * However, [SparkContext.scala](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L2749) fails to enforce this and accepts a master URL in the form `zk://host:port`.
    
    * For master URLs in the form `zk://host:port`, the valid form should be `mesos://zk://host:port`.
    
    * This PR restricts the validation in `SparkContext.scala` so that only Mesos master URLs prefixed with `mesos://` are accepted (see the sketch after this list).
    
    * This PR also updates the corresponding unit test.
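    
    A minimal sketch of the accepted and rejected forms (the application name and ZooKeeper address are hypothetical):
    
    ```scala
    import org.apache.spark.{SparkConf, SparkContext}
    
    val conf = new SparkConf()
      .setAppName("example")
      .setMaster("mesos://zk://zk-host:2181/mesos") // valid multi-master Mesos URL
    // .setMaster("zk://zk-host:2181/mesos")        // now rejected: missing the mesos:// prefix
    val sc = new SparkContext(conf)
    ```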
    
    Author: toddwan <ta...@outlook.com>
    
    Closes #9886 from toddwan/S11859.
    
    (cherry picked from commit e0749442051d6e29dae4f4cdcb2937c0b015f98f)
    Signed-off-by: Sean Owen <so...@cloudera.com>

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Branch 1.6

Posted by hyl713 <gi...@git.apache.org>.
Github user hyl713 closed the pull request at:

    https://github.com/apache/spark/pull/10616




[GitHub] spark pull request: Branch 1.6

Posted by hyl713 <gi...@git.apache.org>.
Github user hyl713 commented on the pull request:

    https://github.com/apache/spark/pull/10616#issuecomment-169258359
  
    I am so sorry, it's a mistake...
    
    Sent from Outlook Mobile<https://aka.ms/qtex0l>
    
    
    
    
    On Tue, Jan 5, 2016 at 11:33 PM -0800, "UCB AMPLab" <no...@github.com> wrote:
    
    
    Can one of the admins verify this patch?
    
    ―
    Reply to this email directly or view it on GitHub<https://github.com/apache/spark/pull/10616#issuecomment-169258106>.





[GitHub] spark pull request: Branch 1.6

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10616#issuecomment-169258106
  
    Can one of the admins verify this patch?

