Posted to reviews@spark.apache.org by speful <gi...@git.apache.org> on 2018/08/16 05:46:50 UTC

[GitHub] spark pull request #22118: Branch 2.2

GitHub user speful opened a pull request:

    https://github.com/apache/spark/pull/22118

    Branch 2.2

    ## What changes were proposed in this pull request?
    
    (Please fill in changes proposed in this fix)
    
    ## How was this patch tested?
    
    (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
    (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark branch-2.2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22118.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22118
    
----
commit 86609a95af4b700e83638b7416c7e3706c2d64c6
Author: Liang-Chi Hsieh <vi...@...>
Date:   2017-08-08T08:12:41Z

    [SPARK-21567][SQL] Dataset should work with type alias
    
    If we create a type alias for a type that works with Dataset, the alias doesn't work with Dataset.
    
    A reproducible case looks like:
    
        object C {
          type TwoInt = (Int, Int)
          def tupleTypeAlias: TwoInt = (1, 1)
        }
    
        Seq(1).toDS().map(_ => ("", C.tupleTypeAlias))
    
    It throws an exception like:
    
        type T1 is not a class
        scala.ScalaReflectionException: type T1 is not a class
          at scala.reflect.api.Symbols$SymbolApi$class.asClass(Symbols.scala:275)
          ...
    
    This patch dealiases types in many places in `ScalaReflection` to fix this, as sketched below.
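    
    A minimal, self-contained sketch of the underlying reflection behavior (illustrative only; the names here are not Spark's actual code):
    
    ```
        import scala.reflect.runtime.universe._
    
        object DealiasSketch extends App {
          object C { type TwoInt = (Int, Int) }
    
          // The alias's own symbol is not a class, which is what made
          // `asClass` throw in ScalaReflection; dealiasing exposes Tuple2.
          val aliased = typeOf[C.TwoInt]
          println(aliased.typeSymbol.isClass)          // false (alias symbol)
          println(aliased.dealias.typeSymbol.isClass)  // true  (scala.Tuple2)
        }
    ```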
    
    Added test case.
    
    Author: Liang-Chi Hsieh <vi...@gmail.com>
    
    Closes #18813 from viirya/SPARK-21567.
    
    (cherry picked from commit ee1304199bcd9c1d5fc94f5b06fdd5f6fe7336a1)
    Signed-off-by: Wenchen Fan <we...@databricks.com>

commit e87ffcaa3e5b75f8d313dc995e4801063b60cd5c
Author: Wenchen Fan <we...@...>
Date:   2017-08-08T08:32:49Z

    Revert "[SPARK-21567][SQL] Dataset should work with type alias"
    
    This reverts commit 86609a95af4b700e83638b7416c7e3706c2d64c6.

commit d0233145208eb6afcd9fe0c1c3a9dbbd35d7727e
Author: pgandhi <pg...@...>
Date:   2017-08-09T05:46:06Z

    [SPARK-21503][UI] Spark UI shows incorrect task status for a killed Executor Process
    
    The executor tab on the Spark UI shows a task as completed when the executor process running that task is killed using the kill command.
    Added the previously missing case ExecutorLostFailure; without it, the default case was executed and the task was marked as completed. The new case covers all situations where the executor's connection to the Spark driver is lost, e.g. because the executor process was killed or the network connection dropped. A sketch of the pattern follows.
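    
    A minimal, self-contained sketch of the fix (assumed names, not Spark's real TaskEndReason hierarchy or UI code):
    
    ```
        sealed trait TaskEndReason
        case object TaskSucceeded extends TaskEndReason
        case object ExecutorLostFailure extends TaskEndReason
        case object OtherFailure extends TaskEndReason
    
        def uiStatus(reason: TaskEndReason): String = reason match {
          case ExecutorLostFailure => "FAILED"   // newly handled case
          case TaskSucceeded       => "SUCCESS"
          case _                   => "SUCCESS"  // the default branch ExecutorLostFailure used to hit
        }
    ```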
    
    ## How was this patch tested?
    Manually Tested the fix by observing the UI change before and after.
    Before:
    <img width="1398" alt="screen shot-before" src="https://user-images.githubusercontent.com/22228190/28482929-571c9cea-6e30-11e7-93dd-728de5cdea95.png">
    After:
    <img width="1385" alt="screen shot-after" src="https://user-images.githubusercontent.com/22228190/28482964-8649f5ee-6e30-11e7-91bd-2eb2089c61cc.png">
    
    Author: pgandhi <pg...@yahoo-inc.com>
    Author: pgandhi999 <pa...@gmail.com>
    
    Closes #18707 from pgandhi999/master.
    
    (cherry picked from commit f016f5c8f6c6aae674e9905a5c0b0bede09163a4)
    Signed-off-by: Wenchen Fan <we...@databricks.com>

commit 7446be3328ea75a5197b2587e3a8e2ca7977726b
Author: WeichenXu <we...@...>
Date:   2017-08-09T06:44:10Z

    [SPARK-21523][ML] update breeze to 0.13.2 for an emergency bugfix in strong wolfe line search
    
    ## What changes were proposed in this pull request?
    
    Update breeze to 0.13.2 for an emergency bugfix in strong wolfe line search:
    https://github.com/scalanlp/breeze/pull/651
    
    ## How was this patch tested?
    
    N/A
    
    Author: WeichenXu <We...@outlook.com>
    
    Closes #18797 from WeichenXu123/update-breeze.
    
    (cherry picked from commit b35660dd0e930f4b484a079d9e2516b0a7dacf1d)
    Signed-off-by: Yanbo Liang <yb...@gmail.com>

commit f6d56d2f1c377000921effea2b1faae15f9cae82
Author: Shixiong Zhu <sh...@...>
Date:   2017-08-09T06:49:33Z

    [SPARK-21596][SS] Ensure places calling HDFSMetadataLog.get check the return value
    
    Same PR as #18799 but for branch 2.2. The main discussion is in the other PR.
    --------
    
    When I was investigating a flaky test, I realized that many places don't check the return value of `HDFSMetadataLog.get(batchId: Long): Option[T]`. When a batch is supposed to be there, the caller just ignores `None` rather than throwing an error. If some bug causes a query not to generate a batch metadata file, this behavior hides the bug, lets the query continue running, and eventually deletes the metadata logs, making it hard to debug.
    
    This PR ensures that places calling HDFSMetadataLog.get always check the return value, along the lines of the sketch below.
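    
    A minimal sketch of the enforced pattern (assumed helper, not Spark's actual code):
    
    ```
        // When a batch is known to exist, a missing metadata entry should
        // fail loudly instead of being silently skipped.
        def getBatchOrFail[T](get: Long => Option[T], batchId: Long): T =
          get(batchId).getOrElse {
            throw new IllegalStateException(s"batch $batchId doesn't exist")
          }
    ```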
    
    Jenkins
    
    Author: Shixiong Zhu <sh...@databricks.com>
    
    Closes #18890 from tdas/SPARK-21596-2.2.

commit 3ca55eaafee8f4216eb5466021a97604713033a1
Author: 10087686 <wa...@...>
Date:   2017-08-09T10:45:38Z

    [SPARK-21663][TESTS] test("remote fetch below max RPC message size") should call masterTracker.stop() in MapOutputTrackerSuite
    
    Signed-off-by: 10087686 <wang.jiaochun@zte.com.cn>
    
    ## What changes were proposed in this pull request?
    After the unit tests end, masterTracker.stop() should be called to free resources.
    
    ## How was this patch tested?
    Ran the unit tests.
    
    Author: 10087686 <wa...@zte.com.cn>
    
    Closes #18867 from wangjiaochun/mapout.
    
    (cherry picked from commit 6426adffaf152651c30d481bb925d5025fd6130a)
    Signed-off-by: Wenchen Fan <we...@databricks.com>

commit c909496983314b48dd4d8587e586b553b04ff0ce
Author: Reynold Xin <rx...@...>
Date:   2017-08-11T01:56:25Z

    [SPARK-21699][SQL] Remove unused getTableOption in ExternalCatalog
    
    ## What changes were proposed in this pull request?
    This patch removes the unused SessionCatalog.getTableMetadataOption and ExternalCatalog.getTableOption.
    
    ## How was this patch tested?
    Removed the test case.
    
    Author: Reynold Xin <rx...@databricks.com>
    
    Closes #18912 from rxin/remove-getTableOption.
    
    (cherry picked from commit 584c7f14370cdfafdc6cd554b2760b7ce7709368)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit 406eb1c2ee670c2f14f2737c32c9aa0b8d35bf7c
Author: Tejas Patil <te...@...>
Date:   2017-08-11T20:01:00Z

    [SPARK-21595] Separate thresholds for buffering and spilling in ExternalAppendOnlyUnsafeRowArray
    
    ## What changes were proposed in this pull request?
    
    [SPARK-21595](https://issues.apache.org/jira/browse/SPARK-21595) reported excessive spilling to disk because the default spill threshold for `ExternalAppendOnlyUnsafeRowArray` is quite small for the WINDOW operator. The old behaviour of the WINDOW operator (pre https://github.com/apache/spark/pull/16909) was to hold data in an array for the first 4096 records, after which it would switch to `UnsafeExternalSorter` and start spilling to disk upon reaching `spark.shuffle.spill.numElementsForceSpillThreshold` (or earlier if memory ran short due to excessive consumers).
    
    Currently, both the switch from in-memory storage to `UnsafeExternalSorter` and the point at which `UnsafeExternalSorter` spills to disk are controlled by a single threshold for `ExternalAppendOnlyUnsafeRowArray`. This PR separates the two for more granular control.
    
    ## How was this patch tested?
    
    Added unit tests
    
    Author: Tejas Patil <te...@fb.com>
    
    Closes #18843 from tejasapatil/SPARK-21595.
    
    (cherry picked from commit 94439997d57875838a8283c543f9b44705d3a503)
    Signed-off-by: Herman van Hovell <hv...@databricks.com>

commit 7b9807754fd43756ba852bf93590a5024f2aa129
Author: Andrew Ash <an...@...>
Date:   2017-08-14T14:48:08Z

    [SPARK-21563][CORE] Fix race condition when serializing TaskDescriptions and adding jars
    
    ## What changes were proposed in this pull request?
    
    Fix the race condition when serializing TaskDescriptions and adding jars by keeping the set of jars and files for a TaskSet constant across the lifetime of the TaskSet. Otherwise, TaskDescription serialization can produce invalid output when new files or jars are added concurrently while the TaskDescription is being serialized. The sketch below shows the idea.
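    
    A minimal sketch of the idea (assumed names, not Spark's TaskSetManager):
    
    ```
        import scala.collection.mutable
    
        class JarRegistry {
          private val addedJars = mutable.Map[String, Long]()
    
          def addJar(path: String, timestamp: Long): Unit =
            synchronized { addedJars(path) = timestamp }
    
          // Taken once when the TaskSet is created: an immutable copy that
          // later concurrent addJar calls cannot change, so serialization
          // stays consistent for the lifetime of the TaskSet.
          def snapshotForTaskSet(): Map[String, Long] =
            synchronized { addedJars.toMap }
        }
    ```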
    
    ## How was this patch tested?
    
    Additional unit test ensures jars/files contained in the TaskDescription remain constant throughout the lifetime of the TaskSet.
    
    Author: Andrew Ash <an...@andrewash.com>
    
    Closes #18913 from ash211/SPARK-21563.
    
    (cherry picked from commit 6847e93cf427aa971dac1ea261c1443eebf4089e)
    Signed-off-by: Wenchen Fan <we...@databricks.com>

commit 48bacd36c673bcbe20dc2e119cddb2a61261a394
Author: Shixiong Zhu <sh...@...>
Date:   2017-08-14T22:06:55Z

    [SPARK-21696][SS] Fix a potential issue that may generate partial snapshot files
    
    ## What changes were proposed in this pull request?
    
    Directly writing a snapshot file may generate a partial file. This PR changes it to write to a temp file and then rename it to the target file, as sketched below.
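    
    A minimal sketch of the write-then-rename pattern (a java.nio stand-in, not Spark's state store code):
    
    ```
        import java.nio.file.{Files, Paths, StandardCopyOption}
    
        object SnapshotWriteSketch extends App {
          val target = Paths.get("snapshot.bin")
          val tmp    = Paths.get("snapshot.bin.tmp")
    
          // A crash mid-write leaves only a partial temp file; the target
          // path only ever appears once it is fully written.
          Files.write(tmp, Array[Byte](1, 2, 3))
          Files.move(tmp, target,
            StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING)
        }
    ```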
    
    ## How was this patch tested?
    
    Jenkins.
    
    Author: Shixiong Zhu <sh...@databricks.com>
    
    Closes #18928 from zsxwing/SPARK-21696.
    
    (cherry picked from commit 282f00b410fdc4dc69b9d1f3cb3e2ba53cd85b8b)
    Signed-off-by: Tathagata Das <ta...@gmail.com>

commit d9c8e6223f6b31bfbca33b1064ead9720cfefa10
Author: Liang-Chi Hsieh <vi...@...>
Date:   2017-08-15T05:29:15Z

    [SPARK-21721][SQL] Clear FileSystem deleteOnExit cache when paths are successfully removed
    
    ## What changes were proposed in this pull request?
    
    We put the staging path into the deleteOnExit cache of `FileSystem` in case the path can't be successfully removed. But when we do successfully remove the path, we don't remove it from the cache. We should, to keep the cache from growing indefinitely; the sketch below shows the idea.
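    
    A minimal sketch of the fix's idea (simplified; Spark's actual cleanup code differs):
    
    ```
        import org.apache.hadoop.conf.Configuration
        import org.apache.hadoop.fs.{FileSystem, Path}
    
        def deleteStagingDir(stagingDir: Path, hadoopConf: Configuration): Unit = {
          val fs = stagingDir.getFileSystem(hadoopConf)
          fs.deleteOnExit(stagingDir)          // safety net in case deletion fails
          if (fs.delete(stagingDir, true)) {
            fs.cancelDeleteOnExit(stagingDir)  // deleted: drop the cache entry
          }
        }
    ```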
    
    ## How was this patch tested?
    
    Added a test.
    
    Author: Liang-Chi Hsieh <vi...@gmail.com>
    
    Closes #18934 from viirya/SPARK-21721.
    
    (cherry picked from commit 4c3cf1cc5cdb400ceef447d366e9f395cd87b273)
    Signed-off-by: gatorsmile <ga...@gmail.com>

commit f1accc8511cf034fa4edee0c0a5747def0df04a2
Author: Jan Vrsovsky <ja...@...>
Date:   2017-08-16T07:21:42Z

    [SPARK-21723][ML] Fix writing LibSVM (key not found: numFeatures)
    
    Check the option "numFeatures" only when reading LibSVM, not when writing. Previously, Spark raised an exception when writing; after the change, the option is ignored completely on write. cc liancheng HyukjinKwon
    
    (Maybe the usage should be forbidden when writing, in a major version change?)
    
    Manually tested that loading and writing LibSVM files work fine, both with and without the numFeatures option.
    
    Author: Jan Vrsovsky <ja...@firma.seznam.cz>
    
    Closes #18872 from ProtD/master.
    
    (cherry picked from commit 8321c141f63a911a97ec183aefa5ff75a338c051)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit f5ede0d558e3db51867d8c1c0a12c8fb286c797c
Author: John Lee <jl...@...>
Date:   2017-08-16T14:44:09Z

    [SPARK-21656][CORE] spark dynamic allocation should not idle timeout executors when tasks still to run
    
    ## What changes were proposed in this pull request?
    
    Right now Spark lets go of executors when they have been idle for 60s (or a configurable time). I have seen Spark release executors that were idle but still really needed, e.g. when the scheduler was waiting for node locality, which takes longer than the default idle timeout. In these jobs the number of executors drops very low (fewer than 10) while there are still ~80,000 tasks to run.
    We should not let executors idle timeout if they are still needed according to the number of tasks remaining.
    
    ## How was this patch tested?
    
    Tested by manually adding executors to `executorsIdsToBeRemoved` list and seeing if those executors were removed when there are a lot of tasks and a high `numExecutorsTarget` value.
    
    Code used
    
    In  `ExecutorAllocationManager.start()`
    
    ```
        start_time = clock.getTimeMillis()
    ```
    
    In `ExecutorAllocationManager.schedule()`
    ```
        val executorIdsToBeRemoved = ArrayBuffer[String]()
        if (now > start_time + 1000 * 60 * 2) {
          logInfo("--- REMOVING 1/2 of the EXECUTORS ---")
          start_time += 1000 * 60 * 100
          var counter = 0
          for (x <- executorIds) {
            counter += 1
            if (counter == 2) {
              counter = 0
              executorIdsToBeRemoved += x
            }
          }
        }
    ```
    
    Author: John Lee <jl...@yahoo-inc.com>
    
    Closes #18874 from yoonlee95/SPARK-21656.
    
    (cherry picked from commit adf005dabe3b0060033e1eeaedbab31a868efc8c)
    Signed-off-by: Tom Graves <tg...@yahoo-inc.com>

commit 2a9697593add425efa15d51afb501b6236a78e26
Author: Wenchen Fan <we...@...>
Date:   2017-08-16T16:36:33Z

    [SPARK-18464][SQL][BACKPORT] support old tables which don't store schema in table properties
    
    Backport of https://github.com/apache/spark/pull/18907 to branch 2.2.
    
    Author: Wenchen Fan <we...@databricks.com>
    
    Closes #18963 from cloud-fan/backport.

commit fdea642dbd17d74c8bf136c1746159acaa937d25
Author: donnyzone <we...@...>
Date:   2017-08-18T05:37:32Z

    [SPARK-21739][SQL] Cast expression should initialize timezoneId when it is called statically to convert something into TimestampType
    
    ## What changes were proposed in this pull request?
    
    https://issues.apache.org/jira/projects/SPARK/issues/SPARK-21739
    
    This issue is caused by introducing TimeZoneAwareExpression.
    When the **Cast** expression converts something into TimestampType, it should be resolved by setting `timezoneId`. In general, this is resolved in the LogicalPlan phase.
    
    However, there are still some places that use the Cast expression statically to convert datatypes without setting `timezoneId`. In such cases, `NoSuchElementException: None.get` will be thrown for TimestampType; the sketch below shows the failure mode.
    
    This PR fixes the issue. We checked the whole project and found two such usages (in `TableReader` and `HiveTableScanExec`).
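    
    A minimal, self-contained sketch of the failure mode (assumed shape, not Spark's real Cast):
    
    ```
        case class TimeZoneAwareSketch(timeZoneId: Option[String] = None) {
          // Evaluating before the timezone is set hits None.get.
          def zone: String = timeZoneId.get
          def withTimeZone(tz: String): TimeZoneAwareSketch =
            copy(timeZoneId = Some(tz))
        }
    
        // Fixed call sites set the zone before evaluating statically:
        // TimeZoneAwareSketch().zone                      => NoSuchElementException
        // TimeZoneAwareSketch().withTimeZone("UTC").zone  => "UTC"
    ```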
    
    ## How was this patch tested?
    
    unit test
    
    Author: donnyzone <we...@gmail.com>
    
    Closes #18960 from DonnyZone/spark-21739.
    
    (cherry picked from commit 310454be3b0ce5ff6b6ef0070c5daadf6fb16927)
    Signed-off-by: gatorsmile <ga...@gmail.com>

commit 6c2a38a381f22029abd9ca4beab49b2473a13670
Author: Cédric Pelvet <ce...@...>
Date:   2017-08-20T10:05:54Z

    [MINOR] Correct validateAndTransformSchema in GaussianMixture and AFTSurvivalRegression
    
    ## What changes were proposed in this pull request?
    
    The line `SchemaUtils.appendColumn(schema, $(predictionCol), IntegerType)` did not modify the variable `schema`, hence only the last line had any effect. A temporary variable is now used to correctly append the two columns `predictionCol` and `probabilityCol`, as sketched below.
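    
    A minimal sketch of the bug and fix (simplified; the real helper is `SchemaUtils.appendColumn` and the actual column types differ):
    
    ```
        import org.apache.spark.sql.types._
    
        def appendColumn(schema: StructType, name: String, dt: DataType): StructType =
          StructType(schema.fields :+ StructField(name, dt))
    
        def buggy(schema: StructType): StructType = {
          appendColumn(schema, "prediction", IntegerType)  // result discarded!
          appendColumn(schema, "probability", DoubleType)  // only this takes effect
        }
    
        def fixed(schema: StructType): StructType = {
          val withPrediction = appendColumn(schema, "prediction", IntegerType)
          appendColumn(withPrediction, "probability", DoubleType)
        }
    ```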
    
    ## How was this patch tested?
    
    Manually.
    
    Author: Cédric Pelvet <ce...@gmail.com>
    
    Closes #18980 from sharp-pixel/master.
    
    (cherry picked from commit 73e04ecc4f29a0fe51687ed1337c61840c976f89)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit 0f640e96c9d0b0d95ac4bbcc84eaccefe7352f0f
Author: Liang-Chi Hsieh <vi...@...>
Date:   2017-08-20T16:45:23Z

    [SPARK-21721][SQL][FOLLOWUP] Clear FileSystem deleteOnExit cache when paths are successfully removed
    
    ## What changes were proposed in this pull request?
    
    Fix a typo in test.
    
    ## How was this patch tested?
    
    Jenkins tests.
    
    Author: Liang-Chi Hsieh <vi...@gmail.com>
    
    Closes #19005 from viirya/SPARK-21721-followup.
    
    (cherry picked from commit 28a6cca7df900d13613b318c07acb97a5722d2b8)
    Signed-off-by: Wenchen Fan <we...@databricks.com>

commit 526087f9ebca90f77f78d699c5f8d0243dd8ab61
Author: Marcelo Vanzin <va...@...>
Date:   2017-08-21T22:09:02Z

    [SPARK-21617][SQL] Store correct table metadata when altering schema in Hive metastore.
    
    For Hive tables, the current "replace the schema" code is the correct
    path, except that an exception in that path should result in an error, and
    not in retrying in a different way.
    
    For data source tables, Spark may generate a non-compatible Hive table;
    but for that to work with Hive 2.1, the detection of data source tables needs
    to be fixed in the Hive client, to also consider the raw tables used by code
    such as `alterTableSchema`.
    
    Tested with existing and added unit tests (plus internal tests with a 2.1 metastore).
    
    Author: Marcelo Vanzin <va...@cloudera.com>
    
    Closes #18849 from vanzin/SPARK-21617.
    
    (cherry picked from commit 84b5b16ea6c9816c70f7471a50eb5e4acb7fb1a1)
    Signed-off-by: gatorsmile <ga...@gmail.com>

commit 236b2f4d5a0879fc200f67b0af3a3c4f881ee98f
Author: Felix Cheung <fe...@...>
Date:   2017-08-24T04:35:17Z

    [SPARK-21805][SPARKR] Disable R vignettes code on Windows
    
    ## What changes were proposed in this pull request?
    
    Code in vignettes requires winutils on Windows to run. When publishing to CRAN or building from source, winutils might not be available, so it's better to disable running the code (the resulting vignettes will not have output from code, but the text and the code are still there).
    
    This fixes:
    
        * checking re-building of vignette outputs ... WARNING
    
    and
    
        > %LOCALAPPDATA% not found. Please define the environment variable or restart and enter an installation path in localDir.
    
    ## How was this patch tested?
    
    jenkins, appveyor, r-hub
    
    before: https://artifacts.r-hub.io/SparkR_2.2.0.tar.gz-49cecef3bb09db1db130db31604e0293/SparkR.Rcheck/00check.log
    after: https://artifacts.r-hub.io/SparkR_2.2.0.tar.gz-86a066c7576f46794930ad114e5cff7c/SparkR.Rcheck/00check.log
    
    Author: Felix Cheung <fe...@hotmail.com>
    
    Closes #19016 from felixcheung/rvigwind.
    
    (cherry picked from commit 43cbfad9992624d89bbb3209d1f5b765c7947bb9)
    Signed-off-by: Felix Cheung <fe...@apache.org>

commit a58536741f8365bb3fff01b588f3b42b219d11e5
Author: Wenchen Fan <we...@...>
Date:   2017-08-24T14:44:12Z

    [SPARK-21826][SQL] outer broadcast hash join should not throw NPE
    
    This is a bug introduced by https://github.com/apache/spark/pull/11274/files#diff-7adb688cbfa583b5711801f196a074bbL274.
    
    The non-equal join condition should only be applied when the equal-join condition matches.
    
    Tested with a regression test.
    
    Author: Wenchen Fan <we...@databricks.com>
    
    Closes #19036 from cloud-fan/bug.
    
    (cherry picked from commit 2dd37d827f2e443dcb3eaf8a95437d179130d55c)
    Signed-off-by: Herman van Hovell <hv...@databricks.com>

commit 2b4bd7910fecc8b7b41c7d4388d2a8204c1901e8
Author: Weichen Xu <we...@...>
Date:   2017-08-24T17:18:56Z

    [SPARK-21681][ML] fix bug where MLOR does not work correctly when featureStd contains zero (backport PR for 2.2)
    
    ## What changes were proposed in this pull request?
    
    This is backport PR of https://github.com/apache/spark/pull/18896
    
    Fixes a bug where MLOR does not work correctly when featureStd contains zero.
    
    We can reproduce the bug with the following dataset (one feature has zero variance); it generates a wrong result (all coefficients become 0):
    ```
        val multinomialDatasetWithZeroVar = {
          val nPoints = 100
          val coefficients = Array(
            -0.57997, 0.912083, -0.371077,
            -0.16624, -0.84355, -0.048509)
    
          val xMean = Array(5.843, 3.0)
          val xVariance = Array(0.6856, 0.0)  // including zero variance
    
          val testData = generateMultinomialLogisticInput(
            coefficients, xMean, xVariance, addIntercept = true, nPoints, seed)
    
          val df = sc.parallelize(testData, 4).toDF().withColumn("weight", lit(1.0))
          df.cache()
          df
        }
    ```
    ## How was this patch tested?
    
    Test case added.
    
    Author: WeichenXu <We...@outlook.com>
    
    Closes #19026 from WeichenXu123/fix_mlor_zero_var_bug_2_2.

commit 0d4ef2f690e378cade0a3ec84d535a535dc20dfc
Author: WeichenXu <we...@...>
Date:   2017-08-28T06:41:42Z

    [SPARK-21818][ML][MLLIB] Fix bug where MultivariateOnlineSummarizer.variance generates negative results
    
    Because of numerical error, MultivariateOnlineSummarizer.variance can generate a negative variance.
    
    **This is a serious bug: many algorithms in MLlib use the stddev computed from `sqrt(variance)`, and a negative variance generates NaN and crashes the whole algorithm.**
    
    We can reproduce this bug with the following code:
    ```
        val summarizer1 = (new MultivariateOnlineSummarizer)
          .add(Vectors.dense(3.0), 0.7)
        val summarizer2 = (new MultivariateOnlineSummarizer)
          .add(Vectors.dense(3.0), 0.4)
        val summarizer3 = (new MultivariateOnlineSummarizer)
          .add(Vectors.dense(3.0), 0.5)
        val summarizer4 = (new MultivariateOnlineSummarizer)
          .add(Vectors.dense(3.0), 0.4)
    
        val summarizer = summarizer1
          .merge(summarizer2)
          .merge(summarizer3)
          .merge(summarizer4)
    
        println(summarizer.variance(0))
    ```
    This PR fixes the bugs in `mllib.stat.MultivariateOnlineSummarizer.variance` and `ml.stat.SummarizerBuffer.variance`, and in several places in `WeightedLeastSquares`; the guard is sketched below.
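    
    A minimal sketch of the guard (simplified formula, not Spark's actual update rule):
    
    ```
        // Floating-point cancellation can push the computed variance slightly
        // below zero, and sqrt of a negative value yields NaN, so clamp at zero.
        def safeVariance(weightedSumSq: Double, mean: Double, weightSum: Double): Double = {
          val raw = weightedSumSq / weightSum - mean * mean  // cancellation-prone
          math.max(raw, 0.0)
        }
    
        val stddev = math.sqrt(safeVariance(8.999999999999998, 3.0, 1.0))  // 0.0, not NaN
    ```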
    
    Test cases added.
    
    Author: WeichenXu <We...@outlook.com>
    
    Closes #19029 from WeichenXu123/fix_summarizer_var_bug.
    
    (cherry picked from commit 0456b4050817e64f27824720e695bbfff738d474)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit 59bb7ebfb83c292cea853d6cd6fdf9748baa6ce2
Author: pgandhi <pg...@...>
Date:   2017-08-28T13:51:22Z

    [SPARK-21798] No config to replace deprecated SPARK_CLASSPATH config for launching daemons like History Server
    
    History Server launch uses SparkClassCommandBuilder to launch the server. SPARK_CLASSPATH, however, has been deprecated and removed. spark-submit takes a different route: spark.driver.extraClassPath specifies additional jars on the classpath that were previously specified via SPARK_CLASSPATH. Right now the only way to specify additional jars for launching daemons such as the history server is SPARK_DIST_CLASSPATH (https://spark.apache.org/docs/latest/hadoop-provided.html), but that is a distribution classpath. It would be nice to have a config like spark.driver.extraClassPath for launching such daemons.
    
    Added a new environment variable, SPARK_DAEMON_CLASSPATH, to set the classpath for launching daemons. Tested and verified for the History Server and standalone mode.
    
    ## How was this patch tested?
    Initially, the history server start script failed because it could not find the jars required to launch the server on the java classpath. The same was true for running Master and Worker in standalone mode. With the jars added to the java classpath via the new SPARK_DAEMON_CLASSPATH environment variable, both the History Server and the standalone daemons start up and run.
    
    Author: pgandhi <pg...@yahoo-inc.com>
    Author: pgandhi999 <pa...@gmail.com>
    
    Closes #19047 from pgandhi999/master.
    
    (cherry picked from commit 24e6c187fbaa6874eedbdda6b3b5dc6ff9e1de36)
    Signed-off-by: Tom Graves <tg...@yahoo-inc.com>

commit 59529b21a99f3c4db16b31da9dc7fce62349cf11
Author: jerryshao <ss...@...>
Date:   2017-08-29T17:50:03Z

    [SPARK-21714][CORE][BACKPORT-2.2] Avoiding re-uploading remote resources in yarn client mode
    
    ## What changes were proposed in this pull request?
    
    This is a backport PR to fix issue of re-uploading remote resource in yarn client mode. The original PR is #18962.
    
    ## How was this patch tested?
    
    Tested in local UT.
    
    Author: jerryshao <ss...@hortonworks.com>
    
    Closes #19074 from jerryshao/SPARK-21714-2.2-backport.

commit 917fe6635891ea76b22a3bcba282040afd14651d
Author: Marcelo Vanzin <va...@...>
Date:   2017-08-29T19:51:27Z

    Revert "[SPARK-21714][CORE][BACKPORT-2.2] Avoiding re-uploading remote resources in yarn client mode"
    
    This reverts commit 59529b21a99f3c4db16b31da9dc7fce62349cf11.

commit a6a9944140bbb336146d0d868429cb01839375c7
Author: Dmitry Parfenchik <d....@...>
Date:   2017-08-30T08:42:15Z

    [SPARK-21254][WEBUI] History UI performance fixes
    
    ## This is a backport of PR #18783 to the latest released branch 2.2.
    
    ## What changes were proposed in this pull request?
    
    As described in the JIRA ticket, the History page takes ~1 min to load when the number of jobs is 10k+.
    Most of that time is currently spent on DOM manipulation and all the additional costs it implies (browser repaints and reflows).
    The PR's goal is not to change any behavior but to optimize History UI rendering time:
    
    1. The most costly operation is setting `innerHTML` for the `duration` column within a loop, which is [extremely unperformant](https://jsperf.com/jquery-append-vs-html-list-performance/24). [Refactoring](https://github.com/criteo-forks/spark/commit/b7e56eef4d66af977bd05af58a81e14faf33c211) this helped to get page load time **down to 10-15s**.
    
    2. A second big gain, bringing page load time **down to 4s**, [was achieved](https://github.com/criteo-forks/spark/commit/3630ca212baa94d60c5fe7e4109cf6da26288cec) by detaching the table's DOM before parsing it with the DataTables jQuery plugin.
    
    3. Another chunk of improvements ([1](https://github.com/criteo-forks/spark/commit/aeeeeb520d156a7293a707aa6bc053a2f83b9ac2), [2](https://github.com/criteo-forks/spark/commit/e25be9a66b018ba0cc53884f242469b515cb2bf4), [3](https://github.com/criteo-forks/spark/commit/91697079a29138b7581e64f2aa79247fa1a4e4af)) focused on removing unnecessary DOM manipulations that together contributed ~250ms to page load time.
    
    ## How was this patch tested?
    
    Tested by existing Selenium tests in `org.apache.spark.deploy.history.HistoryServerSuite`.
    
    Changes were also tested on Criteo's spark-2.1 fork with 20k+ rows in the table, reducing load time to 4s.
    
    Author: Dmitry Parfenchik <d....@criteo.com>
    
    Closes #18860 from 2ooom/history-ui-perf-fix-2.2.

commit d10c9dc3f631a26dbbbd8f5c601ca2001a5d7c80
Author: jerryshao <ss...@...>
Date:   2017-08-30T19:30:24Z

    [SPARK-21714][CORE][BACKPORT-2.2] Avoiding re-uploading remote resources in yarn client mode
    
    ## What changes were proposed in this pull request?
    
    This is a backport PR to fix issue of re-uploading remote resource in yarn client mode. The original PR is #18962.
    
    ## How was this patch tested?
    
    Tested in local UT.
    
    Author: jerryshao <ss...@hortonworks.com>
    
    Closes #19074 from jerryshao/SPARK-21714-2.2-backport.

commit 14054ffc5fd3399d04d69e26efb31d8b24b60bdc
Author: Sital Kedia <sk...@...>
Date:   2017-08-30T21:19:13Z

    [SPARK-21834] Incorrect executor request in case of dynamic allocation
    
    ## What changes were proposed in this pull request?
    
    The killExecutor API currently does not allow killing an executor without updating the total number of executors needed. When dynamic allocation is turned on and the allocator tries to kill an executor, the scheduler reduces the total number of executors needed (see https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L635), which is incorrect because the allocator already takes care of setting the required number of executors itself. The sketch below shows the distinction.
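    
    A minimal sketch of the distinction (assumed names; the real logic lives in CoarseGrainedSchedulerBackend):
    
    ```
        class SchedulerBackendSketch(var targetNumExecutors: Int) {
          def killExecutors(ids: Seq[String], adjustTargetNumExecutors: Boolean): Unit = {
            // Only lower the target for explicit, user-driven kills; the
            // dynamic allocation manager maintains its own target and must
            // not be second-guessed here.
            if (adjustTargetNumExecutors) {
              targetNumExecutors -= ids.size
            }
            ids.foreach(id => println(s"killing executor $id"))
          }
        }
    ```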
    
    ## How was this patch tested?
    
    Ran a job on the cluster and made sure the executor request is correct
    
    Author: Sital Kedia <sk...@fb.com>
    
    Closes #19081 from sitalkedia/skedia/oss_fix_executor_allocation.
    
    (cherry picked from commit 6949a9c5c6120fdde1b63876ede661adbd1eb15e)
    Signed-off-by: Marcelo Vanzin <va...@cloudera.com>

commit 50f86e1fe2aad67e4472b24d910ea519b9ad746f
Author: gatorsmile <ga...@...>
Date:   2017-09-01T20:48:50Z

    [SPARK-21884][SPARK-21477][BACKPORT-2.2][SQL] Mark LocalTableScanExec's input data transient
    
    This PR is to backport https://github.com/apache/spark/pull/18686 for resolving the issue in https://github.com/apache/spark/pull/19094
    
    ---
    
    ## What changes were proposed in this pull request?
    This PR marks the parameters `rows` and `unsafeRow` of LocalTableScanExec transient, to avoid serializing unneeded objects; the sketch below shows the effect.
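    
    A minimal sketch of the effect (assumed simplified class, not the real LocalTableScanExec):
    
    ```
        // @transient fields are skipped by Java serialization, so the
        // driver-side input rows are not shipped with the serialized plan.
        class LocalScanSketch(@transient val rows: Seq[Int]) extends Serializable {
          val numFields: Int = 1  // non-transient state still serializes normally
        }
    ```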
    
    ## How was this patch tested?
    N/A
    
    Author: gatorsmile <ga...@gmail.com>
    
    Closes #19101 from gatorsmile/backport-21477.

commit fb1b5f08adaf4ec7c786b7a8b6283b62683f1324
Author: Sean Owen <so...@...>
Date:   2017-09-04T21:02:59Z

    [SPARK-21418][SQL] NoSuchElementException: None.get in DataSourceScanExec with sun.io.serialization.extendedDebugInfo=true
    
    ## What changes were proposed in this pull request?
    
    If no SparkConf is available to Utils.redact, simply don't redact, as in the sketch below.
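    
    A minimal sketch of the behavior change (assumed simplified signature; the real method is Utils.redact):
    
    ```
        def redact(redactionPattern: Option[String],
                   kvs: Seq[(String, String)]): Seq[(String, String)] =
          redactionPattern match {
            case Some(regex) =>
              kvs.map { case (k, v) =>
                if (k.matches(regex)) (k, "*********(redacted)") else (k, v)
              }
            case None => kvs  // previously this path could hit None.get
          }
    ```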
    
    ## How was this patch tested?
    
    Existing tests
    
    Author: Sean Owen <so...@cloudera.com>
    
    Closes #19123 from srowen/SPARK-21418.
    
    (cherry picked from commit ca59445adb30ed796189532df2a2898ecd33db68)
    Signed-off-by: Herman van Hovell <hv...@databricks.com>

----


---



[GitHub] spark pull request #22118: Branch 2.2

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22118


---



[GitHub] spark issue #22118: Branch 2.2

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22118
  
    @speful looks mistakenly opened. Mind closing this please?


---



[GitHub] spark issue #22118: Branch 2.2

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22118
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

