Posted to reviews@spark.apache.org by rameshch16 <gi...@git.apache.org> on 2018/04/02 06:30:32 UTC
[GitHub] spark pull request #20957: Branch 2.3
GitHub user rameshch16 opened a pull request:
https://github.com/apache/spark/pull/20957
Branch 2.3
## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)
## How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
Please review http://spark.apache.org/contributing.html before opening a pull request.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/spark branch-2.3
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20957.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20957
----
commit acf3b70d16cc4d2416b4ce3f42b3cf95836170ed
Author: Tathagata Das <ta...@...>
Date: 2018-01-19T00:29:45Z
[SPARK-23142][SS][DOCS] Added docs for continuous processing
## What changes were proposed in this pull request?
Added documentation for continuous processing. Modified two locations.
- Modified the overview to have a mention of Continuous Processing.
- Added a new section on Continuous Processing at the end.
![image](https://user-images.githubusercontent.com/663212/35083551-a3dd23f6-fbd4-11e7-9e7e-90866f131ca9.png)
![image](https://user-images.githubusercontent.com/663212/35083618-d844027c-fbd4-11e7-9fde-75992cc517bd.png)
## How was this patch tested?
N/A
Author: Tathagata Das <ta...@gmail.com>
Closes #20308 from tdas/SPARK-23142.
(cherry picked from commit 4cd2ecc0c7222fef1337e04f1948333296c3be86)
Signed-off-by: Tathagata Das <ta...@gmail.com>
commit 225b1afdd1582cd4087e7cb98834505eaf16743e
Author: brandonJY <br...@...>
Date: 2018-01-19T00:57:49Z
[DOCS] change to dataset for java code in structured-streaming-kafka-integration document
## What changes were proposed in this pull request?
In the latest structured-streaming-kafka-integration document, the Java code example for Kafka integration uses `DataFrame<Row>`, which does not exist in the Java API; it should be `Dataset<Row>`.
## How was this patch tested?
Manual tests were performed on the updated example Java code with Spark 2.2.1 and Kafka 1.0.
Author: brandonJY <br...@users.noreply.github.com>
Closes #20312 from brandonJY/patch-2.
(cherry picked from commit 6121e91b7f5c9513d68674e4d5edbc3a4a5fd5fd)
Signed-off-by: Sean Owen <so...@cloudera.com>
commit 541dbc00b24f17d83ea2531970f2e9fe57fe3718
Author: Takuya UESHIN <ue...@...>
Date: 2018-01-19T03:37:08Z
[SPARK-23054][SQL][PYSPARK][FOLLOWUP] Use sqlType casting when casting PythonUserDefinedType to String.
## What changes were proposed in this pull request?
This is a follow-up of #20246.
If a UDT in Python doesn't have a corresponding Scala UDT, casting to string produces the raw string of the internal value, e.g. `"org.apache.spark.sql.catalyst.expressions.UnsafeArrayDataxxxxxxxx"` if the internal type is `ArrayType`.
This PR fixes it by casting via the UDT's `sqlType` instead.
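The idea behind the fix can be sketched with Python stand-ins (these classes are hypothetical illustrations, not Spark's actual `PythonUserDefinedType` machinery): cast through the `sqlType` representation instead of stringifying the raw internal object.

```python
class UnsafeArrayDataStandIn:
    """Stand-in for an internal value whose default str() is unhelpful."""
    def __init__(self, values):
        self.values = values
    # no __str__/__repr__, so str() yields "<...object at 0x...>"

class PythonUDTStandIn:
    """Stand-in for a Python UDT with no Scala counterpart."""
    sql_type = "array<int>"  # the underlying SQL type

    def to_sql_value(self, internal):
        # convert the internal value to its sqlType representation
        return internal.values

def cast_udt_to_string(udt, internal, use_sql_type=True):
    if use_sql_type:
        # the fix: cast through the sqlType representation
        return str(udt.to_sql_value(internal))
    # the bug: stringify the raw internal value
    return str(internal)

data = UnsafeArrayDataStandIn([1, 2, 3])
udt = PythonUDTStandIn()
print(cast_udt_to_string(udt, data))                      # [1, 2, 3]
print(cast_udt_to_string(udt, data, use_sql_type=False))  # opaque object repr
```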
## How was this patch tested?
Added a test and existing tests.
Author: Takuya UESHIN <ue...@databricks.com>
Closes #20306 from ueshin/issues/SPARK-23054/fup1.
(cherry picked from commit 568055da93049c207bb830f244ff9b60c638837c)
Signed-off-by: Wenchen Fan <we...@databricks.com>
commit 54c1fae12df654c7713ac5e7eb4da7bb2f785401
Author: Sameer Agarwal <sa...@...>
Date: 2018-01-19T09:38:08Z
[BUILD][MINOR] Fix java style check issues
## What changes were proposed in this pull request?
This patch fixes a few recently introduced java style check errors in master and release branch.
As an aside, given that [java linting currently fails](https://github.com/apache/spark/pull/10763) on machines with a clean maven cache, it'd be great to find another workaround to [re-enable the java style checks](https://github.com/apache/spark/blob/3a07eff5af601511e97a05e6fea0e3d48f74c4f0/dev/run-tests.py#L577) as part of Spark PRB.
/cc zsxwing JoshRosen srowen for any suggestions
## How was this patch tested?
Manual Check
Author: Sameer Agarwal <sa...@apache.org>
Closes #20323 from sameeragarwal/java.
(cherry picked from commit 9c4b99861cda3f9ec44ca8c1adc81a293508190c)
Signed-off-by: Sameer Agarwal <sa...@apache.org>
commit e58223171ecae6450482aadf4e7994c3b8d8a58d
Author: Nick Pentreath <ni...@...>
Date: 2018-01-19T10:43:23Z
[SPARK-23127][DOC] Update FeatureHasher guide for categoricalCols parameter
Update user guide entry for `FeatureHasher` to match the Scala / Python doc, to describe the `categoricalCols` parameter.
## How was this patch tested?
Doc only
Author: Nick Pentreath <ni...@za.ibm.com>
Closes #20293 from MLnick/SPARK-23127-catCol-userguide.
(cherry picked from commit 60203fca6a605ad158184e1e0ce5187e144a3ea7)
Signed-off-by: Nick Pentreath <ni...@za.ibm.com>
commit ef7989d55b65f386ed1ab87535a44e9367029a52
Author: Liang-Chi Hsieh <vi...@...>
Date: 2018-01-19T10:48:42Z
[SPARK-23048][ML] Add OneHotEncoderEstimator document and examples
## What changes were proposed in this pull request?
We have `OneHotEncoderEstimator` now, and `OneHotEncoder` will be deprecated as of 2.3.0, so we should add `OneHotEncoderEstimator` to the MLlib documentation.
We also need to provide corresponding examples for `OneHotEncoderEstimator` to use in the document.
## How was this patch tested?
Existing tests.
Author: Liang-Chi Hsieh <vi...@gmail.com>
Closes #20257 from viirya/SPARK-23048.
(cherry picked from commit b74366481cc87490adf4e69d26389ec737548c15)
Signed-off-by: Nick Pentreath <ni...@za.ibm.com>
commit b7a81999df8f43223403c77db9c1aedddb58370d
Author: Marco Gaido <ma...@...>
Date: 2018-01-19T11:46:48Z
[SPARK-23089][STS] Recreate session log directory if it doesn't exist
## What changes were proposed in this pull request?
When creating a session directory, Thrift should create the parent directory (i.e. /tmp/base_session_log_dir) if it is not present. Since many tools delete empty directories, this directory may have been removed, which silently disables session logging.
This was fixed in HIVE-12262: this PR brings it in Spark too.
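The essence of the fix (mirroring HIVE-12262) can be sketched in Python rather than the actual Java Thrift-server code; the function and file names here are illustrative:

```python
import os
import shutil
import tempfile

def open_session_log(base_dir, session_id):
    # Recreate the parent directory if an external cleaner removed it,
    # instead of failing and disabling session logging.
    os.makedirs(base_dir, exist_ok=True)
    return open(os.path.join(base_dir, session_id + ".log"), "a")

base = os.path.join(tempfile.mkdtemp(), "base_session_log_dir")
os.makedirs(base)
shutil.rmtree(base)                # simulate a tool deleting the empty dir
with open_session_log(base, "session-1") as f:
    f.write("operation logged\n")  # succeeds despite the deletion
```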
## How was this patch tested?
manual tests
Author: Marco Gaido <ma...@gmail.com>
Closes #20281 from mgaido91/SPARK-23089.
(cherry picked from commit e41400c3c8aace9eb72e6134173f222627fb0faf)
Signed-off-by: Wenchen Fan <we...@databricks.com>
commit 8d6845cf926a14e21ca29a43f2cc9a3a9475afd5
Author: gatorsmile <ga...@...>
Date: 2018-01-19T14:47:18Z
[SPARK-23000][TEST] Keep Derby DB Location Unchanged After Session Cloning
## What changes were proposed in this pull request?
After session cloning in `TestHive`, the conf of the singleton SparkContext for derby DB location is changed to a new directory. The new directory is created in `HiveUtils.newTemporaryConfiguration(useInMemoryDerby = false)`.
This PR is to keep the conf value of `ConfVars.METASTORECONNECTURLKEY.varname` unchanged during the session clone.
## How was this patch tested?
The issue can be reproduced by the command:
> build/sbt -Phive "hive/test-only org.apache.spark.sql.hive.HiveSessionStateSuite org.apache.spark.sql.hive.DataSourceWithHiveMetastoreCatalogSuite"
Also added a test case.
Author: gatorsmile <ga...@gmail.com>
Closes #20328 from gatorsmile/fixTestFailure.
(cherry picked from commit 6c39654efcb2aa8cb4d082ab7277a6fa38fb48e4)
Signed-off-by: Wenchen Fan <we...@databricks.com>
commit 55efeffd774a776806f379df5b2209af05270cc4
Author: Wenchen Fan <we...@...>
Date: 2018-01-19T16:58:21Z
[SPARK-23149][SQL] polish ColumnarBatch
## What changes were proposed in this pull request?
Several cleanups in `ColumnarBatch`
* remove `schema`. The `ColumnVector`s inside `ColumnarBatch` already have the data type information, we don't need this `schema`.
* remove `capacity`. `ColumnarBatch` is just a wrapper of `ColumnVector`s, not builders, it doesn't need a capacity property.
* remove `DEFAULT_BATCH_SIZE`. As a wrapper, `ColumnarBatch` can't decide the batch size, it should be decided by the reader, e.g. parquet reader, orc reader, cached table reader. The default batch size should also be defined by the reader.
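The slimmed-down design can be illustrated with a minimal Python sketch (not Spark's actual Java classes): the batch is a thin wrapper, type information lives in the vectors, and the batch size is chosen by whoever builds the vectors.

```python
class ColumnVector:
    """Each vector carries its own data type."""
    def __init__(self, data_type, values):
        self.data_type = data_type
        self.values = values

class ColumnarBatch:
    """A thin wrapper around vectors: no stored schema, no capacity."""
    def __init__(self, columns, num_rows):
        self.columns = columns
        self.num_rows = num_rows

    def schema(self):
        # derived on demand from the vectors instead of stored redundantly
        return [c.data_type for c in self.columns]

batch = ColumnarBatch([ColumnVector("int", [1, 2]),
                       ColumnVector("string", ["a", "b"])], num_rows=2)
print(batch.schema())  # ['int', 'string']
```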
## How was this patch tested?
existing tests.
Author: Wenchen Fan <we...@databricks.com>
Closes #20316 from cloud-fan/columnar-batch.
(cherry picked from commit d8aaa771e249b3f54b57ce24763e53fd65a0dbf7)
Signed-off-by: gatorsmile <ga...@gmail.com>
commit ffe45913d0c666185f8c252be30b5e269a909c07
Author: foxish <ra...@...>
Date: 2018-01-19T18:23:13Z
[SPARK-23104][K8S][DOCS] Changes to Kubernetes scheduler documentation
## What changes were proposed in this pull request?
Docs changes:
- Adding a warning that the backend is experimental.
- Removing a defunct internal-only option from documentation
- Clarifying that node selectors can be used right away, and other minor cosmetic changes
## How was this patch tested?
Docs only change
Author: foxish <ra...@google.com>
Closes #20314 from foxish/ambiguous-docs.
(cherry picked from commit 73d3b230f3816a854a181c0912d87b180e347271)
Signed-off-by: Marcelo Vanzin <va...@cloudera.com>
commit 4b79514c90ca76674d17fd80d125e9dbfb0e845e
Author: Marcelo Vanzin <va...@...>
Date: 2018-01-19T19:26:37Z
[SPARK-20664][CORE] Delete stale application data from SHS.
Detect the deletion of event log files from storage, and remove
data about the related application attempt in the SHS.
Also contains code to fix SPARK-21571 based on code by ericvandenbergfb.
Author: Marcelo Vanzin <va...@cloudera.com>
Closes #20138 from vanzin/SPARK-20664.
(cherry picked from commit fed2139f053fac4a9a6952ff0ab1cc2a5f657bd0)
Signed-off-by: Imran Rashid <ir...@cloudera.com>
commit d0cb19873bb325be7e31de62b0ba117dd6b92619
Author: Marcelo Vanzin <va...@...>
Date: 2018-01-19T19:32:20Z
[SPARK-23103][CORE] Ensure correct sort order for negative values in LevelDB.
The code was sorting "0" as "less than" negative values, which is incorrect.
The fix is simple; most of the changes are the added test and related
cleanup.
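Spark's actual fix lives in its LevelDB key encoder, but the underlying trick is generic: LevelDB compares keys byte-wise, so signed integers must be encoded so that byte order matches numeric order. A common approach (shown here as an illustrative sketch, not Spark's code) is to offset by the sign bit:

```python
import struct

def sortable_key(n: int) -> bytes:
    """Encode a signed 64-bit int so byte-wise order matches numeric order.
    Adding 2**63 maps the signed range onto unsigned offset-binary, where
    negative values come before zero, which comes before positives."""
    return struct.pack(">Q", (n + (1 << 63)) & ((1 << 64) - 1))

values = [5, -1, 0, -100, 42]
by_bytes = sorted(values, key=sortable_key)
print(by_bytes)  # [-100, -1, 0, 5, 42]
```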
Author: Marcelo Vanzin <va...@cloudera.com>
Closes #20284 from vanzin/SPARK-23103.
(cherry picked from commit aa3a1276f9e23ffbb093d00743e63cd4369f9f57)
Signed-off-by: Imran Rashid <ir...@cloudera.com>
commit f9ad00a5aeeecf4b8d261a0dae6c8cb6be8daa67
Author: Marcelo Vanzin <va...@...>
Date: 2018-01-19T21:14:24Z
[SPARK-23135][UI] Fix rendering of accumulators in the stage page.
This follows the behavior of 2.2: only named accumulators with a
value are rendered.
Screenshot:
![accs](https://user-images.githubusercontent.com/1694083/35065700-df409114-fb82-11e7-87c1-550c3f674371.png)
Author: Marcelo Vanzin <va...@cloudera.com>
Closes #20299 from vanzin/SPARK-23135.
(cherry picked from commit f6da41b0150725fe96ccb2ee3b48840b207f47eb)
Signed-off-by: Sameer Agarwal <sa...@apache.org>
commit c647f918b1aee27d7a53852aca74629f03ad49f6
Author: Kent Yao <11...@...>
Date: 2018-01-19T23:49:29Z
[SPARK-21771][SQL] remove useless hive client in SparkSQLEnv
## What changes were proposed in this pull request?
Once a metastore Hive client is created, it generates its SessionState, which creates many session-related directories, some marked deleteOnExit and some not. If the Hive client is never used, we should not create it at startup.
## How was this patch tested?
N/A
cc hvanhovell cloud-fan
Author: Kent Yao <11...@zju.edu.cn>
Closes #18983 from yaooqinn/patch-1.
(cherry picked from commit 793841c6b8b98b918dcf241e29f60ef125914db9)
Signed-off-by: gatorsmile <ga...@gmail.com>
commit 0cde5212a80b5572bfe53b06ed557e6c2ec8c903
Author: Sean Owen <so...@...>
Date: 2018-01-20T06:46:34Z
[SPARK-23091][ML] Incorrect unit test for approxQuantile
## What changes were proposed in this pull request?
Narrow the bound in the approximate quantile test from 2*epsilon to epsilon, to match the paper.
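The tightened guarantee can be sketched as follows (illustrative Python, not the Scala test itself): for target quantile p over n elements with relative error epsilon, the returned value's rank should lie within eps*n of p*n, not 2*eps*n.

```python
def check_approx_quantile(data, value, p, eps):
    """Check that `value` is a valid eps-approximate p-quantile of `data`:
    some rank attainable by `value` must be within eps*n of the target."""
    n = len(data)
    s = sorted(data)
    lo = sum(1 for x in s if x < value)   # smallest rank value could take
    hi = sum(1 for x in s if x <= value)  # largest rank value could take
    target = p * n
    return lo - eps * n <= target <= hi + eps * n

data = list(range(1, 101))  # ranks 1..100
print(check_approx_quantile(data, 50, 0.5, 0.01))  # True: rank 50, target 50
print(check_approx_quantile(data, 60, 0.5, 0.01))  # False: off by 10 > eps*n
```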
## How was this patch tested?
Existing tests.
Author: Sean Owen <so...@cloudera.com>
Closes #20324 from srowen/SPARK-23091.
(cherry picked from commit 396cdfbea45232bacbc03bfaf8be4ea85d47d3fd)
Signed-off-by: gatorsmile <ga...@gmail.com>
commit e11d5eaf79ffccbe3a5444a5b9ecf3a203e1fc90
Author: Shashwat Anand <me...@...>
Date: 2018-01-20T22:34:37Z
[SPARK-23165][DOC] Spelling mistake fix in quick-start doc.
## What changes were proposed in this pull request?
Fix spelling in quick-start doc.
## How was this patch tested?
Doc only.
Author: Shashwat Anand <me...@shashwat.me>
Closes #20336 from ashashwat/SPARK-23165.
(cherry picked from commit 84a076e0e9a38a26edf7b702c24fdbbcf1e697b9)
Signed-off-by: gatorsmile <ga...@gmail.com>
commit b9c1367b7d9240070c5d83572dc7b43c7480b456
Author: fjh100456 <fu...@...>
Date: 2018-01-20T22:49:49Z
[SPARK-21786][SQL] The 'spark.sql.parquet.compression.codec' and 'spark.sql.orc.compression.codec' configuration doesn't take effect on hive table writing
## What changes were proposed in this pull request?
Pass `spark.sql.parquet.compression.codec` value to `parquet.compression`.
Pass `spark.sql.orc.compression.codec` value to `orc.compress`.
## How was this patch tested?
Add test.
Note:
This is the same issue mentioned in #19218 . That branch was deleted mistakenly, so make a new pr instead.
gatorsmile maropu dongjoon-hyun discipleforteen
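The conf-to-property plumbing can be sketched like this (a hypothetical Python illustration; the precedence of an explicit table-level property over the session conf is an assumption, not a statement of Spark's exact behavior):

```python
# Map session-level Spark confs to the property names the writers understand.
SESSION_TO_WRITER_KEY = {
    "spark.sql.parquet.compression.codec": "parquet.compression",
    "spark.sql.orc.compression.codec": "orc.compress",
}

def effective_writer_props(session_conf, table_props):
    """Resolve compression settings, assuming table properties win."""
    props = {}
    for session_key, writer_key in SESSION_TO_WRITER_KEY.items():
        if writer_key in table_props:        # explicit table property wins
            props[writer_key] = table_props[writer_key]
        elif session_key in session_conf:    # else fall back to the conf
            props[writer_key] = session_conf[session_key]
    return props

conf = {"spark.sql.parquet.compression.codec": "snappy"}
print(effective_writer_props(conf, {}))  # {'parquet.compression': 'snappy'}
print(effective_writer_props(conf, {"parquet.compression": "gzip"}))
```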
Author: fjh100456 <fu...@zte.com.cn>
Author: Takeshi Yamamuro <ya...@apache.org>
Author: Wenchen Fan <we...@databricks.com>
Author: gatorsmile <ga...@gmail.com>
Author: Yinan Li <li...@gmail.com>
Author: Marcelo Vanzin <va...@cloudera.com>
Author: Juliusz Sompolski <ju...@databricks.com>
Author: Felix Cheung <fe...@hotmail.com>
Author: jerryshao <ss...@hortonworks.com>
Author: Li Jin <ic...@gmail.com>
Author: Gera Shegalov <ge...@apache.org>
Author: chetkhatri <ck...@gmail.com>
Author: Joseph K. Bradley <jo...@databricks.com>
Author: Bago Amirbekian <ba...@databricks.com>
Author: Xianjin YE <ad...@gmail.com>
Author: Bruce Robbins <be...@gmail.com>
Author: zuotingbing <zu...@zte.com.cn>
Author: Kent Yao <ya...@hotmail.com>
Author: hyukjinkwon <gu...@gmail.com>
Author: Adrian Ionescu <ad...@databricks.com>
Closes #20087 from fjh100456/HiveTableWriting.
(cherry picked from commit 00d169156d4b1c91d2bcfd788b254b03c509dc41)
Signed-off-by: gatorsmile <ga...@gmail.com>
commit e0ef30f770329f058843a7a486bf357e9cd6e26a
Author: Marco Gaido <ma...@...>
Date: 2018-01-21T06:39:49Z
[SPARK-23087][SQL] CheckCartesianProduct too restrictive when condition is false/null
## What changes were proposed in this pull request?
CheckCartesianProduct also raises an AnalysisException when the join condition is always false/null. In that case we shouldn't raise it, since the result will not be a cartesian product.
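A toy nested-loop join makes the point concrete (illustrative Python, not Spark's optimizer rule): a join whose condition is always false produces an empty result, so the cartesian-product safeguard has nothing to guard against.

```python
def join(left, right, condition):
    """Toy nested-loop join: keep only pairs satisfying the condition."""
    return [(l, r) for l in left for r in right if condition(l, r)]

left, right = [1, 2], [3, 4]
always_false = lambda l, r: False
print(join(left, right, always_false))            # []: no cartesian blow-up
print(len(join(left, right, lambda l, r: True)))  # 4: the real cartesian case
```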
## How was this patch tested?
added UT
Author: Marco Gaido <ma...@gmail.com>
Closes #20333 from mgaido91/SPARK-23087.
(cherry picked from commit 121dc96f088a7b157d5b2cffb626b0e22d1fc052)
Signed-off-by: gatorsmile <ga...@gmail.com>
commit 7520491bf80eb2e21f0630aa13d7cdaad881626b
Author: Felix Cheung <fe...@...>
Date: 2018-01-21T19:23:51Z
[SPARK-21293][SS][SPARKR] Add doc example for streaming join, dedup
## What changes were proposed in this pull request?
streaming programming guide changes
## How was this patch tested?
manually
Author: Felix Cheung <fe...@hotmail.com>
Closes #20340 from felixcheung/rstreamdoc.
(cherry picked from commit 2239d7a410e906ccd40aa8e84d637e9d06cd7b8a)
Signed-off-by: Felix Cheung <fe...@apache.org>
commit 5781fa79e28e2123e370fc1096488e318f2b4ee2
Author: Russell Spitzer <ru...@...>
Date: 2018-01-22T04:27:51Z
[SPARK-22976][CORE] Cluster mode driver dir removed while running
## What changes were proposed in this pull request?
The clean-up logic on the worker previously determined the liveness of a
particular application based on whether or not it had running executors.
This would fail in the case that a directory was made for a driver
running in cluster mode if that driver had no running executors on the
same machine. To preserve driver directories we consider both executors
and running drivers when checking directory liveness.
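The liveness check described above reduces to a simple disjunction, sketched here in Python (the function and argument names are illustrative, not the worker's actual code):

```python
def is_dir_live(dir_app_id, running_executor_apps, running_driver_ids):
    """A work directory is live if its app has a running executor on this
    worker OR if it belongs to a driver currently running here."""
    return (dir_app_id in running_executor_apps
            or dir_app_id in running_driver_ids)

# Cluster-mode driver with no local executors: previously cleaned up, now kept.
print(is_dir_live("driver-20180105234824-0000",
                  running_executor_apps=set(),
                  running_driver_ids={"driver-20180105234824-0000"}))  # True
```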
## How was this patch tested?
Manually started up a two-node cluster with a single core on each node. Turned on worker directory cleanup and set both the cleanup interval and the liveness window to one second. Without the patch, the driver directory is removed immediately after the app is launched; with the patch, it is not.
### Without Patch
```
INFO 2018-01-05 23:48:24,693 Logging.scala:54 - Asked to launch driver driver-20180105234824-0000
INFO 2018-01-05 23:48:25,293 Logging.scala:54 - Changing view acls to: cassandra
INFO 2018-01-05 23:48:25,293 Logging.scala:54 - Changing modify acls to: cassandra
INFO 2018-01-05 23:48:25,294 Logging.scala:54 - Changing view acls groups to:
INFO 2018-01-05 23:48:25,294 Logging.scala:54 - Changing modify acls groups to:
INFO 2018-01-05 23:48:25,294 Logging.scala:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cassandra); groups with view permissions: Set(); users with modify permissions: Set(cassandra); groups with modify permissions: Set()
INFO 2018-01-05 23:48:25,330 Logging.scala:54 - Copying user jar file:/home/automaton/writeRead-0.1.jar to /var/lib/spark/worker/driver-20180105234824-0000/writeRead-0.1.jar
INFO 2018-01-05 23:48:25,332 Logging.scala:54 - Copying /home/automaton/writeRead-0.1.jar to /var/lib/spark/worker/driver-20180105234824-0000/writeRead-0.1.jar
INFO 2018-01-05 23:48:25,361 Logging.scala:54 - Launch Command: "/usr/lib/jvm/jdk1.8.0_40//bin/java" ....
****
INFO 2018-01-05 23:48:56,577 Logging.scala:54 - Removing directory: /var/lib/spark/worker/driver-20180105234824-0000 ### << Cleaned up
****
--
One minute passes while app runs (app has 1 minute sleep built in)
--
WARN 2018-01-05 23:49:58,080 ShuffleSecretManager.java:73 - Attempted to unregister application app-20180105234831-0000 when it is not registered
INFO 2018-01-05 23:49:58,081 ExternalShuffleBlockResolver.java:163 - Application app-20180105234831-0000 removed, cleanupLocalDirs = false
INFO 2018-01-05 23:49:58,081 ExternalShuffleBlockResolver.java:163 - Application app-20180105234831-0000 removed, cleanupLocalDirs = false
INFO 2018-01-05 23:49:58,082 ExternalShuffleBlockResolver.java:163 - Application app-20180105234831-0000 removed, cleanupLocalDirs = true
INFO 2018-01-05 23:50:00,999 Logging.scala:54 - Driver driver-20180105234824-0000 exited successfully
```
With Patch
```
INFO 2018-01-08 23:19:54,603 Logging.scala:54 - Asked to launch driver driver-20180108231954-0002
INFO 2018-01-08 23:19:54,975 Logging.scala:54 - Changing view acls to: automaton
INFO 2018-01-08 23:19:54,976 Logging.scala:54 - Changing modify acls to: automaton
INFO 2018-01-08 23:19:54,976 Logging.scala:54 - Changing view acls groups to:
INFO 2018-01-08 23:19:54,976 Logging.scala:54 - Changing modify acls groups to:
INFO 2018-01-08 23:19:54,976 Logging.scala:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(automaton); groups with view permissions: Set(); users with modify permissions: Set(automaton); groups with modify permissions: Set()
INFO 2018-01-08 23:19:55,029 Logging.scala:54 - Copying user jar file:/home/automaton/writeRead-0.1.jar to /var/lib/spark/worker/driver-20180108231954-0002/writeRead-0.1.jar
INFO 2018-01-08 23:19:55,031 Logging.scala:54 - Copying /home/automaton/writeRead-0.1.jar to /var/lib/spark/worker/driver-20180108231954-0002/writeRead-0.1.jar
INFO 2018-01-08 23:19:55,038 Logging.scala:54 - Launch Command: ......
INFO 2018-01-08 23:21:28,674 ShuffleSecretManager.java:69 - Unregistered shuffle secret for application app-20180108232000-0000
INFO 2018-01-08 23:21:28,675 ExternalShuffleBlockResolver.java:163 - Application app-20180108232000-0000 removed, cleanupLocalDirs = false
INFO 2018-01-08 23:21:28,675 ExternalShuffleBlockResolver.java:163 - Application app-20180108232000-0000 removed, cleanupLocalDirs = false
INFO 2018-01-08 23:21:28,681 ExternalShuffleBlockResolver.java:163 - Application app-20180108232000-0000 removed, cleanupLocalDirs = true
INFO 2018-01-08 23:21:31,703 Logging.scala:54 - Driver driver-20180108231954-0002 exited successfully
*****
INFO 2018-01-08 23:21:32,346 Logging.scala:54 - Removing directory: /var/lib/spark/worker/driver-20180108231954-0002 ### < Happening AFTER the Run completes rather than during it
*****
```
Author: Russell Spitzer <Ru...@gmail.com>
Closes #20298 from RussellSpitzer/SPARK-22976-master.
(cherry picked from commit 11daeb833222b1cd349fb1410307d64ab33981db)
Signed-off-by: jerryshao <ss...@hortonworks.com>
commit 36af73b59b6fb3d5f8e8a8e1caf44bd565e97b3d
Author: Dongjoon Hyun <do...@...>
Date: 2018-01-22T06:18:57Z
[MINOR][SQL] Fix wrong comments on org.apache.spark.sql.parquet.row.attributes
## What changes were proposed in this pull request?
This PR fixes the wrong comment on `org.apache.spark.sql.parquet.row.attributes`
which is useful for UDTs like Vector/Matrix. Please see [SPARK-22320](https://issues.apache.org/jira/browse/SPARK-22320) for the usage.
Originally, [SPARK-19411](https://github.com/apache/spark/commit/bf493686eb17006727b3ec81849b22f3df68fdef#diff-ee26d4c4be21e92e92a02e9f16dbc285L314) left this behind while removing optional column metadata. In the same PR, the same comment was removed at lines 310-311.
## How was this patch tested?
N/A (This is about comments).
Author: Dongjoon Hyun <do...@apache.org>
Closes #20346 from dongjoon-hyun/minor_comment_parquet.
(cherry picked from commit 8142a3b883a5fe6fc620a2c5b25b6bde4fda32e5)
Signed-off-by: hyukjinkwon <gu...@gmail.com>
commit 57c320a0dcc6ca784331af0191438e252d418075
Author: Marcelo Vanzin <va...@...>
Date: 2018-01-22T06:49:12Z
[SPARK-23020][CORE] Fix races in launcher code, test.
The race in the code is because the handle might update
its state to the wrong state if the connection handling
thread is still processing incoming data; so the handle
needs to wait for the connection to finish up before
checking the final state.
The race in the test is because when waiting for a handle
to reach a final state, the waitFor() method needs to wait
until all handle state is updated (which also includes
waiting for the connection thread above to finish).
Otherwise, waitFor() may return too early, which would cause
a bunch of different races (like the listener not being yet
notified of the state change, or being in the middle of
being notified, or the handle not being properly disposed
and causing postChecks() to assert).
On top of that I found, by code inspection, a couple of
potential races that could make a handle end up in the
wrong state when being killed.
The original version of this fix introduced the flipped
version of the first race described above; the connection
closing might override the handle state before the
handle might have a chance to do cleanup. The fix there
is to only dispose of the handle from the connection
when there is an error, and let the handle dispose
itself in the normal case.
The fix also caused a bug in YarnClusterSuite to be surfaced;
the code was checking for a file in the classpath that was
not expected to be there in client mode. Because of the above
issues, the error was not propagating correctly and the (buggy)
test was incorrectly passing.
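The wait-for-the-connection-thread pattern at the heart of the fix can be sketched in Python (purely illustrative threading code, not the launcher's actual implementation):

```python
import threading
import time

class Handle:
    def __init__(self):
        self.state = "RUNNING"
        self._conn_thread = None

    def start_connection(self):
        def handle_incoming():
            time.sleep(0.05)         # still processing incoming data...
            self.state = "FINISHED"  # ...which updates the state last
        self._conn_thread = threading.Thread(target=handle_incoming)
        self._conn_thread.start()

    def wait_for_final_state(self):
        # The fix: wait for the connection thread to finish before
        # trusting the handle's state; checking earlier races with it.
        self._conn_thread.join()
        return self.state

h = Handle()
h.start_connection()
print(h.wait_for_final_state())  # FINISHED, never a stale RUNNING
```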
Tested by running the existing unit tests a lot (and not
seeing the errors I was seeing before).
Author: Marcelo Vanzin <va...@cloudera.com>
Closes #20297 from vanzin/SPARK-23020.
(cherry picked from commit ec228976156619ed8df21a85bceb5fd3bdeb5855)
Signed-off-by: Wenchen Fan <we...@databricks.com>
commit cf078a205a14d8709e2c4a9d9f23f6efa20b4fe7
Author: Arseniy Tashoyan <ta...@...>
Date: 2018-01-22T12:17:05Z
[MINOR][DOC] Fix the path to the examples jar
## What changes were proposed in this pull request?
The examples jar file is now in the ./examples/jars directory of the Spark distribution.
Author: Arseniy Tashoyan <ta...@users.noreply.github.com>
Closes #20349 from tashoyan/patch-1.
(cherry picked from commit 60175e959f275d2961798fbc5a9150dac9de51ff)
Signed-off-by: jerryshao <ss...@hortonworks.com>
commit 743b9173f8feaed8e594961aa85d61fb3f8e5e70
Author: gatorsmile <ga...@...>
Date: 2018-01-22T12:27:59Z
[SPARK-23122][PYSPARK][FOLLOW-UP] Update the docs for UDF Registration
## What changes were proposed in this pull request?
This PR is to update the docs for UDF registration
## How was this patch tested?
N/A
Author: gatorsmile <ga...@gmail.com>
Closes #20348 from gatorsmile/testUpdateDoc.
(cherry picked from commit 73281161fc7fddd645c712986ec376ac2b1bd213)
Signed-off-by: gatorsmile <ga...@gmail.com>
commit d933fcea6f3b1d2a5bfb03d808ec83db0f97298a
Author: gatorsmile <ga...@...>
Date: 2018-01-22T12:31:24Z
[SPARK-23170][SQL] Dump the statistics of effective runs of analyzer and optimizer rules
## What changes were proposed in this pull request?
Dump the statistics of effective runs of analyzer and optimizer rules.
## How was this patch tested?
Do a manual run of TPCDSQuerySuite
```
=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs: 175899
Total time: 25.486559948 seconds
Rule Effective Time / Total Time Effective Runs / Total Runs
org.apache.spark.sql.catalyst.optimizer.ColumnPruning 1603280450 / 2868461549 761 / 1877
org.apache.spark.sql.catalyst.analysis.Analyzer$CTESubstitution 2045860009 / 2056602674 37 / 788
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggregateFunctions 440719059 / 1693110949 38 / 1982
org.apache.spark.sql.catalyst.optimizer.Optimizer$OptimizeSubqueries 1429834919 / 1446016225 39 / 285
org.apache.spark.sql.catalyst.optimizer.PruneFilters 33273083 / 1389586938 3 / 1592
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences 821183615 / 1266668754 616 / 1982
org.apache.spark.sql.catalyst.optimizer.ReorderJoin 775837028 / 866238225 132 / 1592
org.apache.spark.sql.catalyst.analysis.DecimalPrecision 550683593 / 748854507 211 / 1982
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubquery 513075345 / 634370596 49 / 1982
org.apache.spark.sql.catalyst.analysis.Analyzer$FixNullability 33475731 / 606406532 12 / 742
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ImplicitTypeCasts 193144298 / 545403925 86 / 1982
org.apache.spark.sql.catalyst.optimizer.BooleanSimplification 18651497 / 495725004 7 / 1592
org.apache.spark.sql.catalyst.optimizer.PushPredicateThroughJoin 369257217 / 489934378 709 / 1592
org.apache.spark.sql.catalyst.optimizer.RemoveRedundantAliases 3707000 / 468291609 9 / 1592
org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints 410155900 / 435254175 192 / 285
org.apache.spark.sql.execution.datasources.FindDataSourceTable 348885539 / 371855866 233 / 1982
org.apache.spark.sql.catalyst.optimizer.NullPropagation 11307645 / 307531225 26 / 1592
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions 120324545 / 304948785 294 / 1982
org.apache.spark.sql.catalyst.analysis.TypeCoercion$FunctionArgumentConversion 92323199 / 286695007 38 / 1982
org.apache.spark.sql.catalyst.optimizer.PushDownPredicate 230084193 / 265845972 785 / 1592
org.apache.spark.sql.catalyst.analysis.TypeCoercion$PromoteStrings 45938401 / 265144009 40 / 1982
org.apache.spark.sql.catalyst.analysis.TypeCoercion$InConversion 14888776 / 261499450 1 / 1982
org.apache.spark.sql.catalyst.analysis.TypeCoercion$CaseWhenCoercion 113796384 / 244913861 29 / 1982
org.apache.spark.sql.catalyst.optimizer.ConstantFolding 65008069 / 236548480 126 / 1592
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator 0 / 226338929 0 / 1982
org.apache.spark.sql.catalyst.analysis.ResolveTimeZone 98134906 / 221323770 417 / 1982
org.apache.spark.sql.catalyst.optimizer.ReorderAssociativeOperator 0 / 208421703 0 / 1592
org.apache.spark.sql.catalyst.optimizer.OptimizeIn 8762534 / 199351958 16 / 1592
org.apache.spark.sql.catalyst.analysis.TypeCoercion$DateTimeOperations 11980016 / 190779046 27 / 1982
org.apache.spark.sql.catalyst.optimizer.SimplifyBinaryComparison 0 / 188887385 0 / 1592
org.apache.spark.sql.catalyst.optimizer.SimplifyConditionals 0 / 186812106 0 / 1592
org.apache.spark.sql.catalyst.optimizer.SimplifyCaseConversionExpressions 0 / 183885230 0 / 1592
org.apache.spark.sql.catalyst.optimizer.SimplifyCasts 17128295 / 182901910 69 / 1592
org.apache.spark.sql.catalyst.analysis.TypeCoercion$Division 14579110 / 180309340 8 / 1982
org.apache.spark.sql.catalyst.analysis.TypeCoercion$BooleanEquality 0 / 176740516 0 / 1982
org.apache.spark.sql.catalyst.analysis.TypeCoercion$IfCoercion 0 / 170781986 0 / 1982
org.apache.spark.sql.catalyst.optimizer.LikeSimplification 771605 / 164136736 1 / 1592
org.apache.spark.sql.catalyst.optimizer.RemoveDispensableExpressions 0 / 155958962 0 / 1592
org.apache.spark.sql.catalyst.analysis.ResolveCreateNamedStruct 0 / 151222943 0 / 1982
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowOrder 7534632 / 146596355 14 / 1982
org.apache.spark.sql.catalyst.analysis.TypeCoercion$EltCoercion 0 / 144488654 0 / 1982
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ConcatCoercion 0 / 142403338 0 / 1982
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowFrame 12067635 / 141500665 21 / 1982
org.apache.spark.sql.catalyst.analysis.TimeWindowing 0 / 140431958 0 / 1982
org.apache.spark.sql.catalyst.analysis.TypeCoercion$WindowFrameCoercion 0 / 125471960 0 / 1982
org.apache.spark.sql.catalyst.optimizer.EliminateOuterJoin 14226972 / 124922019 11 / 1592
org.apache.spark.sql.catalyst.analysis.TypeCoercion$StackCoercion 0 / 123613887 0 / 1982
org.apache.spark.sql.catalyst.optimizer.RewriteCorrelatedScalarSubquery 8491071 / 121179056 7 / 1592
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGroupingAnalytics 55526073 / 120290529 11 / 1982
org.apache.spark.sql.catalyst.optimizer.ConstantPropagation 0 / 113886790 0 / 1592
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveDeserializer 52383759 / 107160222 148 / 1982
org.apache.spark.sql.catalyst.analysis.CleanupAliases 52543524 / 102091518 344 / 1086
org.apache.spark.sql.catalyst.optimizer.RemoveRedundantProject 40682895 / 94403652 342 / 1877
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions 38473816 / 89740578 23 / 1982
org.apache.spark.sql.catalyst.optimizer.CollapseProject 46806090 / 83315506 281 / 1877
org.apache.spark.sql.catalyst.optimizer.FoldablePropagation 0 / 78750087 0 / 1592
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAliases 13742765 / 77227258 47 / 1982
org.apache.spark.sql.catalyst.optimizer.CombineFilters 53386729 / 76960344 448 / 1592
org.apache.spark.sql.execution.datasources.DataSourceAnalysis 68034341 / 75724186 24 / 742
org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions 0 / 71151084 0 / 750
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveMissingReferences 12139848 / 67599140 8 / 1982
org.apache.spark.sql.catalyst.optimizer.PullupCorrelatedPredicates 45017938 / 65968777 23 / 285
org.apache.spark.sql.execution.datasources.v2.PushDownOperatorsToDataSource 0 / 60937767 0 / 285
org.apache.spark.sql.catalyst.optimizer.CollapseRepartition 0 / 59897237 0 / 1592
org.apache.spark.sql.catalyst.optimizer.PushProjectionThroughUnion 8547262 / 53941370 10 / 1592
org.apache.spark.sql.catalyst.analysis.Analyzer$HandleNullInputsForUDF 0 / 52735976 0 / 742
org.apache.spark.sql.catalyst.analysis.TypeCoercion$WidenSetOperationTypes 9797713 / 52401665 9 / 1982
org.apache.spark.sql.catalyst.analysis.Analyzer$PullOutNondeterministic 0 / 51741500 0 / 742
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations 28614911 / 51061186 233 / 1990
org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions 0 / 50621510 0 / 285
org.apache.spark.sql.catalyst.optimizer.CombineUnions 2777800 / 50262112 17 / 1877
org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates 1640641 / 49633909 46 / 1982
org.apache.spark.sql.catalyst.optimizer.DecimalAggregates 20198374 / 48488419 100 / 385
org.apache.spark.sql.catalyst.optimizer.LimitPushDown 0 / 45052523 0 / 1592
org.apache.spark.sql.catalyst.optimizer.CombineLimits 0 / 44719443 0 / 1592
org.apache.spark.sql.catalyst.optimizer.EliminateSorts 0 / 44216930 0 / 1592
org.apache.spark.sql.catalyst.optimizer.RewritePredicateSubquery 36235699 / 44165786 148 / 285
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNewInstance 0 / 42750307 0 / 1982
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast 0 / 41811748 0 / 1982
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOrdinalInOrderByAndGroupBy 3819476 / 41776562 4 / 1982
org.apache.spark.sql.catalyst.optimizer.ComputeCurrentTime 0 / 40527808 0 / 285
org.apache.spark.sql.catalyst.optimizer.CollapseWindow 0 / 36832538 0 / 1592
org.apache.spark.sql.catalyst.optimizer.EliminateSerialization 0 / 36120667 0 / 1592
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggAliasInGroupBy 0 / 32435826 0 / 1982
org.apache.spark.sql.execution.datasources.PreprocessTableCreation 0 / 32145218 0 / 742
org.apache.spark.sql.execution.datasources.ResolveSQLOnFile 0 / 30295614 0 / 1982
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolvePivot 0 / 30111655 0 / 1982
org.apache.spark.sql.catalyst.expressions.codegen.package$ExpressionCanonicalizer$CleanExpressions 59930 / 28038201 26 / 8280
org.apache.spark.sql.catalyst.analysis.ResolveInlineTables 0 / 27808108 0 / 1982
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubqueryColumnAliases 0 / 27066690 0 / 1982
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGenerate 0 / 26660210 0 / 1982
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin 0 / 25255184 0 / 1982
org.apache.spark.sql.catalyst.analysis.ResolveTableValuedFunctions 0 / 24663088 0 / 1990
org.apache.spark.sql.catalyst.analysis.SubstituteUnresolvedOrdinals 9709079 / 24450670 4 / 788
org.apache.spark.sql.catalyst.analysis.ResolveHints$ResolveBroadcastHints 0 / 23776535 0 / 750
org.apache.spark.sql.catalyst.optimizer.ReplaceExpressions 0 / 22697895 0 / 285
org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts 0 / 22523798 0 / 285
org.apache.spark.sql.catalyst.optimizer.ReplaceDistinctWithAggregate 988593 / 21535410 15 / 300
org.apache.spark.sql.catalyst.optimizer.EliminateMapObjects 0 / 20269996 0 / 285
org.apache.spark.sql.catalyst.optimizer.RewriteDistinctAggregates 0 / 19388592 0 / 285
org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases 17675532 / 18971185 215 / 285
org.apache.spark.sql.catalyst.optimizer.GetCurrentDatabase 0 / 18271152 0 / 285
org.apache.spark.sql.catalyst.optimizer.PropagateEmptyRelation 2077097 / 17190855 3 / 288
org.apache.spark.sql.catalyst.analysis.EliminateBarriers 0 / 16736359 0 / 1086
org.apache.spark.sql.execution.OptimizeMetadataOnlyQuery 0 / 16669341 0 / 285
org.apache.spark.sql.catalyst.analysis.UpdateOuterReferences 0 / 14470235 0 / 742
org.apache.spark.sql.catalyst.optimizer.ReplaceExceptWithAntiJoin 6715625 / 12190561 1 / 300
org.apache.spark.sql.catalyst.optimizer.ReplaceIntersectWithSemiJoin 3451793 / 11431432 7 / 300
org.apache.spark.sql.execution.python.ExtractPythonUDFFromAggregate 0 / 10810568 0 / 285
org.apache.spark.sql.catalyst.optimizer.RemoveRepetitionFromGroupExpressions 344198 / 10475276 1 / 286
org.apache.spark.sql.catalyst.analysis.Analyzer$WindowsSubstitution 0 / 10386630 0 / 788
org.apache.spark.sql.catalyst.analysis.EliminateUnions 0 / 10096526 0 / 788
org.apache.spark.sql.catalyst.analysis.AliasViewChild 0 / 9991706 0 / 742
org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation 0 / 9649334 0 / 288
org.apache.spark.sql.catalyst.analysis.ResolveHints$RemoveAllHints 0 / 8739109 0 / 750
org.apache.spark.sql.execution.datasources.PreprocessTableInsertion 0 / 8420889 0 / 742
org.apache.spark.sql.catalyst.analysis.EliminateView 0 / 8319134 0 / 285
org.apache.spark.sql.catalyst.optimizer.RemoveLiteralFromGroupExpressions 0 / 7392627 0 / 286
org.apache.spark.sql.catalyst.optimizer.ReplaceExceptWithFilter 0 / 7170516 0 / 300
org.apache.spark.sql.catalyst.optimizer.SimplifyCreateArrayOps 0 / 7109643 0 / 1592
org.apache.spark.sql.catalyst.optimizer.SimplifyCreateStructOps 0 / 6837590 0 / 1592
org.apache.spark.sql.catalyst.optimizer.SimplifyCreateMapOps 0 / 6617848 0 / 1592
org.apache.spark.sql.catalyst.optimizer.CombineConcats 0 / 5768406 0 / 1592
org.apache.spark.sql.catalyst.optimizer.ReplaceDeduplicateWithAggregate 0 / 5349831 0 / 285
org.apache.spark.sql.catalyst.optimizer.CombineTypedFilters 0 / 5186642 0 / 285
org.apache.spark.sql.catalyst.optimizer.EliminateDistinct 0 / 2427686 0 / 285
org.apache.spark.sql.catalyst.optimizer.CostBasedJoinReorder 0 / 2420436 0 / 285
```
Author: gatorsmile <ga...@gmail.com>
Closes #20342 from gatorsmile/reportExecution.
(cherry picked from commit 78801881c405de47f7e53eea3e0420dd69593dbd)
Signed-off-by: gatorsmile <ga...@gmail.com>
commit 1069fad41fb6896fef4245e6ae6b5ba36115ad68
Author: gatorsmile <ga...@...>
Date: 2018-01-22T12:32:59Z
[MINOR][SQL][TEST] Test case cleanups for recent PRs
## What changes were proposed in this pull request?
Revert the unneeded test case changes we made in SPARK-23000
Also fixes the test suites that do not call `super.afterAll()` in their local `afterAll`. The `afterAll()` of `TestHiveSingleton` actually resets the environment.
## How was this patch tested?
N/A
Author: gatorsmile <ga...@gmail.com>
Closes #20341 from gatorsmile/testRelated.
(cherry picked from commit 896e45af5fea264683b1d7d20a1711f33908a06f)
Signed-off-by: gatorsmile <ga...@gmail.com>
commit d963ba031748711ec7847ad0b702911eb7319c63
Author: Wenchen Fan <we...@...>
Date: 2018-01-22T12:56:38Z
[SPARK-23090][SQL] polish ColumnVector
## What changes were proposed in this pull request?
Several improvements:
* provide a default implementation for the batch get methods
* rename `getChildColumn` to `getChild`, which is more concise
* remove `getStruct(int, int)`; it is only used to simplify the codegen, which is an internal concern, so we should not expose a public API for that purpose.
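The first bullet can be illustrated with a minimal sketch. This is not Spark's actual `ColumnVector` API; the class and method names below are hypothetical, chosen only to show the pattern of giving a batch get method a default implementation in terms of the per-row getter, so subclasses only override it when they have a faster bulk path.

```java
// Hypothetical stand-in for a columnar vector; not Spark's ColumnVector.
abstract class SimpleColumnVector {
    // Per-row accessor that every concrete vector must implement.
    public abstract int getInt(int rowId);

    // Default batch accessor: falls back to the per-row getter, so
    // subclasses get a working bulk method for free.
    public int[] getInts(int rowId, int count) {
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            out[i] = getInt(rowId + i);
        }
        return out;
    }
}

public class ColumnVectorSketch {
    public static void main(String[] args) {
        int[] data = {10, 20, 30, 40};
        SimpleColumnVector v = new SimpleColumnVector() {
            public int getInt(int rowId) { return data[rowId]; }
        };
        // Uses the inherited default batch implementation.
        System.out.println(java.util.Arrays.toString(v.getInts(1, 2))); // prints [20, 30]
    }
}
```

A concrete off-heap or dictionary-encoded vector would override `getInts` with a bulk copy, while simple vectors keep the default loop.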
## How was this patch tested?
existing tests
Author: Wenchen Fan <we...@databricks.com>
Closes #20277 from cloud-fan/column-vector.
(cherry picked from commit 5d680cae486c77cdb12dbe9e043710e49e8d51e4)
Signed-off-by: Wenchen Fan <we...@databricks.com>
commit 4e75b0cb4b575d4799c02455eed286fe971c6c50
Author: Sandor Murakozi <sm...@...>
Date: 2018-01-22T18:36:28Z
[SPARK-23121][CORE] Fix for UI becoming inaccessible for long-running streaming apps
## What changes were proposed in this pull request?
The allJobs and the job pages attempt to use stage attempt and DAG visualization data from the store, but for long-running jobs those entries are not guaranteed to be retained, leading to exceptions when the pages are rendered.
To fix it `store.lastStageAttempt(stageId)` and `store.operationGraphForJob(jobId)` are wrapped in `store.asOption` and default values are used if the info is missing.
## How was this patch tested?
Manual testing of the UI, also using the test command reported in SPARK-23121:
./bin/spark-submit --class org.apache.spark.examples.streaming.HdfsWordCount ./examples/jars/spark-examples_2.11-2.4.0-SNAPSHOT.jar /spark
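The pattern behind the fix can be sketched in a self-contained way. This is not Spark's `KVStore` API; `asOption` and `lastStageAttempt` below are illustrative stand-ins showing the idea of wrapping a lookup that throws when an entry has been evicted, then rendering a default instead of failing the page.

```java
import java.util.NoSuchElementException;
import java.util.Optional;
import java.util.function.Supplier;

public class AsOptionSketch {
    // Analogue of store.asOption: absorb the "entry not retained" error
    // and turn the result into an Optional.
    static <T> Optional<T> asOption(Supplier<T> lookup) {
        try {
            return Optional.of(lookup.get());
        } catch (NoSuchElementException e) {
            return Optional.empty();
        }
    }

    // Hypothetical stand-in for store.lastStageAttempt on an evicted stage.
    static String lastStageAttempt(int stageId) {
        throw new NoSuchElementException("stage " + stageId + " not retained");
    }

    public static void main(String[] args) {
        // Before the fix, the exception would bubble up and break the page;
        // with the wrapper, a placeholder value is rendered instead.
        String name = asOption(() -> lastStageAttempt(7)).orElse("Unknown Stage");
        System.out.println(name); // prints Unknown Stage
    }
}
```

The same wrapping applies to the DAG visualization lookup: a missing graph yields an empty `Optional`, and the page falls back to a default rather than throwing.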
Closes #20287
Author: Sandor Murakozi <sm...@gmail.com>
Closes #20330 from smurakozi/SPARK-23121.
(cherry picked from commit 446948af1d8dbc080a26a6eec6f743d338f1d12b)
Signed-off-by: Marcelo Vanzin <va...@cloudera.com>
commit 489ecb0ef23e5d9b705e5e5bae4fa3d871bdac91
Author: Sameer Agarwal <sa...@...>
Date: 2018-01-22T18:49:08Z
Preparing Spark release v2.3.0-rc2
commit 6facc7fb2333cc61409149e2f896bf84dd085fa3
Author: Sameer Agarwal <sa...@...>
Date: 2018-01-22T18:49:29Z
Preparing development version 2.3.1-SNAPSHOT
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20957: Branch 2.3
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20957
@rameshch16, this seems mistakenly opened. Mind closing it, please?
---
[GitHub] spark issue #20957: Branch 2.3
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20957
Can one of the admins verify this patch?
---
[GitHub] spark pull request #20957: Branch 2.3
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/20957
---