Posted to commits@spark.apache.org by do...@apache.org on 2021/01/20 03:10:05 UTC
[spark] branch branch-3.1 updated: [SPARK-34162][DOCS][PYSPARK] Add PyArrow compatibility note for Python 3.9
This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.1 by this push:
new 91c9d1c [SPARK-34162][DOCS][PYSPARK] Add PyArrow compatibility note for Python 3.9
91c9d1c is described below
commit 91c9d1c49d693186b7bf4fce986a395b1d217be7
Author: Dongjoon Hyun <dh...@apple.com>
AuthorDate: Tue Jan 19 19:09:14 2021 -0800
[SPARK-34162][DOCS][PYSPARK] Add PyArrow compatibility note for Python 3.9
### What changes were proposed in this pull request?
This PR adds a note about the compatibility of the Apache Arrow project's `PyArrow` with Python 3.9.
### Why are the changes needed?
Although the Apache Spark documentation states `Spark runs on Java 8/11, Scala 2.12, Python 3.6+ and R 3.5+.`,
Apache Arrow's `PyArrow` is not yet compatible with Python 3.9.x. Without the `PyArrow` library installed, the PySpark UTs passed without any problem, so it should be enough to add a note about this limitation together with the compatibility link on the Apache Arrow website.
- https://arrow.apache.org/docs/python/install.html#python-compatibility
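For readers who hit the limitation described above, the following is a minimal sketch (not part of this commit) of how a PySpark application might guard against a missing `PyArrow`. The helper names `pyarrow_installed` and `arrow_conf` are hypothetical. The only Spark-specific piece is the `spark.sql.execution.arrow.pyspark.enabled` setting from Spark 3.x, and the sketch assumes that turning it off is an acceptable fallback for the application.

```python
import importlib.util
import sys


def pyarrow_installed() -> bool:
    """Return True if the `pyarrow` package can be located in this environment."""
    return importlib.util.find_spec("pyarrow") is not None


def arrow_conf(has_pyarrow: bool) -> dict:
    """Hypothetical helper: build a Spark conf fragment that enables
    Arrow optimization only when PyArrow is present."""
    return {
        "spark.sql.execution.arrow.pyspark.enabled": "true" if has_pyarrow else "false"
    }


if __name__ == "__main__":
    # Report the interpreter version alongside the chosen setting.
    version = "{}.{}".format(*sys.version_info[:2])
    print("Python", version, "->", arrow_conf(pyarrow_installed()))
```

The resulting dictionary could be passed entry by entry to `SparkSession.builder.config(key, value)`. Note that pandas UDFs require `PyArrow` regardless of this setting, so they would still need to be avoided when it is not installed.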
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
**BEFORE**
<img width="804" alt="Screen Shot 2021-01-19 at 1 45 07 PM" src="https://user-images.githubusercontent.com/9700541/105096867-8fbdbe00-5a5c-11eb-88f7-8caae2427583.png">
**AFTER**
<img width="908" alt="Screen Shot 2021-01-19 at 7 06 41 PM" src="https://user-images.githubusercontent.com/9700541/105121661-85fe7f80-5a89-11eb-8af7-1b37e12c55c1.png">
Closes #31251 from dongjoon-hyun/SPARK-34162.
Authored-by: Dongjoon Hyun <dh...@apple.com>
Signed-off-by: Dongjoon Hyun <dh...@apple.com>
(cherry picked from commit 7e1651e3152c1af59393cb1ca16701fc2500e181)
Signed-off-by: Dongjoon Hyun <dh...@apple.com>
---
docs/index.md | 1 +
1 file changed, 1 insertion(+)
diff --git a/docs/index.md b/docs/index.md
index c4c2d72..84f760f 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -50,6 +50,7 @@ For the Scala API, Spark {{site.SPARK_VERSION}}
uses Scala {{site.SCALA_BINARY_VERSION}}. You will need to use a compatible Scala version
({{site.SCALA_BINARY_VERSION}}.x).
+For Python 3.9, Arrow optimization and pandas UDFs might not work due to the supported Python versions in Apache Arrow. Please refer to the latest [Python Compatibility](https://arrow.apache.org/docs/python/install.html#python-compatibility) page.
For Java 11, `-Dio.netty.tryReflectionSetAccessible=true` is required additionally for Apache Arrow library. This prevents `java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.(long, int) not available` when Apache Arrow uses Netty internally.
# Running the Examples and Shell