Posted to dev@sedona.apache.org by GitBox <gi...@apache.org> on 2020/12/07 09:07:13 UTC

[GitHub] [incubator-sedona] jiayuasu opened a new pull request #494: [SEDONA-7] Build Sedona for Spark 2.4, 3.0 and Scala 2.11, 2.12

jiayuasu opened a new pull request #494:
URL: https://github.com/apache/incubator-sedona/pull/494


   ## Is this PR related to a proposed Issue?
   
   https://issues.apache.org/jira/browse/SEDONA-7
   
   ## What changes were proposed in this PR?
   
   1. Add a source code converter that switches the source code between Spark 2.4 and 3.0. A few function calls are incompatible across the two versions, and Scala does not support conditional compilation, so I had to write a silly preprocessor to do the conversion. Run `python3 spark-version-converter.py spark3` or `python3 spark-version-converter.py spark2` to convert the code to the target Spark version.
   2. Configure the pom.xml to yield the following artifacts. Run the following commands:
   
   ```
   python3 spark-version-converter.py spark2
   mvn clean install -DskipTests -Dscala.compat.version="2.11" -Dscala.version="2.11.8" -Dspark.version="2.4.7" -Dspark.compat.version="2.4"
   ```
   
   This generates:
   * sedona-core_2.11-1.0.0-incubator-SNAPSHOT.jar
   * sedona-sql-2.4_2.11-1.0.0-incubator-SNAPSHOT.jar
   * sedona-viz-2.4_2.11-1.0.0-incubator-SNAPSHOT.jar
   * sedona-python-adapter_2.11-1.0.0-incubator-SNAPSHOT.jar
   
   ```
   python3 spark-version-converter.py spark3
   mvn clean install -DskipTests -Dscala.compat.version="2.12" -Dscala.version="2.12.8" -Dspark.version="3.0.1" -Dspark.compat.version="3.0"
   ```
   
   This generates:
   * sedona-core_2.12-1.0.0-incubator-SNAPSHOT.jar
   * sedona-sql-3.0_2.12-1.0.0-incubator-SNAPSHOT.jar
   * sedona-viz-3.0_2.12-1.0.0-incubator-SNAPSHOT.jar
   * sedona-python-adapter_2.12-1.0.0-incubator-SNAPSHOT.jar
   
   3. The GitHub Actions CI will run tests on:
   * Scala and Java: Spark 2.4.7 (Scala 2.11, 2.12) and Spark 3.0 (Scala 2.12)
   * Python build: Spark 2.4.7 (Scala 2.11) and Spark 3.0 (Scala 2.12), Python 3.7
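
The converter idea in item 1 can be illustrated with a minimal sketch. This is not the actual `spark-version-converter.py`; the marker comments (`// @spark2`, `// @spark3`), the `//@off ` prefix, and the function names are all hypothetical and chosen only for illustration. It assumes each version-specific Scala line carries a trailing marker and is switched on or off by commenting it out.

```python
import re

# Hypothetical convention: a version-specific line ends with "// @spark2" or "// @spark3".
MARKER = re.compile(r"//\s*@(spark[23])\s*$")
# Hypothetical prefix placed in front of a line that is switched off.
DISABLED_PREFIX = "//@off "


def convert_line(line: str, target: str) -> str:
    """Enable the line if its marker matches the target, otherwise comment it out."""
    match = MARKER.search(line)
    if match is None:
        return line  # shared code, left untouched
    indent_len = len(line) - len(line.lstrip())
    indent, body = line[:indent_len], line[indent_len:]
    # Normalize first: strip the "off" prefix if the line is currently disabled.
    if body.startswith(DISABLED_PREFIX):
        body = body[len(DISABLED_PREFIX):]
    if match.group(1) == target:
        return indent + body
    return indent + DISABLED_PREFIX + body


def convert_source(text: str, target: str) -> str:
    """Convert a whole source file to the given target ("spark2" or "spark3")."""
    if target not in ("spark2", "spark3"):
        raise ValueError("target must be 'spark2' or 'spark3'")
    return "\n".join(convert_line(line, target) for line in text.split("\n"))
```

Because each line is normalized before it is re-tagged, running the conversion twice with the same target leaves the file unchanged, and switching back and forth restores the original text.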
   
   ## How was this patch tested?
   
   ## Did this PR include necessary documentation updates?
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-sedona] jiayuasu commented on pull request #494: [SEDONA-7] Build Sedona for Spark 2.4, 3.0 and Scala 2.11, 2.12

Posted by GitBox <gi...@apache.org>.
jiayuasu commented on pull request #494:
URL: https://github.com/apache/incubator-sedona/pull/494#issuecomment-741636788


   @Imbruced 
   
   Test on PySpark 2.4.7 + Python 3.7 still failed. Please see https://github.com/apache/incubator-sedona/runs/1522711633?check_suite_focus=true
   
   The run uses the correct PySpark version (2.4.7) and Spark binary version (2.4.7). I used `pipenv graph` to print out all installed packages. PySpark in the Pipfile is also set to `>=2.4.0`.





[GitHub] [incubator-sedona] Imbruced edited a comment on pull request #494: [SEDONA-7] Build Sedona for Spark 2.4, 3.0 and Scala 2.11, 2.12

Posted by GitBox <gi...@apache.org>.
Imbruced edited a comment on pull request #494:
URL: https://github.com/apache/incubator-sedona/pull/494#issuecomment-741007795


   @jiayuasu I already tested Python 3.7 with Spark 2.4.7 and saw no failure there. Can you send the stack trace?
   I think I know what the problem might be: maybe the py4j version? On my local PC, when I did not explicitly specify the correct Python path and Python used a newer one (from Spark 3.0.0), it caused an issue.
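
If the cause is a py4j mismatch as suspected above, one rough way to check is to compare the py4j version bundled with the Spark distribution against the one Python actually imports. The sketch below assumes the usual distribution layout, where the bundled copy sits at `python/lib/py4j-<version>-src.zip` under the Spark home directory; the function names are hypothetical.

```python
import glob
import os
import re
from typing import List


def bundled_py4j_versions(spark_home: str) -> List[str]:
    """List py4j versions found under <spark_home>/python/lib (assumed layout)."""
    pattern = os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")
    versions = []
    for path in sorted(glob.glob(pattern)):
        match = re.match(r"py4j-(.+)-src\.zip$", os.path.basename(path))
        if match:
            versions.append(match.group(1))
    return versions


def py4j_matches(spark_home: str, installed_version: str) -> bool:
    """True if the py4j version Python imports is the one bundled with Spark."""
    return installed_version in bundled_py4j_versions(spark_home)
```

In practice `installed_version` would come from the active environment (for example from the `pipenv graph` output), and `spark_home` from the `SPARK_HOME` environment variable.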





[GitHub] [incubator-sedona] jiayuasu merged pull request #494: [SEDONA-7] Build Sedona for Spark 2.4, 3.0 and Scala 2.11, 2.12

Posted by GitBox <gi...@apache.org>.
jiayuasu merged pull request #494:
URL: https://github.com/apache/incubator-sedona/pull/494


   





[GitHub] [incubator-sedona] jiayuasu commented on pull request #494: [SEDONA-7] Build Sedona for Spark 2.4, 3.0 and Scala 2.11, 2.12

Posted by GitBox <gi...@apache.org>.
jiayuasu commented on pull request #494:
URL: https://github.com/apache/incubator-sedona/pull/494#issuecomment-740517601


   @Imbruced I also made a few changes to the Python Adapter and Sedona Python. You can take a look:
   
   1. PythonAdapter now uses a Scala API that is compatible with both Scala 2.11 and 2.12.
   2. Fixed the get-version bug in Sedona Python.





[GitHub] [incubator-sedona] Imbruced commented on pull request #494: [SEDONA-7] Build Sedona for Spark 2.4, 3.0 and Scala 2.11, 2.12

Posted by GitBox <gi...@apache.org>.
Imbruced commented on pull request #494:
URL: https://github.com/apache/incubator-sedona/pull/494#issuecomment-741151686


   Should I rebase onto this branch when I create the PR with the faster Adapter for Python, or wait until this one is merged? It should be ready tomorrow (only the docs update remains).





[GitHub] [incubator-sedona] jiayuasu commented on pull request #494: [SEDONA-7] Build Sedona for Spark 2.4, 3.0 and Scala 2.11, 2.12

Posted by GitBox <gi...@apache.org>.
jiayuasu commented on pull request #494:
URL: https://github.com/apache/incubator-sedona/pull/494#issuecomment-741557234


   > Should I rebase onto this branch when I create the PR with the faster Adapter for Python, or wait until this one is merged? It should be ready tomorrow (only the docs update remains).
   
   Let me try to merge this PR first. I believe it will be done today or tomorrow.





[GitHub] [incubator-sedona] jiayuasu edited a comment on pull request #494: [SEDONA-7] Build Sedona for Spark 2.4, 3.0 and Scala 2.11, 2.12

Posted by GitBox <gi...@apache.org>.
jiayuasu edited a comment on pull request #494:
URL: https://github.com/apache/incubator-sedona/pull/494#issuecomment-740502328


   @Imbruced 
   
   I have tested Sedona Python on Spark 2.4.7 + Python 3.7. It failed. I also tested it on Spark 3.0.1 + Python 3.7. It passed.
   
   1. Does Sedona Python support Spark 2.4?
   2. If not, what changes are needed to support Spark 2.4? Can we simply change the PySpark version in the Pipfile?
   3. It looks like the GitHub Action CI will also fail on Spark 3.0.1 + Python 3.8 and 3.9. Do we need to add tests for Spark 3.0.1 + Python 3.8 and 3.9 to the GitHub Action CI? If not, you can just ignore this question.





[GitHub] [incubator-sedona] Imbruced commented on pull request #494: [SEDONA-7] Build Sedona for Spark 2.4, 3.0 and Scala 2.11, 2.12

Posted by GitBox <gi...@apache.org>.
Imbruced commented on pull request #494:
URL: https://github.com/apache/incubator-sedona/pull/494#issuecomment-742069776


   @jiayuasu That may be because of shapely or geopandas. I will take a look at whether it can be fixed in a short amount of time. Also, Python 3.9 is a fresh release; please look at the PySpark download statistics:
   ![image](https://user-images.githubusercontent.com/22958216/101688858-ec48b900-3a6c-11eb-86b0-511566f074a4.png)
   





[GitHub] [incubator-sedona] jiayuasu commented on pull request #494: [SEDONA-7] Build Sedona for Spark 2.4, 3.0 and Scala 2.11, 2.12

Posted by GitBox <gi...@apache.org>.
jiayuasu commented on pull request #494:
URL: https://github.com/apache/incubator-sedona/pull/494#issuecomment-741675706


   @Imbruced I have successfully made Sedona run on Spark 2.4.7 + Python 3.7. In fact, I am glad that the test failed before.
   
   There was a bug in the root pom.xml (sedona-parent). It packaged the wrong jackson dependency into the compiled Sedona jar. It was introduced by PR https://github.com/apache/incubator-sedona/pull/471
   
   This bug will sometimes cause Sedona (Scala / Java / Python) to fail in Spark cluster mode. Once I removed this dependency, all tests passed. Now, as you can see in the GitHub CI test results, 6 checks have passed.
   
   The only thing left is the test on Spark 3.0.1 + Python 3.9. Based on my initial test https://github.com/apache/incubator-sedona/runs/1521112458 , the error is `OSError: Could not find library geos_c or load any of its variants ['libgeos_c.so.1', 'libgeos_c.so']`. It looks like some of the Sedona Python packages need to be updated.
   
   If you think Spark 3.0.1 + Python 3.9 is something easy to fix, please let me know the solution. If you think this will take some time, I will merge this PR directly and leave Python 3.9 support for future work.





[GitHub] [incubator-sedona] jiayuasu commented on pull request #494: [SEDONA-7] Build Sedona for Spark 2.4, 3.0 and Scala 2.11, 2.12

Posted by GitBox <gi...@apache.org>.
jiayuasu commented on pull request #494:
URL: https://github.com/apache/incubator-sedona/pull/494#issuecomment-740502328


   @Imbruced 
   
   I have tested Sedona Python on Spark 2.4.7 + Python 3.7. It failed. I also tested it on Spark 3.0.1 + Python 3.7. It passed.
   
   1. Does Sedona Python support Spark 2.4?
   2. If not, what changes are needed to support Spark 2.4? Can we simply change the PySpark version in the Pipfile?





[GitHub] [incubator-sedona] Imbruced commented on pull request #494: [SEDONA-7] Build Sedona for Spark 2.4, 3.0 and Scala 2.11, 2.12

Posted by GitBox <gi...@apache.org>.
Imbruced commented on pull request #494:
URL: https://github.com/apache/incubator-sedona/pull/494#issuecomment-741007795


   @jiayuasu I already tested Python 3.7 with Spark 2.4.7 and saw no failure there. Can you send the stack trace?
   





[GitHub] [incubator-sedona] Imbruced commented on pull request #494: [SEDONA-7] Build Sedona for Spark 2.4, 3.0 and Scala 2.11, 2.12

Posted by GitBox <gi...@apache.org>.
Imbruced commented on pull request #494:
URL: https://github.com/apache/incubator-sedona/pull/494#issuecomment-741130020


   @jiayuasu Python 3.8 worked for me. We have to add the line ```pipenv --python 3.8``` to the CI script to update the Python version within the pipenv environment. That should be enough.





[GitHub] [incubator-sedona] jiayuasu commented on pull request #494: [SEDONA-7] Build Sedona for Spark 2.4, 3.0 and Scala 2.11, 2.12

Posted by GitBox <gi...@apache.org>.
jiayuasu commented on pull request #494:
URL: https://github.com/apache/incubator-sedona/pull/494#issuecomment-742123181


   @Imbruced I have fixed the Python 3.9 issue. It turns out that we only need to do `sudo apt-get install libgeos-dev`. Now I will merge the PR. You can go ahead and open a PR for your faster Adapter.
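
A quick way to confirm that the GEOS C library is visible before running the Python tests is to probe for it with the standard library. This is only a rough sketch: `ctypes.util.find_library` is not necessarily the same lookup that the failing package performs, and the helper name `find_geos_c` is hypothetical.

```python
import ctypes.util
from typing import Optional


def find_geos_c() -> Optional[str]:
    """Return the name or path of the GEOS C library if the system linker can see it."""
    return ctypes.util.find_library("geos_c")


if __name__ == "__main__":
    location = find_geos_c()
    if location is None:
        # On Debian/Ubuntu runners the thread's fix was: sudo apt-get install libgeos-dev
        print("geos_c not found")
    else:
        print("geos_c found:", location)
```

Running this as an early CI step would surface a missing system library before the test suite starts, rather than as an `OSError` at import time.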





[GitHub] [incubator-sedona] Imbruced commented on pull request #494: [SEDONA-7] Build Sedona for Spark 2.4, 3.0 and Scala 2.11, 2.12

Posted by GitBox <gi...@apache.org>.
Imbruced commented on pull request #494:
URL: https://github.com/apache/incubator-sedona/pull/494#issuecomment-740927803


   @jiayuasu I will take a look at Spark 2.4 with Python. It worked before, so something probably went wrong due to API changes. It is good to have Python 3.8 support. The 3.9 release is still new; if it does not require much work, I will add it too.

