Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/03/23 01:18:12 UTC

[GitHub] [spark] HyukjinKwon edited a comment on issue #27928: [SPARK-31167][BUILD] Refactor how we track Python test/build dependencies

URL: https://github.com/apache/spark/pull/27928#issuecomment-602313639
 
 
   @srowen, with this change, we will have to maintain `dev/requirements.txt` and keep it up to date. From now on, we will have to update that file whenever we want to use the latest versions. Why don't we try not pinning first and see whether we hit any issues with the latest versions? We can start pinning one by one when we face issues that are difficult to fix.
   
   We have been testing the dependencies below via GitHub Actions. So far, I couldn't find any issues related to their versions.
   
   - `pycodestyle` and `flake8` are Python linters.
   - `mkdocs` is for SQL documentation.
   - `sphinx` is for Python documentation.
   
   We shouldn't pin `numpy`, so that people are encouraged to test against the highest versions. Ideally it should be `numpy>=1.7`, matching `setup.py`.
   
   - `numpy` is an explicit dependency for ML/MLlib in PySpark.
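   
   As a rough sketch of the "don't pin first" approach, an unpinned `dev/requirements.txt` could look something like the fragment below. The contents are hypothetical and only reuse the package names mentioned in this comment; this is not the actual file from the PR, and the only version bound is the `numpy>=1.7` taken from `setup.py`.
   
   ```text
   # Hypothetical sketch, not the actual dev/requirements.txt in this PR.
   # Linters (unpinned: pip resolves to the latest release)
   pycodestyle
   flake8
   # Documentation
   mkdocs
   sphinx
   # Lower bound only, matching setup.py
   numpy>=1.7
   ```
   
   If a newer release of one of these breaks something that is hard to fix, that single line could then be changed to a pin of the form `package==<known good version>`.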
   
   The dependencies below are release-specific and tricky to test. I suspect it's better to let new developers try them out and pin the versions later, when it's needed.
   
   - `PyGithub` and `Unidecode` are release-specific dependencies.
   - `jira` is for release, JIRA <> PR sync, and PR merge. (Did we test this, BTW?)
   
   FYI, from a cursory look, there seem to be more occurrences, such as `pandas` and `numpy`, in the pip sanity check at `dev/run-pip-tests` as well:
   
   ```bash
   conda create -y -p "$VIRTUALENV_PATH" python=$python numpy pandas pip setuptools
   ```
   
   ```bash
   pip install --upgrade pip wheel numpy
   ```
   
