You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/30 21:06:56 UTC

[GitHub] [arrow-datafusion] wjones127 opened a new pull request #1711: Add tests and CI for optional pyarrow module

wjones127 opened a new pull request #1711:
URL: https://github.com/apache/arrow-datafusion/pull/1711


   # Which issue does this PR close?
   
   Closes #1635.
   
    # Rationale for this change
   
   The build was reported as broken, so we should test it.
   
   In addition, datafusion-contrib/datafusion-python#21 will benefit from the completed implementation of roundtrip conversion.
   
   # What changes are included in this PR?
   
    * Adds a new CI job to tests pyarrow module.
    * Completes `to_pyarrow` implementation for `ScalarValue`
    * Adds unit tests to `pyarrow` conversion logic.
   
   # Are there any user-facing changes?
   
   `ScalarValue.to_pyarrow` will no longer throw a not implemented error. Does that count as an API change?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] wjones127 commented on pull request #1711: Add tests and CI for optional pyarrow module

Posted by GitBox <gi...@apache.org>.
wjones127 commented on pull request #1711:
URL: https://github.com/apache/arrow-datafusion/pull/1711#issuecomment-1028616639


   > So it seems like maybe pyo3 is picking up the right executable (my virtual env) but not the correct libdir 🤔
   
   @alamb Actually this seems to be a known issue: https://github.com/PyO3/pyo3/issues/1741#issuecomment-958198336. Python doesn't seem to do the virtual environment path computation for embedded systems on Mac OS. A workaround is to extract that into `PYTHON_PATH` like so:
   
   ```
   export PYTHONPATH=$(python -c "import sys; print(sys.path[-1])")
   ```
   
   This is orthogonal to the original issue I had, which is that you must use a framework based Python install. That's default in Homebrew and Python.org installers, but not in pyenv.
   
   I have updated the error help to reflect these two issues. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] houqp commented on pull request #1711: Add tests and CI for optional pyarrow module

Posted by GitBox <gi...@apache.org>.
houqp commented on pull request #1711:
URL: https://github.com/apache/arrow-datafusion/pull/1711#issuecomment-1025446533


   @wjones127 looks like the newly added CI job is failing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on pull request #1711: Add tests and CI for optional pyarrow module

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #1711:
URL: https://github.com/apache/arrow-datafusion/pull/1711#issuecomment-1029101866


   Sorry -- I missed this one -- I just started CI again. When that passes I'll plan to merge it in. Thanks again @wjones127 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb edited a comment on pull request #1711: Add tests and CI for optional pyarrow module

Posted by GitBox <gi...@apache.org>.
alamb edited a comment on pull request #1711:
URL: https://github.com/apache/arrow-datafusion/pull/1711#issuecomment-1026230116


   I also tried
   
   ```shell
   PYO3_PRINT_CONFIG=1 PYO3_PYTHON=$(which python) cargo test --features=pyarrow -p datafusion -- pyarrow
   error: failed to run custom build command for `pyo3 v0.15.1`
   
   Caused by:
     process didn't exit successfully: `/Users/alamb/Software/arrow-datafusion/target/debug/build/pyo3-0d2dc5a0a08d5a5d/build-script-build` (exit status: 101)
     --- stdout
     cargo:rerun-if-env-changed=PYO3_CROSS
     cargo:rerun-if-env-changed=PYO3_CROSS_LIB_DIR
     cargo:rerun-if-env-changed=PYO3_CROSS_PYTHON_VERSION
     cargo:rerun-if-env-changed=PYO3_PRINT_CONFIG
   
     -- PYO3_PRINT_CONFIG=1 is set, printing configuration and halting compile --
     implementation=CPython
     version=3.9
     shared=true
     abi3=false
     lib_name=python3.9
     lib_dir=/usr/local/opt/python@3.9/Frameworks/Python.framework/Versions/3.9/lib
     executable=/Users/alamb/Software/virtual_envs/arrow_dev/bin/python
     pointer_width=64
     build_flags=WITH_THREAD
     suppress_build_script_link_lines=false
   ```
   
   So it seems like maybe pyo3 is picking up the right executable (my virtual env) but not the correct libdir 🤔 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on pull request #1711: Add tests and CI for optional pyarrow module

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #1711:
URL: https://github.com/apache/arrow-datafusion/pull/1711#issuecomment-1026230116


   I also 
   
   ```shell
   PYO3_PRINT_CONFIG=1 PYO3_PYTHON=$(which python) cargo test --features=pyarrow -p datafusion -- pyarrow
   error: failed to run custom build command for `pyo3 v0.15.1`
   
   Caused by:
     process didn't exit successfully: `/Users/alamb/Software/arrow-datafusion/target/debug/build/pyo3-0d2dc5a0a08d5a5d/build-script-build` (exit status: 101)
     --- stdout
     cargo:rerun-if-env-changed=PYO3_CROSS
     cargo:rerun-if-env-changed=PYO3_CROSS_LIB_DIR
     cargo:rerun-if-env-changed=PYO3_CROSS_PYTHON_VERSION
     cargo:rerun-if-env-changed=PYO3_PRINT_CONFIG
   
     -- PYO3_PRINT_CONFIG=1 is set, printing configuration and halting compile --
     implementation=CPython
     version=3.9
     shared=true
     abi3=false
     lib_name=python3.9
     lib_dir=/usr/local/opt/python@3.9/Frameworks/Python.framework/Versions/3.9/lib
     executable=/Users/alamb/Software/virtual_envs/arrow_dev/bin/python
     pointer_width=64
     build_flags=WITH_THREAD
     suppress_build_script_link_lines=false
   ```
   
   So it seems like maybe pyo3 is picking up the right executable (my virtual env) but not the correct libdir 🤔 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] wjones127 commented on pull request #1711: Add tests and CI for optional pyarrow module

Posted by GitBox <gi...@apache.org>.
wjones127 commented on pull request #1711:
URL: https://github.com/apache/arrow-datafusion/pull/1711#issuecomment-1025259032


   cc @alamb @kszucs 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1711: Add tests and CI for optional pyarrow module

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1711:
URL: https://github.com/apache/arrow-datafusion/pull/1711#discussion_r796020420



##########
File path: datafusion/src/pyarrow.rs
##########
@@ -65,3 +69,78 @@ impl<'a> IntoPy<PyObject> for ScalarValue {
         self.to_pyarrow(py).unwrap()
     }
 }
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use pyo3::prepare_freethreaded_python;
+    use pyo3::py_run;
+    use pyo3::types::PyDict;
+    use pyo3::Python;
+
+    fn init_python() {
+        prepare_freethreaded_python();
+        Python::with_gil(|py| {
+            if let Err(err) = py.run("import pyarrow", None, None) {
+                let locals = PyDict::new(py);
+                py.run(
+                    "import sys; executable = sys.executable; python_path = sys.path",
+                    None,
+                    Some(locals),
+                )
+                .expect("Couldn't get python info");
+                let executable: String =
+                    locals.get_item("executable").unwrap().extract().unwrap();
+                let python_path: Vec<&str> =
+                    locals.get_item("python_path").unwrap().extract().unwrap();
+
+                Err(err).expect(
+                    format!(

Review comment:
       👍 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] wjones127 commented on pull request #1711: Add tests and CI for optional pyarrow module

Posted by GitBox <gi...@apache.org>.
wjones127 commented on pull request #1711:
URL: https://github.com/apache/arrow-datafusion/pull/1711#issuecomment-1026216365


   > Any thoughts @wjones127 ?
   
   TBH, I get this too on my local. It seems to be using your homebrew Python 3.9 but not the version of Python that you've installed pyarrow in. (On my local, I was trying to use a pyenv-installed Python, but it didn't seem to want to use that.) To get it to pass, I set `PYO3_PYTHON` to the path of my Homebrew-installed Python (basically gave up on my Pyenv installs).
   
   At the very least, it seems to work in CI without much configuration.
   
   I assumed this is specific to my Mac OS + PyEnv setup, and that `PYO3_PYTHON=$(which python)` would cover other cases. But on second thought, maybe in all cases where that command would find the right Python, PyO3 will already do the right thing. So perhaps I need to change the instructions? Something like `Try setting PYO3_PYTHON=<full-path-to-python>`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] wjones127 commented on pull request #1711: Add tests and CI for optional pyarrow module

Posted by GitBox <gi...@apache.org>.
wjones127 commented on pull request #1711:
URL: https://github.com/apache/arrow-datafusion/pull/1711#issuecomment-1026245714


   > So it seems like maybe pyo3 is picking up the right executable (my virtual env) but not the correct libdir 🤔
   
   Have you checked that your virtualenv has the python3.9.lib file? I think some installations just have the Python binary and no libraries.
   
   Otherwise, I do think there is something on MacOS about shared libraries vs the Frameworks folder libraries that I don't yet comprehend, but might be key here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb merged pull request #1711: Add tests and CI for optional pyarrow module

Posted by GitBox <gi...@apache.org>.
alamb merged pull request #1711:
URL: https://github.com/apache/arrow-datafusion/pull/1711


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on pull request #1711: Add tests and CI for optional pyarrow module

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #1711:
URL: https://github.com/apache/arrow-datafusion/pull/1711#issuecomment-1026164249


   Thank you for the contribution @wjones127  ❤️ 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] wjones127 commented on pull request #1711: Add tests and CI for optional pyarrow module

Posted by GitBox <gi...@apache.org>.
wjones127 commented on pull request #1711:
URL: https://github.com/apache/arrow-datafusion/pull/1711#issuecomment-1025981760


   > @wjones127 looks like the newly added CI job is failing.
   
   @houqp Thanks. I got it passing now in my branch: https://github.com/wjones127/arrow-datafusion/runs/5008841849?check_suite_focus=true


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org