You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "jjerphan (via GitHub)" <gi...@apache.org> on 2023/05/13 15:15:18 UTC

[GitHub] [arrow] jjerphan opened a new pull request, #35582: GH-34787: [Python] Accept zero_copy_only=False for ChunkedArray.to_numpy

jjerphan opened a new pull request, #35582:
URL: https://github.com/apache/arrow/pull/35582

   ### Rationale for this change
   
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   
   Addresses GH-34787.
   
   ### What changes are included in this PR?
   
   I think this proposes the minimal viable changes to have the same signature as proposed in GH-34787.
   
   ### Are these changes tested?
   
   Yes, for now a single simple test has been introduced to test the correct behavior of `ChunkedArray.to_numpy` with respect to the addition of `zero_copy_only`.
   
   ### Are there any user-facing changes?
   
   Yes, a minimal description of `zero_copy_only` is provided in this regard.
   
   ### Comments
   
   Looking at the signature of `pa.Array.to_numpy`:
   
   https://github.com/apache/arrow/blob/b211c1896916e3d0bbad1ffc88a30ddb9c90a5f0/python/pyarrow/array.pxi#L1486-L1512
   
   I think `writable=True` can be added as a default-valued parameter to `pa.ChunkedArray.to_numpy`.
   
   What do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #35582: GH-34787: [Python] Accept zero_copy_only=False for ChunkedArray.to_numpy

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #35582:
URL: https://github.com/apache/arrow/pull/35582#issuecomment-1546688520

   :warning: GitHub issue #34787 **has been automatically assigned in GitHub** to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #35582: GH-34787: [Python] Accept zero_copy_only=False for ChunkedArray.to_numpy

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #35582:
URL: https://github.com/apache/arrow/pull/35582#discussion_r1243642772


##########
python/pyarrow/tests/test_array.py:
##########
@@ -168,6 +168,28 @@ def test_to_numpy_zero_copy():
     np.testing.assert_array_equal(np_arr, expected)
 
 
+def test_chunked_array_to_numpy_zero_copy():
+    elements = [[2, 2, 4], [4, 5, 100]]
+
+    chunked_arr = pa.chunked_array(elements)
+
+    msg = "zero_copy_only must be False for pyarrow.ChunkedArray.to_numpy"
+
+    with pytest.raises(ValueError, match=msg):
+        chunked_arr.to_numpy(zero_copy_only=True)
+
+    np_arr = chunked_arr.to_numpy()
+
+    chunked_arr = None
+    import gc
+    gc.collect()
+
+    # Ensure base is still valid
+    assert np_arr.base is not None

Review Comment:
   I am not sure this part is needed here, as this is not actually zero copy (the base is still set, because we concatenate on the arrow side, and then still keep this concatenated buffer alive through a capsule as the numpy array's base).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #35582: GH-34787: [Python] Accept zero_copy_only=False for ChunkedArray.to_numpy

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #35582:
URL: https://github.com/apache/arrow/pull/35582#issuecomment-1628501358

   :warning: GitHub issue #34787 **has been automatically assigned in GitHub** to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jjerphan commented on pull request #35582: GH-34787: [Python] Accept zero_copy_only=False for ChunkedArray.to_numpy

Posted by "jjerphan (via GitHub)" <gi...@apache.org>.
jjerphan commented on PR #35582:
URL: https://github.com/apache/arrow/pull/35582#issuecomment-1612641218

   Thank you for the review, @jorisvandenbossche! I will come back to this PR when I get some bandwidth.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #35582: GH-34787: [Python] Accept zero_copy_only=False for ChunkedArray.to_numpy

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #35582:
URL: https://github.com/apache/arrow/pull/35582#issuecomment-1546686314

   * Closes: #34787


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on pull request #35582: GH-34787: [Python] Accept zero_copy_only=False for ChunkedArray.to_numpy

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on PR #35582:
URL: https://github.com/apache/arrow/pull/35582#issuecomment-1550090976

   > Should it be done in another PR?
   
   I would personally add it in this PR. WIll change the title to include both changes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #35582: GH-34787: [Python] Accept zero_copy_only=False for ChunkedArray.to_numpy

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #35582:
URL: https://github.com/apache/arrow/pull/35582#issuecomment-1546686321

   :warning: GitHub issue #34787 **has been automatically assigned in GitHub** to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #35582: GH-34787: [Python] Accept zero_copy_only=False for ChunkedArray.to_numpy

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #35582:
URL: https://github.com/apache/arrow/pull/35582#discussion_r1257882802


##########
python/pyarrow/tests/test_array.py:
##########
@@ -179,6 +179,26 @@ def test_to_numpy_zero_copy():
     np.testing.assert_array_equal(np_arr, expected)
 
 
+def test_chunked_array_to_numpy_zero_copy():
+    elements = [[2, 2, 4], [4, 5, 100]]
+
+    chunked_arr = pa.chunked_array(elements)
+
+    msg = "zero_copy_only must be False for pyarrow.ChunkedArray.to_numpy"
+
+    with pytest.raises(ValueError, match=msg):
+        chunked_arr.to_numpy(zero_copy_only=True)
+
+    np_arr = chunked_arr.to_numpy()
+
+    chunked_arr = None
+    import gc
+    gc.collect()
+

Review Comment:
   ```suggestion
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] conbench-apache-arrow[bot] commented on pull request #35582: GH-34787: [Python] Accept zero_copy_only=False for ChunkedArray.to_numpy

Posted by "conbench-apache-arrow[bot] (via GitHub)" <gi...@apache.org>.
conbench-apache-arrow[bot] commented on PR #35582:
URL: https://github.com/apache/arrow/pull/35582#issuecomment-1646203092

   After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit a485bb7a7e948b2089b935fe7d5ab098a0e615de.
   
   There were no benchmark performance regressions. 🎉
   
   The [full Conbench report](https://github.com/apache/arrow/runs/15246878827) has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on pull request #35582: GH-34787: [Python] Accept zero_copy_only=False for ChunkedArray.to_numpy

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on PR #35582:
URL: https://github.com/apache/arrow/pull/35582#issuecomment-1549566500

   Thank you for the contribution @jjerphan!
   
   > I think writable can be added as a default-valued parameter to pa.ChunkedArray.to_numpy.
   
   Yes, I think that would be useful to add 👍 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #35582: GH-34787: [Python] Accept zero_copy_only=False for ChunkedArray.to_numpy

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #35582:
URL: https://github.com/apache/arrow/pull/35582#discussion_r1257882468


##########
python/pyarrow/tests/test_array.py:
##########
@@ -168,6 +168,28 @@ def test_to_numpy_zero_copy():
     np.testing.assert_array_equal(np_arr, expected)
 
 
+def test_chunked_array_to_numpy_zero_copy():
+    elements = [[2, 2, 4], [4, 5, 100]]
+
+    chunked_arr = pa.chunked_array(elements)
+
+    msg = "zero_copy_only must be False for pyarrow.ChunkedArray.to_numpy"
+
+    with pytest.raises(ValueError, match=msg):
+        chunked_arr.to_numpy(zero_copy_only=True)
+
+    np_arr = chunked_arr.to_numpy()
+
+    chunked_arr = None
+    import gc
+    gc.collect()
+
+    # Ensure base is still valid
+    assert np_arr.base is not None

Review Comment:
   I don't think that is needed, it's in any case not specific to this test 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #35582: GH-34787: [Python] Accept zero_copy_only=False for ChunkedArray.to_numpy

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #35582:
URL: https://github.com/apache/arrow/pull/35582#issuecomment-1628502630

   :warning: GitHub issue #34787 **has been automatically assigned in GitHub** to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche merged pull request #35582: GH-34787: [Python] Accept zero_copy_only=False for ChunkedArray.to_numpy

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche merged PR #35582:
URL: https://github.com/apache/arrow/pull/35582


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] alexswo commented on a diff in pull request #35582: GH-34787: [Python] Accept zero_copy_only=False for ChunkedArray.to_numpy

Posted by "alexswo (via GitHub)" <gi...@apache.org>.
alexswo commented on code in PR #35582:
URL: https://github.com/apache/arrow/pull/35582#discussion_r1220356728


##########
python/pyarrow/table.pxi:
##########
@@ -491,6 +497,11 @@ cdef class ChunkedArray(_PandasConvertible):
             PandasOptions c_options
             object values
 
+        if zero_copy_only:

Review Comment:
   nit, how about putting the `if/raise` statement before the `cdef`? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jjerphan commented on pull request #35582: GH-34787: [Python] Accept zero_copy_only=False for ChunkedArray.to_numpy

Posted by "jjerphan (via GitHub)" <gi...@apache.org>.
jjerphan commented on PR #35582:
URL: https://github.com/apache/arrow/pull/35582#issuecomment-1549585325

   > Thank you for the contribution @jjerphan!
   > 
   > > I think writable can be added as a default-valued parameter to pa.ChunkedArray.to_numpy.
   > 
   > Yes, I think that would be useful to add +1
   
   Thanks for the feedback, @AlenkaF!
   
   Should it be done in another PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #35582: GH-34787: [Python] Accept zero_copy_only=False for ChunkedArray.to_numpy

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #35582:
URL: https://github.com/apache/arrow/pull/35582#issuecomment-1628455866

   :warning: GitHub issue #34787 **has been automatically assigned in GitHub** to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jjerphan commented on a diff in pull request #35582: GH-34787: [Python] Accept zero_copy_only=False for ChunkedArray.to_numpy

Posted by "jjerphan (via GitHub)" <gi...@apache.org>.
jjerphan commented on code in PR #35582:
URL: https://github.com/apache/arrow/pull/35582#discussion_r1257453294


##########
python/pyarrow/tests/test_array.py:
##########
@@ -168,6 +168,28 @@ def test_to_numpy_zero_copy():
     np.testing.assert_array_equal(np_arr, expected)
 
 
+def test_chunked_array_to_numpy_zero_copy():
+    elements = [[2, 2, 4], [4, 5, 100]]
+
+    chunked_arr = pa.chunked_array(elements)
+
+    msg = "zero_copy_only must be False for pyarrow.ChunkedArray.to_numpy"
+
+    with pytest.raises(ValueError, match=msg):
+        chunked_arr.to_numpy(zero_copy_only=True)
+
+    np_arr = chunked_arr.to_numpy()
+
+    chunked_arr = None
+    import gc
+    gc.collect()
+
+    # Ensure base is still valid
+    assert np_arr.base is not None

Review Comment:
   OK, so we do not want to test the presence of `np_arr.base` here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on pull request #35582: GH-34787: [Python] Accept zero_copy_only=False for ChunkedArray.to_numpy

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on PR #35582:
URL: https://github.com/apache/arrow/pull/35582#issuecomment-1628504424

   Thanks @jjerphan! Already merging this, so it can be included in pyarrow 13.0.0. But a follow-up PR to add `writeable` as well is certainly welcome.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org