You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@arrow.apache.org by ra...@apache.org on 2024/02/22 10:19:32 UTC

(arrow) branch maint-15.0.x updated (d74ab501fc -> 1c41fcae84)

This is an automated email from the ASF dual-hosted git repository.

raulcd pushed a change to branch maint-15.0.x
in repository https://gitbox.apache.org/repos/asf/arrow.git


    from d74ab501fc GH-40112: [CI][Python] Ensure CPython is selected, not PyPy (#40131)
     new 8b6e07ca1b GH-39933: [R] Fix pointer conversion to Python for latest reticulate (#39969)
     new 8c6959070e GH-39942: [Python] Make capsule name check more lenient (#39977)
     new 1c41fcae84 GH-40174: [C++][CI][Parquet] Fixing parquet column_writer_test building (#40175)

The 3 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 cpp/src/parquet/column_writer_test.cc | 20 +++++++++++++-------
 python/pyarrow/types.pxi              | 24 ++++++++++++++++++++++--
 r/R/python.R                          | 16 +++++-----------
 3 files changed, 40 insertions(+), 20 deletions(-)


(arrow) 01/03: GH-39933: [R] Fix pointer conversion to Python for latest reticulate (#39969)

Posted by ra...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

raulcd pushed a commit to branch maint-15.0.x
in repository https://gitbox.apache.org/repos/asf/arrow.git

commit 8b6e07ca1b7631f062255b89bb7531aec932c7d2
Author: Dewey Dunnington <de...@voltrondata.com>
AuthorDate: Wed Feb 7 10:29:46 2024 -0400

    GH-39933: [R] Fix pointer conversion to Python for latest reticulate (#39969)
    
    
    ### Rationale for this change
    
    The integration tests and documentation build is failing
    
    ### What changes are included in this PR?
    
    Instead of relying on how reticulate converts an R external pointer, use a Python integer instead. We can't use an R integer (because they're only 32 bits); we can't use an R double (because the static cast to/from uintptr_t is a bit iffy); however, we can use Python to convert a string to Python integer. This is probably how I should have written it the first time but it didn't occur to me at the time.
    
    ### Are these changes tested?
    
    Yes, covered by existing tests.
    
    ### Are there any user-facing changes?
    
    No
    * Closes: #39933
    
    Lead-authored-by: Dewey Dunnington <de...@fishandwhistle.net>
    Co-authored-by: Dewey Dunnington <de...@voltrondata.com>
    Signed-off-by: Dewey Dunnington <de...@voltrondata.com>
---
 r/R/python.R | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/r/R/python.R b/r/R/python.R
index 023d914f16..1159806bf7 100644
--- a/r/R/python.R
+++ b/r/R/python.R
@@ -339,15 +339,9 @@ install_pyarrow <- function(envname = NULL, nightly = FALSE, ...) {
 }
 
 pyarrow_compatible_pointer <- function(ptr) {
-  pa <- reticulate::import("pyarrow")
-  version_string <- pa$`__version__`
-  # remove trailing .devXXX because it won't work with package_version()
-  pyarrow_version <- package_version(gsub("\\.dev.*?$", "", version_string))
-
-  # pyarrow pointers changed in version 7.0.0
-  if (pyarrow_version >= "7.0.0") {
-    return(ptr)
-  } else {
-    return(external_pointer_addr_double(ptr))
-  }
+  # GH-39933: Workaround because there is no built-in way to send a
+  # 64-bit integer to Python from an R object
+  py <- reticulate::import_builtins(convert = FALSE)
+  addr <- external_pointer_addr_character(ptr)
+  py$int(addr)
 }


(arrow) 03/03: GH-40174: [C++][CI][Parquet] Fixing parquet column_writer_test building (#40175)

Posted by ra...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

raulcd pushed a commit to branch maint-15.0.x
in repository https://gitbox.apache.org/repos/asf/arrow.git

commit 1c41fcae842303f52c4c80e14cd6f03b9d98e46d
Author: mwish <ma...@gmail.com>
AuthorDate: Thu Feb 22 00:54:04 2024 +0800

    GH-40174: [C++][CI][Parquet] Fixing parquet column_writer_test building (#40175)
    
    
    
    ### Rationale for this change
    
    Remove `ThrowsMessage` for CI build.
    
    ### What changes are included in this PR?
    
    Remove `ThrowsMessage` for CI build.
    
    ### Are these changes tested?
    
    no need
    
    ### Are there any user-facing changes?
    
    no
    
    * Closes: #40174
    
    Authored-by: mwish <ma...@gmail.com>
    Signed-off-by: Antoine Pitrou <an...@python.org>
---
 cpp/src/parquet/column_writer_test.cc | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/cpp/src/parquet/column_writer_test.cc b/cpp/src/parquet/column_writer_test.cc
index 97421629d2..94b06a3600 100644
--- a/cpp/src/parquet/column_writer_test.cc
+++ b/cpp/src/parquet/column_writer_test.cc
@@ -483,7 +483,6 @@ using TestByteArrayValuesWriter = TestPrimitiveWriter<ByteArrayType>;
 using TestFixedLengthByteArrayValuesWriter = TestPrimitiveWriter<FLBAType>;
 
 using ::testing::HasSubstr;
-using ::testing::ThrowsMessage;
 
 TYPED_TEST(TestPrimitiveWriter, RequiredPlain) {
   this->TestRequiredWithEncoding(Encoding::PLAIN);
@@ -918,20 +917,27 @@ TEST(TestPageWriter, ThrowsOnPagesTooLarge) {
   DataPageV1 over_compressed_limit(buffer, /*num_values=*/100, Encoding::BIT_PACKED,
                                    Encoding::BIT_PACKED, Encoding::BIT_PACKED,
                                    /*uncompressed_size=*/100);
-  EXPECT_THAT([&]() { pager->WriteDataPage(over_compressed_limit); },
-              ThrowsMessage<ParquetException>(HasSubstr("overflows INT32_MAX")));
+  EXPECT_THROW_THAT([&]() { pager->WriteDataPage(over_compressed_limit); },
+                    ParquetException,
+                    ::testing::Property(&ParquetException::what,
+                                        ::testing::HasSubstr("overflows INT32_MAX")));
   DictionaryPage dictionary_over_compressed_limit(buffer, /*num_values=*/100,
                                                   Encoding::PLAIN);
-  EXPECT_THAT([&]() { pager->WriteDictionaryPage(dictionary_over_compressed_limit); },
-              ThrowsMessage<ParquetException>(HasSubstr("overflows INT32_MAX")));
+  EXPECT_THROW_THAT(
+      [&]() { pager->WriteDictionaryPage(dictionary_over_compressed_limit); },
+      ParquetException,
+      ::testing::Property(&ParquetException::what,
+                          ::testing::HasSubstr("overflows INT32_MAX")));
 
   buffer = std::make_shared<Buffer>(&data, 1);
   DataPageV1 over_uncompressed_limit(
       buffer, /*num_values=*/100, Encoding::BIT_PACKED, Encoding::BIT_PACKED,
       Encoding::BIT_PACKED,
       /*uncompressed_size=*/std::numeric_limits<int32_t>::max() + int64_t{1});
-  EXPECT_THAT([&]() { pager->WriteDataPage(over_compressed_limit); },
-              ThrowsMessage<ParquetException>(HasSubstr("overflows INT32_MAX")));
+  EXPECT_THROW_THAT([&]() { pager->WriteDataPage(over_compressed_limit); },
+                    ParquetException,
+                    ::testing::Property(&ParquetException::what,
+                                        ::testing::HasSubstr("overflows INT32_MAX")));
 }
 
 TEST(TestColumnWriter, RepeatedListsUpdateSpacedBug) {


(arrow) 02/03: GH-39942: [Python] Make capsule name check more lenient (#39977)

Posted by ra...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

raulcd pushed a commit to branch maint-15.0.x
in repository https://gitbox.apache.org/repos/asf/arrow.git

commit 8c6959070e8147b56507007adb26fa358114702f
Author: Dewey Dunnington <de...@fishandwhistle.net>
AuthorDate: Fri Feb 9 09:45:45 2024 -0400

    GH-39942: [Python] Make capsule name check more lenient (#39977)
    
    ### Rationale for this change
    
    While #39969 fixed the immediate issue caused by the update of the capsule name used by reticulate whilst converting an R "external pointer", it will still result in an error if somebody is using an older version of the Arrow R package.
    
    ### What changes are included in this PR?
    
    The pyarrow Cython code was modified to accept capsules with the name NULL or "r_extptr".
    
    ### Are these changes tested?
    
    Not sure where the best place for this is, but:
    
    CRAN arrow + released pyarrow + new reticulate (errors):
    
    ``` r
    library(arrow, warn.conflicts = FALSE)
    reticulate::use_virtualenv("~/Desktop/rscratch/arrow/.venv")
    packageVersion("arrow")
    #> [1] '14.0.0.2'
    packageVersion("reticulate")
    #> [1] '1.35.0'
    pa <- reticulate::import("pyarrow")
    pa[["__version__"]]
    #> [1] "15.0.0"
    
    reticulate::r_to_py(arrow::int32())
    #> PyCapsule_GetPointer called with incorrect name
    ```
    
    CRAN arrow + pyarrow from this PR + old reticulate:
    
    ``` r
    library(arrow, warn.conflicts = FALSE)
    reticulate::use_virtualenv("~/Desktop/rscratch/arrow/.venv")
    packageVersion("arrow")
    #> [1] '14.0.0.2'
    packageVersion("reticulate")
    #> [1] '1.34.0'
    pa <- reticulate::import("pyarrow")
    pa[["__version__"]]
    #> [1] "16.0.0.dev92+geafcff7a5"
    
    reticulate::r_to_py(arrow::int32())
    #> DataType(int32)
    ```
    
    CRAN arrow + pyarrow from this PR + new reticulate:
    
    ``` r
    library(arrow, warn.conflicts = FALSE)
    reticulate::use_virtualenv("~/Desktop/rscratch/arrow/.venv")
    packageVersion("arrow")
    #> [1] '14.0.0.2'
    packageVersion("reticulate")
    #> [1] '1.35.0'
    pa <- reticulate::import("pyarrow")
    pa[["__version__"]]
    #> [1] "16.0.0.dev92+geafcff7a5"
    
    reticulate::r_to_py(arrow::int32())
    #> DataType(int32)
    ```
    
    ### Are there any user-facing changes?
    
    No
    * Closes: #39942
    
    Lead-authored-by: Dewey Dunnington <de...@fishandwhistle.net>
    Co-authored-by: Dewey Dunnington <de...@voltrondata.com>
    Co-authored-by: Antoine Pitrou <pi...@free.fr>
    Co-authored-by: Joris Van den Bossche <jo...@gmail.com>
    Signed-off-by: Dewey Dunnington <de...@voltrondata.com>
---
 python/pyarrow/types.pxi | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/python/pyarrow/types.pxi b/python/pyarrow/types.pxi
index b6dc53d633..e2a9446ce9 100644
--- a/python/pyarrow/types.pxi
+++ b/python/pyarrow/types.pxi
@@ -15,7 +15,13 @@
 # specific language governing permissions and limitations
 # under the License.
 
-from cpython.pycapsule cimport PyCapsule_CheckExact, PyCapsule_GetPointer, PyCapsule_New, PyCapsule_IsValid
+from cpython.pycapsule cimport (
+    PyCapsule_CheckExact,
+    PyCapsule_GetPointer,
+    PyCapsule_GetName,
+    PyCapsule_New,
+    PyCapsule_IsValid
+)
 
 import atexit
 from collections.abc import Mapping
@@ -105,6 +111,7 @@ cdef void* _as_c_pointer(v, allow_null=False) except *:
     (the latter for compatibility with raw pointers exported by reticulate)
     """
     cdef void* c_ptr
+    cdef const char* capsule_name
     if isinstance(v, int):
         c_ptr = <void*> <uintptr_t > v
     elif isinstance(v, float):
@@ -114,7 +121,20 @@ cdef void* _as_c_pointer(v, allow_null=False) except *:
             "Arrow library", UserWarning, stacklevel=2)
         c_ptr = <void*> <uintptr_t > v
     elif PyCapsule_CheckExact(v):
-        c_ptr = PyCapsule_GetPointer(v, NULL)
+        # An R external pointer was how the R bindings passed pointer values to
+        # Python from versions 7 to 15 (inclusive); however, the reticulate 1.35.0
+        # update changed the name of the capsule from NULL to "r_extptr".
+        # Newer versions of the R package pass a Python integer; however, this
+        # workaround ensures that old versions of the R package continue to work
+        # with newer versions of pyarrow.
+        capsule_name = PyCapsule_GetName(v)
+        if capsule_name == NULL or capsule_name == b"r_extptr":
+            c_ptr = PyCapsule_GetPointer(v, capsule_name)
+        else:
+            capsule_name_str = capsule_name.decode()
+            raise ValueError(
+                f"Can't convert PyCapsule with name '{capsule_name_str}' to pointer address"
+            )
     else:
         raise TypeError(f"Expected a pointer value, got {type(v)!r}")
     if not allow_null and c_ptr == NULL: