You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/28 16:37:45 UTC

[GitHub] [arrow] rok opened a new pull request, #13454: ARROW-13612: [Python] Allow specifying a custom type for converting ExtensionScalar to python object

rok opened a new pull request, #13454:
URL: https://github.com/apache/arrow/pull/13454

   This is to resolve [ARROW-13612](https://issues.apache.org/jira/browse/ARROW-13612).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13454: ARROW-13612: [Python] Allow specifying a custom type for converting ExtensionScalar to python object

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13454:
URL: https://github.com/apache/arrow/pull/13454#discussion_r914041992


##########
python/pyarrow/tests/test_extension_type.py:
##########
@@ -261,16 +297,19 @@ def test_ext_scalar_from_array():
         assert s.type == ty1
         if val is not None:
             assert s.value == pa.scalar(val, storage.type)
+            assert s.as_py() == UUID(bytes=val)
         else:
             assert s.value is None
-        assert s.as_py() == val
 
     scalars_b = list(b)
     assert len(scalars_b) == 4
 
     for sa, sb in zip(scalars_a, scalars_b):
         assert sa.is_valid == sb.is_valid
-        assert sa.as_py() == sb.as_py()
+        if sa.as_py() is None:
+            assert sa.as_py() == sb.as_py()
+        else:
+            assert sa.as_py().bytes == sb.as_py()
         assert sa != sb

Review Comment:
   Added one back and introduced `UuidType2()` without `scalar_as_py`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13454: ARROW-13612: [Python] Allow specifying a custom type for converting ExtensionScalar to python object

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13454:
URL: https://github.com/apache/arrow/pull/13454#discussion_r915076964


##########
python/pyarrow/tests/test_extension_type.py:
##########
@@ -41,6 +42,17 @@ def __init__(self):
     def __reduce__(self):
         return UuidType, ()
 
+    def scalar_as_py(self, scalar):
+        return UUID(bytes=scalar.as_py())
+
+class UuidType2(pa.PyExtensionType):

Review Comment:
   ```suggestion
   
   class UuidType2(pa.PyExtensionType):
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 commented on a diff in pull request #13454: ARROW-13612: [Python] Allow specifying a custom type for converting ExtensionScalar to python object

Posted by GitBox <gi...@apache.org>.
wjones127 commented on code in PR #13454:
URL: https://github.com/apache/arrow/pull/13454#discussion_r908770161


##########
python/pyarrow/types.pxi:
##########
@@ -902,6 +902,13 @@ cdef class ExtensionType(BaseExtensionType):
         """
         return ExtensionArray
 
+    def scalar_as_py(self, scalar):
+        """Convert scalar to a Python type.
+
+        This method can be overridden in subclasses to customize what type
+        scalars are converted to. Implementations should handle the case
+        where the scalar argument is None."""

Review Comment:
   ```suggestion
           where the scalar argument is ``None``."""
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on pull request #13454: ARROW-13612: [Python] Allow specifying a custom type for converting ExtensionScalar to python object

Posted by GitBox <gi...@apache.org>.
lidavidm commented on PR #13454:
URL: https://github.com/apache/arrow/pull/13454#issuecomment-1177457010

   Should we revert and consider the alternate approach? It would be consistent to allow ExtensionScalar subclasses (you could imagine overriding `__repr__` or operators or things) though I'm not sure how useful that actually would be.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on pull request #13454: ARROW-13612: [Python] Allow specifying a custom type for converting ExtensionScalar to python object

Posted by GitBox <gi...@apache.org>.
lidavidm commented on PR #13454:
URL: https://github.com/apache/arrow/pull/13454#issuecomment-1180389466

   @rok do you want to file issues for renaming the method, or just implementing the subclass approach and removing the method?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on pull request #13454: ARROW-13612: [Python] Allow specifying a custom type for converting ExtensionScalar to python object

Posted by GitBox <gi...@apache.org>.
rok commented on PR #13454:
URL: https://github.com/apache/arrow/pull/13454#issuecomment-1177478740

   I don't have an opinion on subclassing `ExtensionScalar` vs current approach. Do we risk breaking backwards compatibility if we choose to change this later?
   A private method does sounds better. I can make an incremental PR to change to private.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] sterlinm commented on pull request #13454: ARROW-13612: [Python] Allow specifying a custom type for converting ExtensionScalar to python object

Posted by GitBox <gi...@apache.org>.
sterlinm commented on PR #13454:
URL: https://github.com/apache/arrow/pull/13454#issuecomment-1179422218

   I don't have a very strong feeling about it. It's nice not to have another class to worry about with the current approach, but I think it would be more intuitive to subclass ExtensionScalar and override as_py.
   
   I don't have a current use case for other aspects of the scalar that I can think of, but that might be a failure of imagination. With the arrays you have different methods for to_numpy and to_pandas, is there anything similar for th scalar types?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on pull request #13454: ARROW-13612: [Python] Allow specifying a custom type for converting ExtensionScalar to python object

Posted by GitBox <gi...@apache.org>.
rok commented on PR #13454:
URL: https://github.com/apache/arrow/pull/13454#issuecomment-1168961277

   @wjones127 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13454: ARROW-13612: [Python] Allow specifying a custom type for converting ExtensionScalar to python object

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13454:
URL: https://github.com/apache/arrow/pull/13454#discussion_r914041248


##########
python/pyarrow/types.pxi:
##########
@@ -902,6 +902,18 @@ cdef class ExtensionType(BaseExtensionType):
         """
         return ExtensionArray
 
+    def scalar_as_py(self, scalar):
+        """Convert scalar to a Python type.
+
+        This method can be overridden in subclasses to customize what type
+        scalars are converted to.
+
+        Parameters
+        ----------
+        scalar : pyarrow.ExtensionScalar
+          The scalar to be converted to a Python object.
+        """
+        return scalar.as_py() if scalar is not None else None

Review Comment:
   > This should no longer be able to be None right?
   >
   > Also is the type ExtensionScalar? It seems we pass the underlying value right?
   
   Scalar shouldn't be None, but the return value could I think. Changed, please see if it makes sense.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on a diff in pull request #13454: ARROW-13612: [Python] Allow specifying a custom type for converting ExtensionScalar to python object

Posted by GitBox <gi...@apache.org>.
lidavidm commented on code in PR #13454:
URL: https://github.com/apache/arrow/pull/13454#discussion_r913924665


##########
python/pyarrow/types.pxi:
##########
@@ -902,6 +902,18 @@ cdef class ExtensionType(BaseExtensionType):
         """
         return ExtensionArray
 
+    def scalar_as_py(self, scalar):
+        """Convert scalar to a Python type.
+
+        This method can be overridden in subclasses to customize what type
+        scalars are converted to.
+
+        Parameters
+        ----------
+        scalar : pyarrow.ExtensionScalar
+          The scalar to be converted to a Python object.
+        """
+        return scalar.as_py() if scalar is not None else None

Review Comment:
   This should no longer be able to be None right?
   
   Also is the type ExtensionScalar? It seems we pass the underlying value right?



##########
python/pyarrow/tests/test_extension_type.py:
##########
@@ -261,16 +297,19 @@ def test_ext_scalar_from_array():
         assert s.type == ty1
         if val is not None:
             assert s.value == pa.scalar(val, storage.type)
+            assert s.as_py() == UUID(bytes=val)
         else:
             assert s.value is None
-        assert s.as_py() == val
 
     scalars_b = list(b)
     assert len(scalars_b) == 4
 
     for sa, sb in zip(scalars_a, scalars_b):
         assert sa.is_valid == sb.is_valid
-        assert sa.as_py() == sb.as_py()
+        if sa.as_py() is None:
+            assert sa.as_py() == sb.as_py()
+        else:
+            assert sa.as_py().bytes == sb.as_py()
         assert sa != sb

Review Comment:
   Is there still a test of converting a type that uses the default `scalar_as_py`?



##########
docs/source/python/extending_types.rst:
##########
@@ -286,6 +286,33 @@ are available).
 The same ``__arrow_ext_class__`` specialization can be used with custom types defined
 by subclassing :class:`ExtensionType`.
 
+Custom scalar conversion
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+If you want scalars of your custom extension type to convert to a custom type when
+:meth:`ExtensionScalar.as_py()` is called, you can override the :meth:`ExtensionType.scalar_as_py()`
+method. For example, if we wanted the above example 3D point type to return a custom
+3D point class instead of a list, we would implement::
+
+    Point3D = namedtuple("Point3D", ["x", "y", "z"])
+
+    class Point3DType(pa.PyExtensionType):
+        def __init__(self):
+            pa.PyExtensionType.__init__(self, pa.list_(pa.float32(), 3))
+
+        def __reduce__(self):
+            return Point3DType, ()
+
+        def scalar_as_py(self, scalar):

Review Comment:
   ```suggestion
           def scalar_as_py(self, scalar: ListScalar) -> Point3D:
   ```
   maybe for clarity about types?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on pull request #13454: ARROW-13612: [Python] Allow specifying a custom type for converting ExtensionScalar to python object

Posted by GitBox <gi...@apache.org>.
lidavidm commented on PR #13454:
URL: https://github.com/apache/arrow/pull/13454#issuecomment-1180408870

   Thanks. FWIW I would be in favor of removing the method + adding the subclass if we think we can get it in time for 9.0.0, just to avoid having multiple ways to do something


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13454: ARROW-13612: [Python] Allow specifying a custom type for converting ExtensionScalar to python object

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13454:
URL: https://github.com/apache/arrow/pull/13454#discussion_r915044913


##########
python/pyarrow/types.pxi:
##########
@@ -902,6 +902,22 @@ cdef class ExtensionType(BaseExtensionType):
         """
         return ExtensionArray
 
+    def scalar_as_py(self, scalar):
+        """Convert scalar to a Python type.
+
+        This method can be overridden in subclasses to customize what type
+        scalars are converted to.
+
+        Parameters
+        ----------
+        scalar : pyarrow.ExtensionScalar
+          The scalar to be converted to a Python object.
+
+        Returns
+        -------
+        Scalar value as a native Python object.
+        """
+        return scalar.as_py()

Review Comment:
   Oh I see, sorry. Is this better?



##########
docs/source/python/extending_types.rst:
##########
@@ -286,6 +286,33 @@ are available).
 The same ``__arrow_ext_class__`` specialization can be used with custom types defined
 by subclassing :class:`ExtensionType`.
 
+Custom scalar conversion
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+If you want scalars of your custom extension type to convert to a custom type when
+:meth:`ExtensionScalar.as_py()` is called, you can override the :meth:`ExtensionType.scalar_as_py()`
+method. For example, if we wanted the above example 3D point type to return a custom
+3D point class instead of a list, we would implement::
+
+    Point3D = namedtuple("Point3D", ["x", "y", "z"])
+
+    class Point3DType(pa.PyExtensionType):
+        def __init__(self):
+            pa.PyExtensionType.__init__(self, pa.list_(pa.float32(), 3))
+
+        def __reduce__(self):
+            return Point3DType, ()
+
+        def scalar_as_py(self, scalar: ListScalar) -> Point3D:
+            return Point3D(*scalar.as_py())
+
+Arrays built using this extension type now provide scalars that convert to our ``Point3D`` class::
+
+    >>> storage = pa.array([[1, 2, 3], [4, 5, 6]], pa.list_(pa.float32(), 3))
+    >>> arr = pa.ExtensionArray.from_storage(Point3DType(), storage)
+    >>> arr[0].as_py()
+    Point3D(x=1.0, y=2.0, z=3.0)
+

Review Comment:
   Added.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on pull request #13454: ARROW-13612: [Python] Allow specifying a custom type for converting ExtensionScalar to python object

Posted by GitBox <gi...@apache.org>.
lidavidm commented on PR #13454:
URL: https://github.com/apache/arrow/pull/13454#issuecomment-1178078119

   Renaming the method sounds good then. We can revisit this if there's demand for custom scalar subclasses


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm merged pull request #13454: ARROW-13612: [Python] Allow specifying a custom type for converting ExtensionScalar to python object

Posted by GitBox <gi...@apache.org>.
lidavidm merged PR #13454:
URL: https://github.com/apache/arrow/pull/13454


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on pull request #13454: ARROW-13612: [Python] Allow specifying a custom type for converting ExtensionScalar to python object

Posted by GitBox <gi...@apache.org>.
rok commented on PR #13454:
URL: https://github.com/apache/arrow/pull/13454#issuecomment-1175327332

   Thanks for the review @lidavidm!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on a diff in pull request #13454: ARROW-13612: [Python] Allow specifying a custom type for converting ExtensionScalar to python object

Posted by GitBox <gi...@apache.org>.
lidavidm commented on code in PR #13454:
URL: https://github.com/apache/arrow/pull/13454#discussion_r914141335


##########
python/pyarrow/types.pxi:
##########
@@ -902,6 +902,22 @@ cdef class ExtensionType(BaseExtensionType):
         """
         return ExtensionArray
 
+    def scalar_as_py(self, scalar):
+        """Convert scalar to a Python type.
+
+        This method can be overridden in subclasses to customize what type
+        scalars are converted to.
+
+        Parameters
+        ----------
+        scalar : pyarrow.ExtensionScalar
+          The scalar to be converted to a Python object.
+
+        Returns
+        -------
+        Scalar value as a native Python object.
+        """
+        return scalar.as_py()

Review Comment:
   I was talking about the docstring - this is always passed a not-None `pyarrow.Scalar` subclass corresponding to the extension type, right? 



##########
python/pyarrow/types.pxi:
##########
@@ -902,6 +902,18 @@ cdef class ExtensionType(BaseExtensionType):
         """
         return ExtensionArray
 
+    def scalar_as_py(self, scalar):
+        """Convert scalar to a Python type.
+
+        This method can be overridden in subclasses to customize what type
+        scalars are converted to.
+
+        Parameters
+        ----------
+        scalar : pyarrow.ExtensionScalar
+          The scalar to be converted to a Python object.
+        """
+        return scalar.as_py() if scalar is not None else None

Review Comment:
   What I meant was, this method is always passed the Scalar of the storage type, right? Not of the Extension type? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] chaubold commented on pull request #13454: ARROW-13612: [Python] Allow specifying a custom type for converting ExtensionScalar to python object

Posted by GitBox <gi...@apache.org>.
chaubold commented on PR #13454:
URL: https://github.com/apache/arrow/pull/13454#issuecomment-1177154061

   @jorisvandenbossche that sounds to me like the question whether it will ever make sense that an `ExtensionScalar` has a state. If it does, then it might make sense to give the user full control by providing `ExtensionScalar` subclasses. However, if the assumption is that a `Scalar` doesn't have a state on its own (which is at least my understanding), then having the conversion method `scalar_as_py()` in the `ExtensionType` and using that whenever accessing the elements of an `ExtensionArray` with that `ExtensionType` sounds good enough to me (looking at it from a user perspective).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on pull request #13454: ARROW-13612: [Python] Allow specifying a custom type for converting ExtensionScalar to python object

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on PR #13454:
URL: https://github.com/apache/arrow/pull/13454#issuecomment-1177140141

   One small comment: this method should maybe be not "public" on the type? (there should be no reason for a _user_ of the extension type (not implementor) to ever call this method on the type object) 
   So we could also follow the similar pattern as the other non-public methods we already have that can be overridden and call this something like `__arrow_ext_scalar_as_py__` ?
   
   ---
   
   Something that I didn't think about when opening the JIRA originally, but a potentially different way to enable basically the same ability to override `as_py()` would be to allow the user to sublcass `ExtensionScalar`, and then specify this class (in the same way as you can sublcass `ExtensionArray`, and specify this in `ExtensionType.__arrow_ext_class__`). 
   That way, you have full control over that scalar class (and for example also add custom attributes), and thus also can override the default `as_py()` method.
   
   I am not really sure this is a better way, though (it's a bit more generic, but the question is whether this will be needed in practice).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 commented on a diff in pull request #13454: ARROW-13612: [Python] Allow specifying a custom type for converting ExtensionScalar to python object

Posted by GitBox <gi...@apache.org>.
wjones127 commented on code in PR #13454:
URL: https://github.com/apache/arrow/pull/13454#discussion_r908876758


##########
python/pyarrow/types.pxi:
##########
@@ -902,6 +902,13 @@ cdef class ExtensionType(BaseExtensionType):
         """
         return ExtensionArray
 
+    def scalar_as_py(self, scalar):
+        """Convert scalar to a Python type.
+
+        This method can be overridden in subclasses to customize what type
+        scalars are converted to. Implementations should handle the case
+        where the scalar argument is ``None``."""

Review Comment:
   ```suggestion
           where the scalar argument is ``None``.
           
           Parameters
           ----------
           scalar : pyarrow.ExtensionScalar or None
             The scalar to be converted to a Python object.
           """
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 commented on a diff in pull request #13454: ARROW-13612: [Python] Allow specifying a custom type for converting ExtensionScalar to python object

Posted by GitBox <gi...@apache.org>.
wjones127 commented on code in PR #13454:
URL: https://github.com/apache/arrow/pull/13454#discussion_r908769271


##########
docs/source/python/extending_types.rst:
##########
@@ -286,6 +286,36 @@ are available).
 The same ``__arrow_ext_class__`` specialization can be used with custom types defined
 by subclassing :class:`ExtensionType`.
 
+Custom scalar conversion
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+If you want scalars of your custom extension type to convert to a custom type when
+:meth:`ExtensionScalar.as_py()` is called, you can override the :meth:`ExtensionType.scalar_as_py()`
+method. For example, if we wanted the above example 3D point type to return a tuple instead
+of a list, we would implement::
+
+    Point3D = namedtuple("Point3D", ["x", "y", "z"])
+
+    class Point3DType(pa.PyExtensionType):
+        def __init__(self):
+            pa.PyExtensionType.__init__(self, pa.list_(pa.float32(), 3))
+
+        def __reduce__(self):
+            return Point3DType, ()
+
+        def scalar_as_py(self, scalar):
+            if scalar is None:
+                return None
+            else:
+                return Point3D(*scalar.as_py())
+
+Arrays built using this extension scalar type now have the expected scalar behaviour::

Review Comment:
   ```suggestion
   Arrays built using this extension type now provide scalars that convert to our ``Point3D`` class::
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] chaubold commented on a diff in pull request #13454: ARROW-13612: [Python] Allow specifying a custom type for converting ExtensionScalar to python object

Posted by GitBox <gi...@apache.org>.
chaubold commented on code in PR #13454:
URL: https://github.com/apache/arrow/pull/13454#discussion_r909411136


##########
python/pyarrow/types.pxi:
##########
@@ -902,6 +902,19 @@ cdef class ExtensionType(BaseExtensionType):
         """
         return ExtensionArray
 
+    def scalar_as_py(self, scalar):
+        """Convert scalar to a Python type.
+
+        This method can be overridden in subclasses to customize what type
+        scalars are converted to. Implementations should handle the case
+        where the scalar argument is ``None``.

Review Comment:
   I am curious about your intention to leave the `None` handling to the subclasses.
   
   Would you like to allow returning custom values (along the lines of `numpy.NaN`) in the case that the value is `None`? Or should `None` always stay `None` because Arrow actually is more explicit with its missing value handling and is able to differentiate between e.g. `None` and `NaN` for double columns. If the latter is preferred, then I'd suggest to move the `None` handling to `Scalar.as_py()` and only invoke `scalar_as_py` for existing values.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on pull request #13454: ARROW-13612: [Python] Allow specifying a custom type for converting ExtensionScalar to python object

Posted by GitBox <gi...@apache.org>.
rok commented on PR #13454:
URL: https://github.com/apache/arrow/pull/13454#issuecomment-1176597221

   Thanks @lidavidm @wjones127 @chaubold !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13454: ARROW-13612: [Python] Allow specifying a custom type for converting ExtensionScalar to python object

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13454:
URL: https://github.com/apache/arrow/pull/13454#discussion_r915044913


##########
python/pyarrow/types.pxi:
##########
@@ -902,6 +902,22 @@ cdef class ExtensionType(BaseExtensionType):
         """
         return ExtensionArray
 
+    def scalar_as_py(self, scalar):
+        """Convert scalar to a Python type.
+
+        This method can be overridden in subclasses to customize what type
+        scalars are converted to.
+
+        Parameters
+        ----------
+        scalar : pyarrow.ExtensionScalar
+          The scalar to be converted to a Python object.
+
+        Returns
+        -------
+        Scalar value as a native Python object.
+        """
+        return scalar.as_py()

Review Comment:
   Oh I see, sorry. Is [this](https://github.com/apache/arrow/pull/13454/files#diff-9f6b1034bb57bace094700a1b4748c88e40c9f98aec94a959ca466b5cbce0401) better?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 commented on a diff in pull request #13454: ARROW-13612: [Python] Allow specifying a custom type for converting ExtensionScalar to python object

Posted by GitBox <gi...@apache.org>.
wjones127 commented on code in PR #13454:
URL: https://github.com/apache/arrow/pull/13454#discussion_r908768502


##########
docs/source/python/extending_types.rst:
##########
@@ -286,6 +286,36 @@ are available).
 The same ``__arrow_ext_class__`` specialization can be used with custom types defined
 by subclassing :class:`ExtensionType`.
 
+Custom scalar conversion
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+If you want scalars of your custom extension type to convert to a custom type when
+:meth:`ExtensionScalar.as_py()` is called, you can override the :meth:`ExtensionType.scalar_as_py()`
+method. For example, if we wanted the above example 3D point type to return a tuple instead
+of a list, we would implement::

Review Comment:
   ```suggestion
   method. For example, if we wanted the above example 3D point type to return a custom
   3D point class instead of a list, we would implement::
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #13454: ARROW-13612: [Python] Allow specifying a custom type for converting ExtensionScalar to python object

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13454:
URL: https://github.com/apache/arrow/pull/13454#issuecomment-1168996131

   https://issues.apache.org/jira/browse/ARROW-13612


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on pull request #13454: ARROW-13612: [Python] Allow specifying a custom type for converting ExtensionScalar to python object

Posted by GitBox <gi...@apache.org>.
rok commented on PR #13454:
URL: https://github.com/apache/arrow/pull/13454#issuecomment-1179398889

   @sterlinm would you prefer the current approach (adding `scalar_as_py` method to `ExtensionType`) or subclassing `ExtensionScalar`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 commented on a diff in pull request #13454: ARROW-13612: [Python] Allow specifying a custom type for converting ExtensionScalar to python object

Posted by GitBox <gi...@apache.org>.
wjones127 commented on code in PR #13454:
URL: https://github.com/apache/arrow/pull/13454#discussion_r914942510


##########
docs/source/python/extending_types.rst:
##########
@@ -286,6 +286,33 @@ are available).
 The same ``__arrow_ext_class__`` specialization can be used with custom types defined
 by subclassing :class:`ExtensionType`.
 
+Custom scalar conversion
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+If you want scalars of your custom extension type to convert to a custom type when
+:meth:`ExtensionScalar.as_py()` is called, you can override the :meth:`ExtensionType.scalar_as_py()`
+method. For example, if we wanted the above example 3D point type to return a custom
+3D point class instead of a list, we would implement::
+
+    Point3D = namedtuple("Point3D", ["x", "y", "z"])
+
+    class Point3DType(pa.PyExtensionType):
+        def __init__(self):
+            pa.PyExtensionType.__init__(self, pa.list_(pa.float32(), 3))
+
+        def __reduce__(self):
+            return Point3DType, ()
+
+        def scalar_as_py(self, scalar: ListScalar) -> Point3D:
+            return Point3D(*scalar.as_py())
+
+Arrays built using this extension type now provide scalars that convert to our ``Point3D`` class::
+
+    >>> storage = pa.array([[1, 2, 3], [4, 5, 6]], pa.list_(pa.float32(), 3))
+    >>> arr = pa.ExtensionArray.from_storage(Point3DType(), storage)
+    >>> arr[0].as_py()
+    Point3D(x=1.0, y=2.0, z=3.0)
+

Review Comment:
   Maybe it might be compelling to also show that `as_py_list()` returns `List[Point3D]` too.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 commented on a diff in pull request #13454: ARROW-13612: [Python] Allow specifying a custom type for converting ExtensionScalar to python object

Posted by GitBox <gi...@apache.org>.
wjones127 commented on code in PR #13454:
URL: https://github.com/apache/arrow/pull/13454#discussion_r910325599


##########
python/pyarrow/types.pxi:
##########
@@ -902,6 +902,19 @@ cdef class ExtensionType(BaseExtensionType):
         """
         return ExtensionArray
 
+    def scalar_as_py(self, scalar):
+        """Convert scalar to a Python type.
+
+        This method can be overridden in subclasses to customize what type
+        scalars are converted to. Implementations should handle the case
+        where the scalar argument is ``None``.

Review Comment:
   I think for numpy conversion, we have more efficient methods to do that conversion (to arrays). I can't think of any other case where someone might want to customize the null handling of `as_py()`, so maybe you are right that we should only invoke `scalar_as_py` for existing values.
   
   @jorisvandenbossche Any thoughts on that?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on pull request #13454: ARROW-13612: [Python] Allow specifying a custom type for converting ExtensionScalar to python object

Posted by GitBox <gi...@apache.org>.
rok commented on PR #13454:
URL: https://github.com/apache/arrow/pull/13454#issuecomment-1169045373

   @jorisvandenbossche you're probably best suited to review this/


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on pull request #13454: ARROW-13612: [Python] Allow specifying a custom type for converting ExtensionScalar to python object

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on PR #13454:
URL: https://github.com/apache/arrow/pull/13454#issuecomment-1177603279

   > that sounds to me like the question whether it will ever make sense that an `ExtensionScalar` has a state
   
   Indeed, as you mention, a scalar object will not have state (since it is just created (when needed) from the actual array). But I think there are other reasons one _could_ want to add things to the scalar class. Similarly as for example StructScalar has struct-specific methods, one could have custom methods or properties to make it easier to access certain information on the extension scalar (in addition to customizing the repr, as David mentioned)
   
   > It would be consistent to allow ExtensionScalar subclasses (you could imagine overriding `__repr__` or operators or things) though I'm not sure how useful that actually would be.
   
   Indeed, `__repr__` is another example that could be overridden, but I am also not sure how often this would be needed / useful. I suppose for debugging purposes that could actually be useful to have a custom scalar class / repr.
   
   I don't think it is needed to revert this, we can do a follow-up PR adjusting the approach if there is any agreement (eg the tests are still useful, as those should still pass (after adapting the actual tested extension subclass)). 
   
   I would definitely make the method private (if we keep the approach of the method on the type class), and maybe move towards making ExtensionScalar subclassable.
   
   > Do we risk breaking backwards compatibility if we choose to change this later?
   
   We could add the subclassable ExtensionScalar class later without breaking compatibility, only at that point it would give two ways to achieve the same thing specifically for controlling `as_py()`, which is not fully ideal.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] sterlinm commented on pull request #13454: ARROW-13612: [Python] Allow specifying a custom type for converting ExtensionScalar to python object

Posted by GitBox <gi...@apache.org>.
sterlinm commented on PR #13454:
URL: https://github.com/apache/arrow/pull/13454#issuecomment-1179376971

   > We can revisit this if there's demand for custom scalar subclasses
   
   For what it's worth, I've been struggling with figuring this out in my own arrow-backed ExtensionArray and am excited to see that there might be a more natural solution. :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on pull request #13454: ARROW-13612: [Python] Allow specifying a custom type for converting ExtensionScalar to python object

Posted by GitBox <gi...@apache.org>.
rok commented on PR #13454:
URL: https://github.com/apache/arrow/pull/13454#issuecomment-1180393059

   > @rok do you want to file issues for renaming the method, or just implementing the subclass approach and removing the method?
   
   I was thinking to file a follow-up issue and open a subclass and a rename issue and have a discussion about which to go with. I think I'll do it later today, just need to address some bureaucracy first. Sorry I didn't do it sooner.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org