You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/13 11:39:39 UTC

[GitHub] [arrow] rok opened a new pull request, #13594: ARROW-17065: [Python] Allow using subclassed ExtensionScalar in ExtensionType

rok opened a new pull request, #13594:
URL: https://github.com/apache/arrow/pull/13594

   This is to resolve [ARROW-17065](https://issues.apache.org/jira/browse/ARROW-17065).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13594: ARROW-17065: [Python] Allow using subclassed ExtensionScalar in ExtensionType

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13594:
URL: https://github.com/apache/arrow/pull/13594#discussion_r922054152


##########
python/pyarrow/scalar.pxi:
##########
@@ -902,10 +902,7 @@ cdef class ExtensionScalar(Scalar):
         """
         Return this scalar as a Python object.
         """
-        if self.value is None:
-            return None
-        else:
-            return self.type.scalar_as_py(self.value)
+        return self.value if self.value is not None else None

Review Comment:
   Changed it around a bit, but it's the same logic really.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13594: ARROW-17065: [Python] Allow using subclassed ExtensionScalar in ExtensionType

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13594:
URL: https://github.com/apache/arrow/pull/13594#discussion_r920329408


##########
python/pyarrow/tests/test_extension_type.py:
##########
@@ -229,7 +238,7 @@ def test_ext_array_to_pylist():
     storage = pa.array([b"foo", b"bar", None], type=pa.binary(3))
     arr = pa.ExtensionArray.from_storage(ty, storage)
 
-    assert arr.to_pylist() == [b"foo", b"bar", None]
+    assert [x.as_py() if x else None for x in arr.to_pylist()] == [b"foo", b"bar", None]

Review Comment:
   I think this now wraps the physical scalar into `__arrow_ext_scalar_class__` and hence this issue.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13594: ARROW-17065: [Python] Allow using subclassed ExtensionScalar in ExtensionType

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13594:
URL: https://github.com/apache/arrow/pull/13594#discussion_r920329408


##########
python/pyarrow/tests/test_extension_type.py:
##########
@@ -229,7 +238,7 @@ def test_ext_array_to_pylist():
     storage = pa.array([b"foo", b"bar", None], type=pa.binary(3))
     arr = pa.ExtensionArray.from_storage(ty, storage)
 
-    assert arr.to_pylist() == [b"foo", b"bar", None]
+    assert [x.as_py() if x else None for x in arr.to_pylist()] == [b"foo", b"bar", None]

Review Comment:
   I think this now wraps the physical value into ExtensionScalar and ExtensionScalar into `__arrow_ext_scalar_class__` and hence this issue.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on pull request #13594: ARROW-17065: [Python] Allow using subclassed ExtensionScalar in ExtensionType

Posted by GitBox <gi...@apache.org>.
rok commented on PR #13594:
URL: https://github.com/apache/arrow/pull/13594#issuecomment-1185669316

   FYI @wjones127 @chaubold @sterlinm


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #13594: ARROW-17065: [Python] Allow using subclassed ExtensionScalar in ExtensionType

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on code in PR #13594:
URL: https://github.com/apache/arrow/pull/13594#discussion_r921928410


##########
python/pyarrow/scalar.pxi:
##########
@@ -902,10 +902,7 @@ cdef class ExtensionScalar(Scalar):
         """
         Return this scalar as a Python object.
         """
-        if self.value is None:
-            return None
-        else:
-            return self.type.scalar_as_py(self.value)
+        return self.value if self.value is not None else None

Review Comment:
   ```suggestion
           return self.value.as_py() if self.value is not None else None
   ```
   
   ? (it is a bit suspicious that the tests are not failing because of this)



##########
python/pyarrow/array.pxi:
##########
@@ -2771,6 +2771,11 @@ cdef class ExtensionArray(Array):
         """
         return self.storage.to_numpy(**kwargs)
 
+    cdef getitem(self, int64_t i):
+        scalar = ExtensionScalar.wrap(GetResultValue(self.ap.GetScalar(i)))
+        return self.type.__arrow_ext_scalar_class__().from_storage(
+            self.type, scalar.value)

Review Comment:
   Ah, reading further, I see you already defined a `get_scalar_class_from_type` similarly as how it is done for arrays. In that case I think it should actually be possible to update `Scalar.wrap` as well, and then the above `getitem` override shouldn't be needed?



##########
docs/source/python/extending_types.rst:
##########
@@ -290,21 +290,25 @@ Custom scalar conversion
 ~~~~~~~~~~~~~~~~~~~~~~~~
 
 If you want scalars of your custom extension type to convert to a custom type when
-:meth:`ExtensionScalar.as_py()` is called, you can override the :meth:`ExtensionType.scalar_as_py()`
+:meth:`ExtensionScalar.as_py()` is called, you can override the :meth:`ExtensionType.as_py()`

Review Comment:
   ```suggestion
   :meth:`ExtensionScalar.as_py()` is called, you can override the :meth:`ExtensionScalar.as_py()`
   ```



##########
python/pyarrow/array.pxi:
##########
@@ -2771,6 +2771,11 @@ cdef class ExtensionArray(Array):
         """
         return self.storage.to_numpy(**kwargs)
 
+    cdef getitem(self, int64_t i):
+        scalar = ExtensionScalar.wrap(GetResultValue(self.ap.GetScalar(i)))
+        return self.type.__arrow_ext_scalar_class__().from_storage(
+            self.type, scalar.value)

Review Comment:
   The alternative would be to update `Scalar.wrap` / `pyarrow_wrap_scalar` to do this already, similarly how this is done for Array wrapping (if you look for how `__arrow_ext_class__` gets used, you might be able to do something similar, although the scalar wrap code is structured a bit differently)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13594: ARROW-17065: [Python] Allow using subclassed ExtensionScalar in ExtensionType

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13594:
URL: https://github.com/apache/arrow/pull/13594#discussion_r920130884


##########
python/pyarrow/tests/test_extension_type.py:
##########
@@ -229,7 +238,7 @@ def test_ext_array_to_pylist():
     storage = pa.array([b"foo", b"bar", None], type=pa.binary(3))
     arr = pa.ExtensionArray.from_storage(ty, storage)
 
-    assert arr.to_pylist() == [b"foo", b"bar", None]
+    assert [x.as_py() if x else None for x in arr.to_pylist()] == [b"foo", b"bar", None]

Review Comment:
   Yeah. It seems that if `__arrow_ext_scalar_class__` is not defined on the ExtensionType `.as_py()` is not available.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #13594: ARROW-17065: [Python] Allow using subclassed ExtensionScalar in ExtensionType

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on code in PR #13594:
URL: https://github.com/apache/arrow/pull/13594#discussion_r924426693


##########
docs/source/python/extending_types.rst:
##########
@@ -290,21 +290,25 @@ Custom scalar conversion
 ~~~~~~~~~~~~~~~~~~~~~~~~
 
 If you want scalars of your custom extension type to convert to a custom type when
-:meth:`ExtensionScalar.as_py()` is called, you can override the :meth:`ExtensionType.scalar_as_py()`
+:meth:`ExtensionScalar.as_py()` is called, you can override the :meth:`ExtensionScalar.as_py()`

Review Comment:
   Whoops, my last inline comment was not fully resulting in a correct word order -> https://github.com/apache/arrow/pull/13649



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13594: ARROW-17065: [Python] Allow using subclassed ExtensionScalar in ExtensionType

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13594:
URL: https://github.com/apache/arrow/pull/13594#discussion_r921998531


##########
python/pyarrow/scalar.pxi:
##########
@@ -902,10 +902,7 @@ cdef class ExtensionScalar(Scalar):
         """
         Return this scalar as a Python object.
         """
-        if self.value is None:
-            return None
-        else:
-            return self.type.scalar_as_py(self.value)
+        return self.value if self.value is not None else None

Review Comment:
   Oh. Yeah, that is suspicious 😬 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on pull request #13594: ARROW-17065: [Python] Allow using subclassed ExtensionScalar in ExtensionType

Posted by GitBox <gi...@apache.org>.
rok commented on PR #13594:
URL: https://github.com/apache/arrow/pull/13594#issuecomment-1187021495

   @jorisvandenbossche I can merge this if you feel it's ok.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13594: ARROW-17065: [Python] Allow using subclassed ExtensionScalar in ExtensionType

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13594:
URL: https://github.com/apache/arrow/pull/13594#discussion_r922052730


##########
python/pyarrow/tests/test_extension_type.py:
##########
@@ -229,7 +238,7 @@ def test_ext_array_to_pylist():
     storage = pa.array([b"foo", b"bar", None], type=pa.binary(3))
     arr = pa.ExtensionArray.from_storage(ty, storage)
 
-    assert arr.to_pylist() == [b"foo", b"bar", None]
+    assert [x.as_py() if x else None for x in arr.to_pylist()] == [b"foo", b"bar", None]

Review Comment:
   This is resolved now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #13594: ARROW-17065: [Python] Allow using subclassed ExtensionScalar in ExtensionType

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13594:
URL: https://github.com/apache/arrow/pull/13594#issuecomment-1183114130

   https://issues.apache.org/jira/browse/ARROW-17065


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13594: ARROW-17065: [Python] Allow using subclassed ExtensionScalar in ExtensionType

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13594:
URL: https://github.com/apache/arrow/pull/13594#discussion_r920130884


##########
python/pyarrow/tests/test_extension_type.py:
##########
@@ -229,7 +238,7 @@ def test_ext_array_to_pylist():
     storage = pa.array([b"foo", b"bar", None], type=pa.binary(3))
     arr = pa.ExtensionArray.from_storage(ty, storage)
 
-    assert arr.to_pylist() == [b"foo", b"bar", None]
+    assert [x.as_py() if x else None for x in arr.to_pylist()] == [b"foo", b"bar", None]

Review Comment:
   Yeah. It seems that if `__arrow_ext_scalar_class__` is not defined on the ExtensionType we `.as_py()` is not available.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13594: ARROW-17065: [Python] Allow using subclassed ExtensionScalar in ExtensionType

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13594:
URL: https://github.com/apache/arrow/pull/13594#discussion_r922053035


##########
python/pyarrow/array.pxi:
##########
@@ -2771,6 +2771,11 @@ cdef class ExtensionArray(Array):
         """
         return self.storage.to_numpy(**kwargs)
 
+    cdef getitem(self, int64_t i):
+        scalar = ExtensionScalar.wrap(GetResultValue(self.ap.GetScalar(i)))
+        return self.type.__arrow_ext_scalar_class__().from_storage(
+            self.type, scalar.value)

Review Comment:
   Done. Please check.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13594: ARROW-17065: [Python] Allow using subclassed ExtensionScalar in ExtensionType

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13594:
URL: https://github.com/apache/arrow/pull/13594#discussion_r920331188


##########
python/pyarrow/array.pxi:
##########
@@ -2771,6 +2771,11 @@ cdef class ExtensionArray(Array):
         """
         return self.storage.to_numpy(**kwargs)
 
+    cdef getitem(self, int64_t i):
+        scalar = ExtensionScalar.wrap(GetResultValue(self.ap.GetScalar(i)))
+        return self.type.__arrow_ext_scalar_class__().from_storage(
+            self.type, scalar.value)

Review Comment:
   @jorisvandenbossche Does this approach make sense or should we unwrap the value and wrap it into the correct subclassed ExtensionScalar? I'm sorry, I'm not very familiar with the machinery here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on pull request #13594: ARROW-17065: [Python] Allow using subclassed ExtensionScalar in ExtensionType

Posted by GitBox <gi...@apache.org>.
lidavidm commented on PR #13594:
URL: https://github.com/apache/arrow/pull/13594#issuecomment-1183255337

   Looks like tests are failing?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #13594: ARROW-17065: [Python] Allow using subclassed ExtensionScalar in ExtensionType

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #13594:
URL: https://github.com/apache/arrow/pull/13594#issuecomment-1189702173

   Benchmark runs are scheduled for baseline = 1cbfed04b1b77fb9a3d86f1c0b0f3dfd4cf21835 and contender = 0b53adc8de0b7fb6fc96ebbe704d928797f28222. 0b53adc8de0b7fb6fc96ebbe704d928797f28222 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/9c38f67feb074984a3c0080cc462e336...d6e8e310a1ce45ee877d2490fbf8c7d9/)
   [Failed :arrow_down:1.39% :arrow_up:0.03%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/d463d005931447de8fb69abea3e37079...beaf63774981465cb5167079818e21bf/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/1452ab8e911c4d709dcd482335fdb74a...f26698565e3d4affb8ab4d849f99514f/)
   [Finished :arrow_down:0.25% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/f6c4c9684c1c4beb97b791654d9f16df...dacd5d3a36f44938b6e3a28883ed27e3/)
   Buildkite builds:
   [Failed] [`0b53adc8` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/1139)
   [Failed] [`0b53adc8` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/1144)
   [Failed] [`0b53adc8` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/1131)
   [Finished] [`0b53adc8` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/1148)
   [Failed] [`1cbfed04` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/1138)
   [Failed] [`1cbfed04` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/1143)
   [Failed] [`1cbfed04` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/1130)
   [Finished] [`1cbfed04` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/1147)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on a diff in pull request #13594: ARROW-17065: [Python] Allow using subclassed ExtensionScalar in ExtensionType

Posted by GitBox <gi...@apache.org>.
lidavidm commented on code in PR #13594:
URL: https://github.com/apache/arrow/pull/13594#discussion_r920115637


##########
python/pyarrow/tests/test_extension_type.py:
##########
@@ -229,7 +238,7 @@ def test_ext_array_to_pylist():
     storage = pa.array([b"foo", b"bar", None], type=pa.binary(3))
     arr = pa.ExtensionArray.from_storage(ty, storage)
 
-    assert arr.to_pylist() == [b"foo", b"bar", None]
+    assert [x.as_py() if x else None for x in arr.to_pylist()] == [b"foo", b"bar", None]

Review Comment:
   Hmm, don't we want the override to also affect `to_pylist`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #13594: ARROW-17065: [Python] Allow using subclassed ExtensionScalar in ExtensionType

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on code in PR #13594:
URL: https://github.com/apache/arrow/pull/13594#discussion_r923235597


##########
docs/source/python/extending_types.rst:
##########
@@ -290,21 +290,25 @@ Custom scalar conversion
 ~~~~~~~~~~~~~~~~~~~~~~~~
 
 If you want scalars of your custom extension type to convert to a custom type when
-:meth:`ExtensionScalar.as_py()` is called, you can override the :meth:`ExtensionType.scalar_as_py()`
+:meth:`ExtensionScalar.as_py()` is called, you can override the :meth:`ExtensionScalar.as_py()`

Review Comment:
   ```suggestion
   :meth:`ExtensionScalar.as_py()` is called, you can override the :meth:`ExtensionScalar.as_py()` by subclassing :class:`ExtensionScalar`.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm merged pull request #13594: ARROW-17065: [Python] Allow using subclassed ExtensionScalar in ExtensionType

Posted by GitBox <gi...@apache.org>.
lidavidm merged PR #13594:
URL: https://github.com/apache/arrow/pull/13594


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org