You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/11/01 21:43:53 UTC

[GitHub] [beam] AnandInguva commented on a diff in pull request #23894: Add support for converting to/from pyarrow Arrays

AnandInguva commented on code in PR #23894:
URL: https://github.com/apache/beam/pull/23894#discussion_r1010928738


##########
sdks/python/apache_beam/typehints/arrow_type_compatibility.py:
##########
@@ -348,3 +348,37 @@ def _from_serialized_schema(serialized_schema):
   def __reduce__(self):
     return self._from_serialized_schema, (
         self._beam_schema.SerializeToString(), )
+
+
+class PyarrowArrayBatchConverter(BatchConverter):
+  def __init__(self, element_type: type):
+    super().__init__(pa.Array, element_type)
+    self._element_type = element_type
+    beam_fieldtype = typing_to_runner_api(element_type)
+    self._arrow_type = _arrow_type_from_beam_fieldtype(beam_fieldtype)
+
+  @staticmethod
+  @BatchConverter.register
+  def from_typehints(element_type,
+                     batch_type) -> Optional['PyarrowArrayBatchConverter']:
+    if batch_type == pa.Array:
+      return PyarrowArrayBatchConverter(element_type)
+
+    return None
+
+  def produce_batch(self, elements):
+    return pa.array(list(elements), type=self._arrow_type)

Review Comment:
   Out of curiosity, if the user wants to add other flags to the pa.array, such as `size` or `mask` which are optional for pa.array, how would they do it? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org