Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/08/03 12:02:32 UTC

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #10854: ARROW-13089: [Python] Allow creating RecordBatch from Python dict

jorisvandenbossche commented on a change in pull request #10854:
URL: https://github.com/apache/arrow/pull/10854#discussion_r681692838



##########
File path: python/pyarrow/tests/test_table.py
##########
@@ -1685,3 +1685,60 @@ def test_table_select():
     result = table.select(['f2'])
     expected = pa.table([a2], ['f2'])
     assert result.equals(expected)
+
+
+def test_recordbatch_from_pydict():

Review comment:
       Can you move this next to the existing test for Table, to keep the similar tests close to each other? With some parametrization, we might also be able to reduce the duplication, since the test itself doesn't seem to rely on anything specific to RecordBatch or Table.

##########
File path: python/pyarrow/table.pxi
##########
@@ -616,6 +616,53 @@ cdef class RecordBatch(_PandasConvertible):
         self.sp_batch = batch
         self.batch = batch.get()
 
+    @staticmethod
+    def from_pydict(mapping, schema=None, metadata=None):
+        """
+        Construct a RecordBatch from Arrow arrays or columns.
+
+        Parameters
+        ----------
+        mapping : dict or Mapping
+            A mapping of strings to Arrays or Python lists.
+        schema : Schema, default None
+            If not passed, will be inferred from the Mapping values.
+        metadata : dict or Mapping, default None
+            Optional metadata for the schema (if inferred).
+
+        Returns
+        -------
+        RecordBatch
+        """
+        arrays = []
+        if schema is None:
+            names = []
+            for k, v in mapping.items():
+                names.append(k)
+                arrays.append(asarray(v))
+            return RecordBatch.from_arrays(arrays, names, metadata=metadata)
+        elif isinstance(schema, Schema):
+            for field in schema:
+                try:
+                    v = mapping[field.name]
+                except KeyError:
+                    try:
+                        v = mapping[tobytes(field.name)]

Review comment:
       Apparently we do support bytes keys in Table.from_pydict (though I agree this fallback doesn't seem needed).
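
       For reference, the lookup pattern in the hunk above (try the field name as a str key, then fall back to its bytes form) can be sketched in plain Python. The helper name lookup_column is hypothetical, and name.encode("utf8") stands in for pyarrow's internal tobytes helper. The hunk is cut off after the bytes lookup, so the final error handling here is an assumption:

```python
def lookup_column(mapping, name):
    """Return the value for a field name, accepting str or bytes keys.

    Hypothetical helper illustrating the fallback discussed in the review.
    """
    try:
        # Preferred: the mapping is keyed by str field names.
        return mapping[name]
    except KeyError:
        try:
            # Fallback: the mapping may be keyed by UTF-8 encoded bytes.
            return mapping[name.encode("utf8")]
        except KeyError:
            # Assumption: report the missing field using its str name.
            raise KeyError(f"field {name!r} not found in mapping") from None


columns_by_str = {"f0": [1, 2, 3]}
columns_by_bytes = {b"f0": [1, 2, 3]}

# Both key styles resolve to the same column data.
assert lookup_column(columns_by_str, "f0") == [1, 2, 3]
assert lookup_column(columns_by_bytes, "f0") == [1, 2, 3]
```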



