Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/08/02 20:16:15 UTC

[GitHub] [arrow] kharoc opened a new pull request #10854: ARROW-13089:[Python]Allow creating RecordBatch from Python dict

kharoc opened a new pull request #10854:
URL: https://github.com/apache/arrow/pull/10854


   Add a from_pydict static method to the RecordBatch class.
   Add a unit test for from_pydict.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #10854: ARROW-13089:[Python]Allow creating RecordBatch from Python dict

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10854:
URL: https://github.com/apache/arrow/pull/10854#issuecomment-891304199


   https://issues.apache.org/jira/browse/ARROW-13089





[GitHub] [arrow] kharoc commented on a change in pull request #10854: ARROW-13089:[Python]Allow creating RecordBatch from Python dict

Posted by GitBox <gi...@apache.org>.
kharoc commented on a change in pull request #10854:
URL: https://github.com/apache/arrow/pull/10854#discussion_r681932100



##########
File path: python/pyarrow/table.pxi
##########
@@ -616,6 +616,53 @@ cdef class RecordBatch(_PandasConvertible):
         self.sp_batch = batch
         self.batch = batch.get()
 
+    @staticmethod
+    def from_pydict(mapping, schema=None, metadata=None):
+        """
+        Construct a RecordBatch from Arrow arrays or columns.
+
+        Parameters
+        ----------
+        mapping : dict or Mapping
+            A mapping of strings to Arrays or Python lists.
+        schema : Schema, default None
+            If not passed, will be inferred from the Mapping values.
+        metadata : dict or Mapping, default None
+            Optional metadata for the schema (if inferred).
+
+        Returns
+        -------
+        RecordBatch
+        """
+        arrays = []
+        if schema is None:
+            names = []
+            for k, v in mapping.items():
+                names.append(k)
+                arrays.append(asarray(v))
+            return RecordBatch.from_arrays(arrays, names, metadata=metadata)
+        elif isinstance(schema, Schema):
+            for field in schema:
+                try:
+                    v = mapping[field.name]
+                except KeyError:
+                    try:
+                        v = mapping[tobytes(field.name)]

Review comment:
       The tobytes() call was removed.







[GitHub] [arrow] pitrou commented on a change in pull request #10854: ARROW-13089:[Python]Allow creating RecordBatch from Python dict

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #10854:
URL: https://github.com/apache/arrow/pull/10854#discussion_r681672240



##########
File path: python/pyarrow/table.pxi
##########
@@ -616,6 +616,53 @@ cdef class RecordBatch(_PandasConvertible):
         self.sp_batch = batch
         self.batch = batch.get()
 
+    @staticmethod
+    def from_pydict(mapping, schema=None, metadata=None):
+        """
+        Construct a RecordBatch from Arrow arrays or columns.
+
+        Parameters
+        ----------
+        mapping : dict or Mapping
+            A mapping of strings to Arrays or Python lists.
+        schema : Schema, default None
+            If not passed, will be inferred from the Mapping values.
+        metadata : dict or Mapping, default None
+            Optional metadata for the schema (if inferred).
+
+        Returns
+        -------
+        RecordBatch
+        """
+        arrays = []
+        if schema is None:
+            names = []
+            for k, v in mapping.items():
+                names.append(k)
+                arrays.append(asarray(v))
+            return RecordBatch.from_arrays(arrays, names, metadata=metadata)
+        elif isinstance(schema, Schema):
+            for field in schema:
+                try:
+                    v = mapping[field.name]
+                except KeyError:
+                    try:
+                        v = mapping[tobytes(field.name)]

Review comment:
       I'm not sure allowing bytes keys is useful. 







[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #10854: ARROW-13089:[Python]Allow creating RecordBatch from Python dict

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #10854:
URL: https://github.com/apache/arrow/pull/10854#discussion_r681692838



##########
File path: python/pyarrow/tests/test_table.py
##########
@@ -1685,3 +1685,60 @@ def test_table_select():
     result = table.select(['f2'])
     expected = pa.table([a2], ['f2'])
     assert result.equals(expected)
+
+
+def test_recordbatch_from_pydict():

Review comment:
       Can you move this next to the existing test for Table, to keep similar tests close to each other? With some parametrization, we might also be able to reduce the duplication (the actual test doesn't seem to rely on anything specific to RecordBatch or Table).

##########
File path: python/pyarrow/table.pxi
##########
@@ -616,6 +616,53 @@ cdef class RecordBatch(_PandasConvertible):
         self.sp_batch = batch
         self.batch = batch.get()
 
+    @staticmethod
+    def from_pydict(mapping, schema=None, metadata=None):
+        """
+        Construct a RecordBatch from Arrow arrays or columns.
+
+        Parameters
+        ----------
+        mapping : dict or Mapping
+            A mapping of strings to Arrays or Python lists.
+        schema : Schema, default None
+            If not passed, will be inferred from the Mapping values.
+        metadata : dict or Mapping, default None
+            Optional metadata for the schema (if inferred).
+
+        Returns
+        -------
+        RecordBatch
+        """
+        arrays = []
+        if schema is None:
+            names = []
+            for k, v in mapping.items():
+                names.append(k)
+                arrays.append(asarray(v))
+            return RecordBatch.from_arrays(arrays, names, metadata=metadata)
+        elif isinstance(schema, Schema):
+            for field in schema:
+                try:
+                    v = mapping[field.name]
+                except KeyError:
+                    try:
+                        v = mapping[tobytes(field.name)]

Review comment:
       Apparently we do support that in Table.from_pydict (but I agree this doesn't seem needed).







[GitHub] [arrow] jorisvandenbossche commented on pull request #10854: ARROW-13089:[Python]Allow creating RecordBatch from Python dict

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on pull request #10854:
URL: https://github.com/apache/arrow/pull/10854#issuecomment-892031609


   @kharoc can you also take a look at my non-inline comment (https://github.com/apache/arrow/pull/10854#pullrequestreview-721106382) about avoiding duplication between the Table and RecordBatch implementation?
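
   One way the duplication could be avoided is a single helper that both classes call, sketched below. This is only an illustration under stated assumptions, not the code in the PR: from_pydict_shared, FakeBatch and FakeTable are hypothetical names, field_names stands in for a Schema, and to_array stands in for asarray so that the sketch runs without pyarrow.

```python
from collections.abc import Mapping


def from_pydict_shared(cls, mapping, field_names=None, to_array=list):
    """Collect names and columns from `mapping`, then call cls.from_arrays.

    cls         -- any class exposing from_arrays(arrays, names)
    field_names -- optional column order (stands in for a Schema here)
    to_array    -- conversion applied to each value (stands in for asarray)
    """
    if not isinstance(mapping, Mapping):
        raise TypeError("mapping must be a dict or Mapping")
    if field_names is None:
        # No schema given: keep the mapping's own key order.
        names = list(mapping.keys())
    else:
        # Schema given: its field order wins, and every field must be present.
        names = list(field_names)
        missing = [name for name in names if name not in mapping]
        if missing:
            raise KeyError(f"fields missing from mapping: {missing}")
    arrays = [to_array(mapping[name]) for name in names]
    return cls.from_arrays(arrays, names)


class FakeBatch:
    """Hypothetical stand-in for RecordBatch so the sketch is runnable."""

    def __init__(self, arrays, names):
        self.arrays = arrays
        self.names = names

    @classmethod
    def from_arrays(cls, arrays, names):
        return cls(arrays, names)


class FakeTable(FakeBatch):
    """Hypothetical stand-in for Table; reuses the same from_arrays hook."""
```

   With a helper of this shape, each class's from_pydict would reduce to a one-line call that passes the class itself as cls, so the key lookup and error handling live in one place.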





[GitHub] [arrow] bkietz closed pull request #10854: ARROW-13089: [Python] Allow creating RecordBatch from Python dict

Posted by GitBox <gi...@apache.org>.
bkietz closed pull request #10854:
URL: https://github.com/apache/arrow/pull/10854


   





[GitHub] [arrow] kharoc commented on a change in pull request #10854: ARROW-13089:[Python]Allow creating RecordBatch from Python dict

Posted by GitBox <gi...@apache.org>.
kharoc commented on a change in pull request #10854:
URL: https://github.com/apache/arrow/pull/10854#discussion_r681931753



##########
File path: python/pyarrow/tests/test_table.py
##########
@@ -1685,3 +1685,60 @@ def test_table_select():
     result = table.select(['f2'])
     expected = pa.table([a2], ['f2'])
     assert result.equals(expected)
+
+
+def test_recordbatch_from_pydict():

Review comment:
       This is resolved: parametrization was added.






