Posted to issues@arrow.apache.org by "sergun (via GitHub)" <gi...@apache.org> on 2023/11/28 07:35:44 UTC

[I] [Python] How to add one level of nesting to flat table? [arrow]

sergun opened a new issue, #38912:
URL: https://github.com/apache/arrow/issues/38912

   ### Describe the usage question you have. Please include as many useful details as  possible.
   
   
   I have a flat `pa.Table`:
   ```python
   import pyarrow as pa
   table = pa.table({"a": [1, 2, 3], "b": [3, 4, 5]})
   ```
   
   How can I create a new table from this one by adding one level of nesting?
   I want the new table to have a single column "c" of struct type with two fields, "a" and "b", holding the data from the original table.
   
   
   
   
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] [Python] How to add one level of nesting to flat table? [arrow]

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF closed issue #38912: [Python] How to add one level of nesting to flat table?
URL: https://github.com/apache/arrow/issues/38912




Re: [I] [Python] How to add one level of nesting to flat table? [arrow]

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on issue #38912:
URL: https://github.com/apache/arrow/issues/38912#issuecomment-1835766112

   Oh, I forgot that an [issue already exists](https://github.com/apache/arrow/issues/33500) for this, with an open PR! :) https://github.com/apache/arrow/pull/38520




Re: [I] [Python] How to add one level of nesting to flat table? [arrow]

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on issue #38912:
URL: https://github.com/apache/arrow/issues/38912#issuecomment-1829532741

   If you can work with record batches, I would suggest using the `to_struct_array()` method:
   
   ```python
   import pyarrow as pa
   batch = pa.RecordBatch.from_pydict({"a": [1, 2, 3], "b": [3, 4, 5]})
   struct_array = batch.to_struct_array()
   batch_result = pa.RecordBatch.from_arrays([struct_array], names=["c"])
   # pyarrow.RecordBatch
   # c: struct<a: int64, b: int64>
   #   child 0, a: int64
   #   child 1, b: int64
   # ----
   # c: -- is_valid: all not null
   # -- child 0 type: int64
   # [1,2,3]
   # -- child 1 type: int64
   # [3,4,5]
   ```
   
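   If you need to work with tables, then you can do the same for each individual chunk:
   
   ```python
   # I think this should work
   table = pa.table({"a": [1, 2, 3], "b": [3, 4, 5]})
   batches = []
   for b in table.to_batches():
       batches.append(pa.RecordBatch.from_arrays([b.to_struct_array()], names=["c"]))
   table_result = pa.Table.from_batches(batches)
   # pyarrow.Table
   # c: struct<a: int64, b: int64>
   #   child 0, a: int64
   #   child 1, b: int64
   # ----
   # c: [
   #   -- is_valid: all not null
   #   -- child 0 type: int64
   # [1,2,3]
   #   -- child 1 type: int64
   # [3,4,5]]
   ```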




Re: [I] [Python] How to add one level of nesting to flat table? [arrow]

Posted by "sergun (via GitHub)" <gi...@apache.org>.
sergun commented on issue #38912:
URL: https://github.com/apache/arrow/issues/38912#issuecomment-1829720950

   > If you can work with record batches I would suggest using `to_struct_array()` method:
   > 
   > ```python
   > import pyarrow as pa
   > batch = pa.RecordBatch.from_pydict({"a": [1, 2, 3], "b": [3, 4, 5]})
   > struct_array = batch.to_struct_array()
   > batch_result = pa.RecordBatch.from_arrays([struct_array], names=["c"])
   > # pyarrow.RecordBatch
   > # c: struct<a: int64, b: int64>
   > #   child 0, a: int64
   > #   child 1, b: int64
   > # ----
   > # c: -- is_valid: all not null
   > # -- child 0 type: int64
   > # [1,2,3]
   > # -- child 1 type: int64
   > # [3,4,5]
   > ```
   > 
   > If you need to work with tables then you can do the same for each individual chunk:
   > 
   > ```python
   > # I think this should work
   > table = pa.table({"a": [1, 2, 3], "b": [3, 4, 5]})
   > batches = []
   > for b in table.to_batches():
   >     batches.append(pa.RecordBatch.from_arrays([b.to_struct_array()], names=["c"]))
   > table_result = pa.Table.from_batches(batches)
   > # pyarrow.Table
   > # c: struct<a: int64, b: int64>
   > #   child 0, a: int64
   > #   child 1, b: int64
   > # ----
   > # c: [
   > #   -- is_valid: all not null
   > #   -- child 0 type: int64
   > # [1,2,3]
   > #   -- child 1 type: int64
   > # [3,4,5]]
   > ```
   
   Thanks a lot @AlenkaF !
   
   Am I right that such Table <-> batches transformations cost close to zero, according to
   https://arrow.apache.org/docs/cpp/tables.html#record-batches ?
   
   "However, a table can be converted to and built from a sequence of record batches easily without needing to copy the underlying array buffers. A table can be streamed as an arbitrary number of record batches using a [arrow::TableBatchReader](https://arrow.apache.org/docs/cpp/api/table.html#_CPPv4N5arrow16TableBatchReaderE). Conversely, a logical sequence of record batches can be assembled to form a table using one of the [arrow::Table::FromRecordBatches()](https://arrow.apache.org/docs/cpp/api/table.html#_CPPv4N5arrow5Table17FromRecordBatchesERKNSt6vectorINSt10shared_ptrI11RecordBatchEEEE) factory function overloads."
   
   
   




Re: [I] [Python] How to add one level of nesting to flat table? [arrow]

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on issue #38912:
URL: https://github.com/apache/arrow/issues/38912#issuecomment-1830224430

   We might want to add `to_struct_array` to `Table` as well (returning a `ChunkedArray` of struct type)? That would make this a bit more convenient when working with a table.




Re: [I] [Python] How to add one level of nesting to flat table? [arrow]

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on issue #38912:
URL: https://github.com/apache/arrow/issues/38912#issuecomment-1829749578

   `Table` to/from `RecordBatches` transformations are zero-copy.

