Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/25 08:43:33 UTC

[GitHub] [arrow] AlenkaF opened a new pull request, #14493: ARROW-17360: [Python] Reorder columns in pyarrow.feather.read_table

AlenkaF opened a new pull request, #14493:
URL: https://github.com/apache/arrow/pull/14493

   Before this PR:
   ```python
   import pyarrow as pa
   from pyarrow import orc

   table = pa.table({"a": [1, 2, 3], "b": ["a", "b", "c"]})
   orc.write_table(table, 'example.orc')
   orc.read_table('example.orc', columns=['b', 'a'])
   # pyarrow.Table
   # a: int64
   # b: string
   # ----
   # a: [[1,2,3]]
   # b: [["a","b","c"]]
   ```
   
   After this PR:
   ```python
   import pyarrow as pa
   from pyarrow import orc

   table = pa.table({"a": [1, 2, 3], "b": ["a", "b", "c"]})
   orc.write_table(table, 'example.orc')
   orc.read_table('example.orc', columns=['b', 'a'])
   # pyarrow.Table
   # b: string
   # a: int64
   # ----
   # b: [["a","b","c"]]
   # a: [[1,2,3]]
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on pull request #14493: ARROW-17360: [Python] Reorder columns in pyarrow.feather.read_table

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on PR #14493:
URL: https://github.com/apache/arrow/pull/14493#issuecomment-1293139545

   Hmm, the failures are actually related here, and I am not immediately sure how to solve this. We allow nested columns to be selected using a "dotted path", but that doesn't work for `select()`.




[GitHub] [arrow] AlenkaF commented on pull request #14493: ARROW-17360: [Python] Reorder columns in pyarrow.feather.read_table

Posted by GitBox <gi...@apache.org>.
AlenkaF commented on PR #14493:
URL: https://github.com/apache/arrow/pull/14493#issuecomment-1293348987

   Yeah, that's unfortunate. A "dotted path" doesn't work with `pyarrow.Table.select()`, but it does work with `ORCFile.read()`:
   ```python
   >       result4 = orc_file.read(columns=["struct.middle.inner"])
   
   opt/conda/envs/arrow/lib/python3.8/site-packages/pyarrow/tests/test_orc.py:584: 
   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
   opt/conda/envs/arrow/lib/python3.8/site-packages/pyarrow/orc.py:189: in read
       table = table.select(columns)
   pyarrow/table.pxi:3053: in pyarrow.lib.Table.select
       ???
   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
   
   >   ???
   E   KeyError: 'Field "struct.middle.inner" does not exist in table schema'
   ```
   
   Because of that, the easy solution for reordering the columns is no longer feasible. I will close this PR and open another one that adds a note to the docstrings saying that `orc.read_table()` always follows the column order of the file.
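
For readers who still want the requested order, the following is a minimal sketch of one possible workaround. It is not what this PR implemented. It assumes that a dotted path such as "struct.middle.inner" yields a top-level column named after its first component ("struct") and that top-level column names contain no dots. Both helper names are hypothetical.

```python
def top_level_order(columns):
    """Return the unique top-level names of `columns`, in requested order.

    A dotted path such as "struct.middle.inner" maps to "struct".
    Assumption: top-level column names do not themselves contain dots.
    """
    order = []
    for name in columns:
        top = name.split(".", 1)[0]
        if top not in order:
            order.append(top)
    return order


def reorder_like_request(table, columns):
    """Reorder an already-read table to follow the requested column order.

    `table` is any object with a `select(list_of_names)` method, for
    example the pyarrow.Table returned by ORCFile.read(columns=columns).
    """
    return table.select(top_level_order(columns))
```

With this, `select()` only ever sees top-level names, so the `KeyError` from the traceback above would not be triggered by the dotted path itself.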




[GitHub] [arrow] AlenkaF closed pull request #14493: ARROW-17360: [Python] Reorder columns in pyarrow.feather.read_table

Posted by GitBox <gi...@apache.org>.
AlenkaF closed pull request #14493: ARROW-17360: [Python] Reorder columns in pyarrow.feather.read_table
URL: https://github.com/apache/arrow/pull/14493




[GitHub] [arrow] github-actions[bot] commented on pull request #14493: ARROW-17360: [Python] Reorder columns in pyarrow.feather.read_table

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14493:
URL: https://github.com/apache/arrow/pull/14493#issuecomment-1290329038

   https://issues.apache.org/jira/browse/ARROW-17360

