Posted to issues@arrow.apache.org by GitBox <gi...@apache.org> on 2023/01/14 02:09:34 UTC

[GitHub] [arrow] coady opened a new issue, #33663: [C++][Python] Fully support special fields in `Scanner`.

coady opened a new issue, #33663:
URL: https://github.com/apache/arrow/issues/33663

   ### Describe the enhancement requested
   
   [Scanner documentation](https://arrow.apache.org/docs/python/generated/pyarrow.dataset.Scanner.html) mentions special fields which can be projected in columns:
   * `__batch_index`
   * `__fragment_index`
   * `__last_in_fragment`
   * `__filename`
   
   But there are limitations. They can't be referenced in expression-based projections, i.e. when `columns` is a dict of expressions.
   ```python
   dataset.head(10, columns={'bi': ds.field('__batch_index')})
   ...
   ArrowInvalid: No match for FieldRef.Name(__batch_index) in ...
   ```
   
   Nor can a table containing them be reused in subsequent scans.
   ```python
   In []: table = dataset.head(10, columns=['__batch_index'])
   
   In []: table
   Out[]: 
   pyarrow.Table
   __batch_index: int32
   ----
   __batch_index: [[0,0,0,0,0,0,0,0,0,0]]
   
   In []: ds.dataset(table).to_table()
   ...
   ArrowInvalid: Multiple matches for FieldRef.Name(__batch_index) in __batch_index: int32
   __fragment_index: int32
   __batch_index: int32
   __last_in_fragment: bool
   __filename: string
   ```
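
The schema printed in the `Multiple matches` error above lists `__batch_index` twice: once as the table's own column and once among the special fields added for the scan. The following is a minimal sketch of that name collision in plain Python. It is a toy model, not Arrow's implementation; `augmented_schema`, `find_field`, and the column name `bi` are hypothetical names introduced only for illustration.

```python
# Toy model only: this is NOT Arrow's implementation. It mimics looking up a
# field by name in a schema that a scan has augmented with the special fields.
SPECIAL_FIELDS = ["__fragment_index", "__batch_index", "__last_in_fragment", "__filename"]


def augmented_schema(dataset_fields):
    """Return the dataset's field names followed by the special field names."""
    return list(dataset_fields) + SPECIAL_FIELDS


def find_field(name, schema):
    """Return the index of `name` in `schema`; fail on zero or multiple matches."""
    matches = [i for i, field in enumerate(schema) if field == name]
    if not matches:
        raise LookupError(f"No match for {name}")
    if len(matches) > 1:
        raise LookupError(f"Multiple matches for {name}")
    return matches[0]


# A table produced by a first scan already carries a `__batch_index` column,
# so the augmented schema of a second scan contains that name twice.
rescanned = augmented_schema(["__batch_index"])
try:
    find_field("__batch_index", rescanned)
    outcome = "resolved"
except LookupError as exc:
    outcome = str(exc)

# Renaming the column before the second scan (here to the illustrative `bi`)
# leaves exactly one `__batch_index` in the augmented schema.
renamed = augmented_schema(["bi"])
index = find_field("__batch_index", renamed)
```

In this model the lookup is ambiguous only while the table column keeps its special name. Whether the second scan should instead override the existing column silently is the open design question in this issue.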
   
   ### Component(s)
   
   C++, Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] coady commented on issue #33663: [C++][Python] Fully support special fields in `Scanner`.

Posted by GitBox <gi...@apache.org>.
coady commented on issue #33663:
URL: https://github.com/apache/arrow/issues/33663#issuecomment-1384399199

   I think having the second scan override the existing column is fine, because a user who wanted the original could have aliased it in the projection. I don't feel strongly about it, though.


[GitHub] [arrow] westonpace commented on issue #33663: [C++][Python] Fully support special fields in `Scanner`.

Posted by GitBox <gi...@apache.org>.
westonpace commented on issue #33663:
URL: https://github.com/apache/arrow/issues/33663#issuecomment-1384047176

   In the subsequent-scan case, would you expect the batch index of the second scan to override the `__batch_index` column of the table? Or would you expect both a `__batch_index` and a `__batch_index_2`, or something like that?

