Posted to issues@arrow.apache.org by GitBox <gi...@apache.org> on 2023/01/14 02:09:34 UTC
[GitHub] [arrow] coady opened a new issue, #33663: [C++][Python] Fully support special fields in `Scanner`.
coady opened a new issue, #33663:
URL: https://github.com/apache/arrow/issues/33663
### Describe the enhancement requested
[Scanner documentation](https://arrow.apache.org/docs/python/generated/pyarrow.dataset.Scanner.html) mentions special fields which can be projected in columns:
* __batch_index
* __fragment_index
* __last_in_fragment
* __filename
But there are limitations. They can't be referenced in expression-based projections (a `columns` dict):
```python
import pyarrow.dataset as ds

# `dataset` is any existing pyarrow dataset.
dataset.head(10, columns={'bi': ds.field('__batch_index')})
...
ArrowInvalid: No match for FieldRef.Name(__batch_index) in ...
```
Nor can a table that contains them be scanned again:
```python
In []: table = dataset.head(10, columns=['__batch_index'])
In []: table
Out[]:
pyarrow.Table
__batch_index: int32
----
__batch_index: [[0,0,0,0,0,0,0,0,0,0]]
In []: ds.dataset(table).to_table()
...
ArrowInvalid: Multiple matches for FieldRef.Name(__batch_index) in __batch_index: int32
__fragment_index: int32
__batch_index: int32
__last_in_fragment: bool
__filename: string
```
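The schema printed in that error lists `__batch_index` twice: once from the table's own column and once among the four special fields. A minimal sketch of how a name-based lookup becomes ambiguous in that situation follows. It is a simplified pure-Python model, not Arrow's implementation, and `augment` and `find_field` are hypothetical names used only for illustration.

```python
# Simplified model of name-based field lookup; NOT Arrow's implementation.
# `augment` and `find_field` are hypothetical helpers for illustration only.
SPECIAL_FIELDS = ["__fragment_index", "__batch_index", "__last_in_fragment", "__filename"]


def augment(schema_names):
    """Append the special field names to a dataset schema (assumed behaviour)."""
    return list(schema_names) + SPECIAL_FIELDS


def find_field(names, target):
    """Return the index of `target`, failing on zero or several matches."""
    matches = [i for i, name in enumerate(names) if name == target]
    if not matches:
        raise LookupError(f"No match for {target}")
    if len(matches) > 1:
        raise LookupError(f"Multiple matches for {target} at positions {matches}")
    return matches[0]


# The table from the first scan already has a `__batch_index` column.
augmented = augment(["__batch_index"])

try:
    find_field(augmented, "__batch_index")
    error = None
except LookupError as exc:
    error = str(exc)
```

In this model the lookup only fails for the colliding name; the other special fields still resolve to a single position.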
### Component(s)
C++, Python
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] coady commented on issue #33663: [C++][Python] Fully support special fields in `Scanner`.
Posted by GitBox <gi...@apache.org>.
coady commented on issue #33663:
URL: https://github.com/apache/arrow/issues/33663#issuecomment-1384399199
I think having the second scan override is fine, because if a user wanted the original they could have aliased it in the projection. Don't feel strongly about it though.
[GitHub] [arrow] westonpace commented on issue #33663: [C++][Python] Fully support special fields in `Scanner`.
Posted by GitBox <gi...@apache.org>.
westonpace commented on issue #33663:
URL: https://github.com/apache/arrow/issues/33663#issuecomment-1384047176
In the subsequent scan case would you expect the batch index of the second scan to override the batch index column of the table? Or would you be expecting a `__batch_index` and `__batch_index_2` or something like that?
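The two options in that question can be compared with a small pure-Python sketch over column names. Both helpers (`merge_override`, `merge_suffix`) are hypothetical and exist only to make the trade-off concrete; neither is an Arrow API, and the `_2` suffix scheme is just the one floated in the question.

```python
# Hypothetical helpers illustrating the two options; neither is an Arrow API.
def merge_override(existing, special):
    """Drop existing columns that collide, so the new scan's fields win."""
    kept = [name for name in existing if name not in special]
    return kept + list(special)


def merge_suffix(existing, special):
    """Keep existing columns; rename colliding special fields with a numeric suffix."""
    result = list(existing)
    for name in special:
        candidate, n = name, 2
        while candidate in result:
            candidate = f"{name}_{n}"
            n += 1
        result.append(candidate)
    return result


# Column names of a table produced by an earlier scan ("x" is a made-up data column).
existing = ["x", "__batch_index"]
special = ["__fragment_index", "__batch_index", "__last_in_fragment", "__filename"]

overridden = merge_override(existing, special)
suffixed = merge_suffix(existing, special)
```

With the override policy the earlier `__batch_index` values are lost unless the user aliased them beforehand; with the suffix policy nothing is lost, but the output column names depend on what the input already contains.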