Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2022/10/04 08:34:00 UTC

[jira] [Updated] (ARROW-12311) [Python][R] Expose (hide?) ScanOptions

     [ https://issues.apache.org/jira/browse/ARROW-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joris Van den Bossche updated ARROW-12311:
------------------------------------------
    Fix Version/s: 11.0.0
                       (was: 10.0.0)

> [Python][R] Expose (hide?) ScanOptions
> --------------------------------------
>
>                 Key: ARROW-12311
>                 URL: https://issues.apache.org/jira/browse/ARROW-12311
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python, R
>            Reporter: Weston Pace
>            Priority: Major
>             Fix For: 11.0.0
>
>
> Currently R completely hides the `ScanOptions` class.
> In Python the class is exposed, but the documentation prefers `dataset.scan` (which hides both the scanner and the scan options).
> However, `ScanOptions` holds some useful information: specifically, the projected schema (which is a product of the dataset schema and the projection expression and is not easily recreated) and the materialized fields (the list of fields referenced by either the filter or the projection), which might be useful for reporting purposes.
> Currently R uses the projected schema to convert a list of column names into a partition schema.  Python does not rely on either field.
>  
> Options:
>  - Keep the status quo
>  - Expose the ScanOptions object (which itself is exposed via the Scanner)
>  - Expose the interesting fields via the Scanner
>  
> Currently the C++ design is halfway between the latter two (both the projected schema and the options are exposed).  My preference would be the third option.  This raises a further question: how should the scanner itself be exposed in Python?  Should the user be using ScannerBuilder?  Should they use NewScan?  Should they use the scanner directly at all, or should it be hidden?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)