Posted to jira@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2021/01/05 08:39:00 UTC

[jira] [Updated] (ARROW-9748) [C++][Dataset] Remove Selector, ignore_prefixes from FileSystemDatasetFactory

     [ https://issues.apache.org/jira/browse/ARROW-9748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weston Pace updated ARROW-9748:
-------------------------------
    Fix Version/s:     (was: 3.0.0)
                   4.0.0

> [C++][Dataset] Remove Selector, ignore_prefixes from FileSystemDatasetFactory
> -----------------------------------------------------------------------------
>
>                 Key: ARROW-9748
>                 URL: https://issues.apache.org/jira/browse/ARROW-9748
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>    Affects Versions: 1.0.0
>            Reporter: Ben Kietzman
>            Assignee: Weston Pace
>            Priority: Major
>              Labels: dataset
>             Fix For: 4.0.0
>
>
> Currently FileSystemDatasetFactory can be constructed with an explicit listing of files or with a {{fs::FileSelector}}. Since the selector does not support sophisticated selection criteria, {{FileSystemFactoryOptions::selector_ignore_prefixes}} was added to allow users to exclude undesired files such as {{_metadata}} or {{.DS_STORE}}.
> The selector + ignored prefixes mechanism is inflexible and has numerous edge cases (ARROW-9644, ARROW-9573). Furthermore, implementing more advanced file selection logic in dataset discovery prevents it from being reused by other consumers of the filesystem API.
> Remove FileSystemDatasetFactory's constructor-from-selector, optionally adding that functionality directly to {{fs::FileSelector}}. An explicit listing of files for use in construction of a FileSystemDatasetFactory can then be assembled using an {{fs::FileSelector}} and/or other globbing libraries, with arbitrary inclusion logic.
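
For illustration only, here is a rough sketch of what "assembling an explicit listing with arbitrary inclusion logic" could look like on the caller's side. It uses only the C++ standard library and plain path strings. The helpers BaseName, HasIgnoredPrefix and FilterPaths are hypothetical names invented for this sketch and are not part of Arrow. It also assumes that only the final path segment is checked against the ignored prefixes, which may differ from what the existing option does. In practice the candidate paths would presumably come from a filesystem listing, and the filtered result would be handed to the factory's explicit-listing construction.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not an Arrow API): returns the final '/'-separated
// segment of a path, or the whole string if it contains no '/'.
std::string BaseName(const std::string& path) {
  const auto pos = path.find_last_of('/');
  return pos == std::string::npos ? path : path.substr(pos + 1);
}

// Hypothetical helper: true if `name` starts with any of `prefixes`.
// Note that an empty prefix matches every name.
bool HasIgnoredPrefix(const std::string& name,
                      const std::vector<std::string>& prefixes) {
  return std::any_of(prefixes.begin(), prefixes.end(),
                     [&](const std::string& prefix) {
                       return name.compare(0, prefix.size(), prefix) == 0;
                     });
}

// Hypothetical helper: keeps the candidate paths whose basename does not
// start with an ignored prefix, preserving the input order. Any other
// inclusion rule (glob, regex, size check) could be substituted here.
std::vector<std::string> FilterPaths(
    const std::vector<std::string>& candidates,
    const std::vector<std::string>& ignore_prefixes) {
  std::vector<std::string> kept;
  for (const auto& path : candidates) {
    if (!HasIgnoredPrefix(BaseName(path), ignore_prefixes)) {
      kept.push_back(path);
    }
  }
  return kept;
}
```

With this shape, the exclusion rule lives entirely in caller code, so it can be swapped for a glob or any other predicate without touching dataset discovery.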



--
This message was sent by Atlassian Jira
(v8.3.4#803005)