Posted to jira@arrow.apache.org by "Todd Farmer (Jira)" <ji...@apache.org> on 2022/07/12 14:05:03 UTC

[jira] [Assigned] (ARROW-9748) [C++][Dataset] Remove Selector, ignore_prefixes from FileSystemDatasetFactory

     [ https://issues.apache.org/jira/browse/ARROW-9748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Farmer reassigned ARROW-9748:
----------------------------------

    Assignee:     (was: Weston Pace)

This issue was last updated over 90 days ago, which may be an indication it is no longer being actively worked. To better reflect the current state, the issue is being unassigned. Please feel free to re-take assignment of the issue if it is being actively worked, or if you plan to start that work soon.

> [C++][Dataset] Remove Selector, ignore_prefixes from FileSystemDatasetFactory
> -----------------------------------------------------------------------------
>
>                 Key: ARROW-9748
>                 URL: https://issues.apache.org/jira/browse/ARROW-9748
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>    Affects Versions: 1.0.0
>            Reporter: Ben Kietzman
>            Priority: Major
>              Labels: dataset
>
> Currently FileSystemDatasetFactory can be constructed with an explicit listing of files or with a {{fs::FileSelector}}. Since the selector does not support sophisticated selection criteria, {{FileSystemFactoryOptions::selector_ignore_prefixes}} exists to allow users to exclude undesired files such as {{_metadata}} or {{.DS_STORE}}.
> The selector + ignored prefixes mechanism is inflexible and has numerous edge cases (ARROW-9644, ARROW-9573). Furthermore, implementing more advanced file selection logic in dataset discovery prevents it from being reused by other consumers of the filesystem API.
> Remove FileSystemDatasetFactory's constructor-from-selector, optionally adding that functionality directly to {{fs::FileSelector}}. An explicit listing of files for use in construction of a FileSystemDatasetFactory can then be assembled using an {{fs::FileSelector}} and/or other globbing libraries, with arbitrary inclusion logic.
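
As a rough illustration of the "explicit listing with arbitrary inclusion logic" idea in the description above, here is a minimal standard-library sketch. It assumes the caller already has a list of '/'-separated paths (for example, one obtained by listing a directory with an {{fs::FileSelector}}) and wants to drop entries whose base name starts with an ignored prefix. The names BaseName, StartsWith, and FilterPaths are hypothetical helpers invented for this example and are not part of the Arrow API; the sketch checks only the base name, which may differ from what {{selector_ignore_prefixes}} does.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: last component of a '/'-separated path.
std::string BaseName(const std::string& path) {
  const auto pos = path.find_last_of('/');
  return pos == std::string::npos ? path : path.substr(pos + 1);
}

// Hypothetical helper: true if `s` begins with a non-empty `prefix`.
bool StartsWith(const std::string& s, const std::string& prefix) {
  return !prefix.empty() && s.compare(0, prefix.size(), prefix) == 0;
}

// Hypothetical helper: keep only the paths whose base name does not start
// with any of the ignored prefixes. Input order is preserved.
std::vector<std::string> FilterPaths(
    const std::vector<std::string>& paths,
    const std::vector<std::string>& ignore_prefixes) {
  std::vector<std::string> kept;
  for (const auto& path : paths) {
    const std::string base = BaseName(path);
    const bool ignored = std::any_of(
        ignore_prefixes.begin(), ignore_prefixes.end(),
        [&base](const std::string& prefix) { return StartsWith(base, prefix); });
    if (!ignored) {
      kept.push_back(path);
    }
  }
  return kept;
}
```

Under these assumptions, calling FilterPaths with the prefixes "_" and "." would drop entries such as {{_metadata}} and {{.DS_STORE}}, and the surviving paths could then be handed to the factory's explicit-listing constructor. Because the filter is an ordinary function over strings, callers are free to substitute globbing or any other inclusion rule without changes to dataset discovery.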



--
This message was sent by Atlassian Jira
(v8.20.10#820010)