Posted to dev@arrow.apache.org by "Ben Kietzman (Jira)" <ji...@apache.org> on 2020/03/19 15:56:00 UTC
[jira] [Created] (ARROW-8164) [C++][Dataset] Let datasets be viewable with non-identical schema
Ben Kietzman created ARROW-8164:
-----------------------------------
Summary: [C++][Dataset] Let datasets be viewable with non-identical schema
Key: ARROW-8164
URL: https://issues.apache.org/jira/browse/ARROW-8164
Project: Apache Arrow
Issue Type: Improvement
Components: C++, C++ - Dataset
Affects Versions: 0.16.0
Reporter: Ben Kietzman
Assignee: Ben Kietzman
Fix For: 1.0.0
It would be useful to allow some schema unification capability after discovery has completed. For example, if a FileSystemDataset is being wrapped into a UnionDataset with another dataset and their schemas are unifiable, then there is no reason we can't create the UnionDataset (rather than emitting an error because the schemas are not identical).
I think this behavior will be most naturally expressed in C++ like so:
{code}
// Declared inside class Dataset; Dataset is abstract, so the Result holds a pointer.
virtual Result<std::shared_ptr<Dataset>> ReplaceSchema(std::shared_ptr<Schema> schema) const = 0;
{code}
which will return an error if the provided schema is not unifiable with the current dataset schema.
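To make the intended behavior concrete, here is a minimal standalone sketch that does not use the Arrow headers. Everything in it is an assumption for illustration: {{Field}}, {{Schema}}, {{ToyDataset}} and {{IsUnifiable}} are hypothetical stand-ins, types are spelled as plain strings, and "unifiable" is taken to mean that any field name present in both schemas has the same type. A null pointer stands in for the error the proposed method would return.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; these are not the Arrow classes.
struct Field {
  std::string name;
  std::string type;  // type spelled as text, e.g. "int32" or "utf8"
};
using Schema = std::vector<Field>;

// Assumed rule: two schemas are unifiable when every field name they share
// has the same type. Fields present in only one schema are allowed.
bool IsUnifiable(const Schema& current, const Schema& requested) {
  std::map<std::string, std::string> current_types;
  for (const Field& field : current) current_types[field.name] = field.type;
  for (const Field& field : requested) {
    auto it = current_types.find(field.name);
    if (it != current_types.end() && it->second != field.type) return false;
  }
  return true;
}

class ToyDataset {
 public:
  explicit ToyDataset(Schema schema) : schema_(std::move(schema)) {}
  const Schema& schema() const { return schema_; }

  // Returns a new view with the requested schema. A null pointer stands in
  // for the error that the proposed Result-returning method would carry.
  std::shared_ptr<ToyDataset> ReplaceSchema(Schema requested) const {
    if (!IsUnifiable(schema_, requested)) return nullptr;
    return std::make_shared<ToyDataset>(std::move(requested));
  }

 private:
  Schema schema_;
};
```

Under these assumptions, a dataset with fields (a: int32, b: utf8) can be viewed with the schema (a: int32, c: float64), while a request for (a: utf8) is rejected because the type of a conflicts. How a missing column such as c would be materialized (for example as nulls) is left open here.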
If this needs to be extended to nontrivial projections, then it will probably warrant a separate class, {{ProjectedDataset}} or so. That would definitely be follow-up material (if desired).
--
This message was sent by Atlassian Jira
(v8.3.4#803005)