Posted to jira@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2020/11/10 17:06:00 UTC

[jira] [Commented] (ARROW-8221) [Python][Dataset] Expose schema inference / validation options in the factory

    [ https://issues.apache.org/jira/browse/ARROW-8221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17229361#comment-17229361 ] 

Weston Pace commented on ARROW-8221:
------------------------------------

I think another thing that could be included in this work is the ability to specify columns that exist in some, but not all, of the items in the dataset. Today, if I specify column names, I get an error when the first table doesn't contain one of those columns, even if the other tables do.
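
To make the request concrete, here is a minimal sketch in plain Python, with each table modeled as a dict of column lists. It does not use the pyarrow API: `project_table` and `scan` are hypothetical names, and filling absent columns with `None` is only an assumption about what the desired result could look like.

```python
from typing import Dict, List

Table = Dict[str, list]  # column name -> column values


def project_table(table: Table, columns: List[str]) -> Table:
    """Select `columns` from one table, filling absent columns with None."""
    num_rows = len(next(iter(table.values()))) if table else 0
    return {
        name: list(table[name]) if name in table else [None] * num_rows
        for name in columns
    }


def scan(tables: List[Table], columns: List[str]) -> Table:
    """Concatenate the projected tables.

    A column is accepted if it exists in at least one table, instead of
    being checked against the first table only.
    """
    known = set().union(*(t.keys() for t in tables))
    unknown = [name for name in columns if name not in known]
    if unknown:
        raise KeyError(f"columns not found in any table: {unknown}")

    result: Table = {name: [] for name in columns}
    for table in tables:
        projected = project_table(table, columns)
        for name in columns:
            result[name].extend(projected[name])
    return result


# The first table lacks column "b"; the second one has it.
tables = [{"a": [1, 2]}, {"a": [3], "b": ["x"]}]
result = scan(tables, ["a", "b"])
```

In this sketch, rows coming from the first table get `None` for `b` rather than triggering an error, while a column that exists in no table at all still raises.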

 

> [Python][Dataset] Expose schema inference / validation options in the factory
> -----------------------------------------------------------------------------
>
>                 Key: ARROW-8221
>                 URL: https://issues.apache.org/jira/browse/ARROW-8221
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Joris Van den Bossche
>            Assignee: Joris Van den Bossche
>            Priority: Major
>              Labels: dataset
>             Fix For: 3.0.0
>
>
> ARROW-8058 added options related to schema inference / validation for the Dataset factory. We should expose this in Python in the {{dataset(..)}} factory function:
> - Add ability to pass a user-specified schema with a {{schema}} keyword, instead of inferring the schema from (one of) the files (to be passed to the factory finish method)
> - Add {{validate_schema}} option to toggle whether the schema is validated against the actual files or not.
> - Expose in some way the number of fragments to be inspected when inferring or validating the schema. Not sure yet what the best API for this would be. 
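
As a rough illustration of how the three options in the description could interact, the following plain-Python sketch models each fragment's schema as a dict from field name to type name. It is not the Arrow implementation: `finish`, `unify` and the `inspect_fragments` parameter are hypothetical names (only `schema` and `validate_schema` come from the description), and the validation rule used here (a type conflict is an error, a missing field is not) is an assumption.

```python
from typing import Dict, List, Optional

Schema = Dict[str, str]  # field name -> type name


def unify(schemas: List[Schema]) -> Schema:
    """Merge schemas field by field, rejecting conflicting types."""
    unified: Schema = {}
    for schema in schemas:
        for name, dtype in schema.items():
            if unified.setdefault(name, dtype) != dtype:
                raise TypeError(f"field {name!r}: {unified[name]} vs {dtype}")
    return unified


def finish(
    fragment_schemas: List[Schema],
    schema: Optional[Schema] = None,
    validate_schema: bool = True,
    inspect_fragments: Optional[int] = 1,  # hypothetical; None means all
) -> Schema:
    """Return the dataset schema, either inferred or user-specified."""
    inspected = fragment_schemas[:inspect_fragments]
    if schema is None:
        # Infer the schema from the inspected fragments only.
        return unify(inspected)
    if validate_schema:
        # Check the user-specified schema against the inspected fragments.
        for fragment_schema in inspected:
            for name, dtype in fragment_schema.items():
                if name in schema and schema[name] != dtype:
                    raise TypeError(f"field {name!r}: {schema[name]} vs {dtype}")
    return dict(schema)


fragment_schemas = [{"a": "int64"}, {"a": "int64", "b": "string"}]

first_only = finish(fragment_schemas)                            # inspects one fragment
all_inspected = finish(fragment_schemas, inspect_fragments=None)  # inspects every fragment
unchecked = finish(fragment_schemas, schema={"a": "float64"}, validate_schema=False)
```

Inspecting only the first fragment misses field `b`, which is the situation the comment above describes; inspecting more fragments or passing an explicit schema avoids it, at the cost of more I/O or more user input.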



--
This message was sent by Atlassian Jira
(v8.3.4#803005)