Posted to notifications@asterixdb.apache.org by "Michael J. Carey (Jira)" <ji...@apache.org> on 2021/05/13 15:00:00 UTC

[jira] [Commented] (ASTERIXDB-2901) Infer schema from CSV header

    [ https://issues.apache.org/jira/browse/ASTERIXDB-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343915#comment-17343915 ] 

Michael J. Carey commented on ASTERIXDB-2901:
---------------------------------------------

One could imagine several different "infer" options, e.g.:

   "infer" = N — look at the header plus the first N rows to infer a schema

   "infer" = ALL — look at the whole file to infer a schema

   "infer" = SAMPLE(N) — pick N rows at random to infer a schema

One could also imagine no-header versions where the field names come from the CREATE statement and only the data types are inferred, though this seems less useful since it saves less work.
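
To make the three options concrete, here is a minimal sketch in Python of how the modes might differ. It is not AsterixDB code: the function names, the way the "infer" value is encoded (an int for N, the string "ALL", or a ("SAMPLE", N) tuple), and the three-type lattice bigint / double / string are all assumptions made for illustration only.

```python
import csv
import io
import itertools
import random


def infer_type(values):
    """Pick the narrowest of bigint/double/string that fits all non-empty values."""
    present = [v for v in values if v != ""]
    if not present:
        return "string"  # assumption: no evidence means fall back to string

    def fits(cast):
        try:
            for v in present:
                cast(v)
        except ValueError:
            return False
        return True

    if fits(int):
        return "bigint"
    if fits(float):
        return "double"
    return "string"


def infer_schema(text, infer=10, seed=None):
    """Infer {field name: type} from CSV text that starts with a header row.

    infer is hypothetical: an int N (first N rows), "ALL" (every row),
    or ("SAMPLE", N) (N rows picked at random).
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    if infer == "ALL":
        rows = list(reader)
    elif isinstance(infer, tuple) and infer[0] == "SAMPLE":
        everything = list(reader)
        rng = random.Random(seed)
        rows = rng.sample(everything, min(infer[1], len(everything)))
    else:
        rows = list(itertools.islice(reader, infer))  # stop after N rows
    return {
        name: infer_type([r[i] if i < len(r) else "" for r in rows])
        for i, name in enumerate(header)
    }


EXAMPLE = "id,name,salary\n1,Ann,1000\n2,Bob,2000.5\n3,Cy,n/a\n"
```

On EXAMPLE, the salary column is inferred differently depending on how much of the file is scanned: as bigint with infer=1, as double with infer=2, and as string with "ALL". That is the trade-off between the options: a small N is cheap but can pick a type that later rows violate.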

> Infer schema from CSV header
> ----------------------------
>
>                 Key: ASTERIXDB-2901
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2901
>             Project: Apache AsterixDB
>          Issue Type: New Feature
>            Reporter: Gift Sinthong
>            Priority: Trivial
>
> When creating an external dataset from a CSV file, it should be possible to infer the attribute names from the file header, if present, and the data types from a sample of records. For example, the CREATE statement could take an "infer" flag giving the number of records to scan, as in the statement below.
> CREATE EXTERNAL DATASET Employee() USING localfs (("path"="localhost:///employees.csv"), ("format"="delimited-text"), ("delimiter"=","), ("header"=true), ("infer"=10))
