Posted to issues@drill.apache.org by "Bridget Bevens (JIRA)" <ji...@apache.org> on 2019/01/29 02:26:00 UTC
[jira] [Updated] (DRILL-7001) Documentation - renaming columns name in csv header
[ https://issues.apache.org/jira/browse/DRILL-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bridget Bevens updated DRILL-7001:
----------------------------------
Labels: doc-impacting (was: )
> Documentation - renaming columns name in csv header
> ---------------------------------------------------
>
> Key: DRILL-7001
> URL: https://issues.apache.org/jira/browse/DRILL-7001
> Project: Apache Drill
> Issue Type: Wish
> Affects Versions: 1.15.0
> Reporter: benj
> Priority: Minor
> Labels: doc-impacting
>
> I don't know if this is the best place for this request, but
> when querying a csvh file (a CSV file with a header row), Drill applies some operations that can change the column names.
> These operations are not documented.
> Although it's possible to read [HeaderBuilder.java|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/HeaderBuilder.java], it would be useful to add a section to the documentation that explains at least the principle behind each of these cases, to avoid needless problems and confusion.
> List of operations (possibly not exhaustive):
> * Trim() is applied to CSV column names (leading and trailing spaces are removed)
> {noformat}
> Name , Age,PoB , Info
> =>
> `Name`, `Age`, `PoB` and `Info`{noformat}
> * Characters other than [a-zA-Z0-9_] are replaced by '_' (underscore)
> {noformat}
> Name,Sum$,em@il
> =>
> `Name`, `Sum_`, `em_il`{noformat}
> * Field names starting with '_' (underscore) are prefixed with 'col'
> {noformat}
> _name,_age_,pob_,_col_
> =>
> `col_name`, `col_age_`, `pob_`, `col_col_`{noformat}
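> * Field names starting with [^a-zA-Z] are prefixed with 'col_' (a leading character outside [a-zA-Z0-9] is first replaced by '_', so the result is the same as with the previous rule)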
> {noformat}
> 0_name, 1_age,@pob,#other1,'other2'
> =>
> `col_0_name`, `col_1_age`, `col_pob`, `col_other1`, `col_other2_`{noformat}
> * Quotation marks are removed
> * If the name is a single character:
> ** if it matches [a-zA-Z], it is kept as-is
> ** else if it matches [0-9], it is prefixed with 'col_'
> ** otherwise, it is renamed to column_[0-9]+, where [0-9]+ is the position of the column
> * Duplicate column names (case-insensitive) are suffixed with _[0-9]+ (starting from "_2")
> {noformat}
> 0_name,col_0_name,colx,COLX,colx,colx_2
> =>
> `col_0_name`, `col_0_name_2`, `colx`, `COLX_2`, `colx_3`, `colx_2_2`{noformat}
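The per-column rules above can be sketched in Java, the language of HeaderBuilder.java. This is a hypothetical reconstruction written only to reproduce the examples in this issue, not Drill's actual code: the names `HeaderNameSketch` and `cleanHeader` are made up, the `position` argument is supplied by the caller because the issue does not say whether positions count from 0 or 1, and treating an empty name like a single non-alphanumeric character is an assumption. The "quotation marks are removed" rule is not modelled; the sketch assumes the CSV reader has already stripped any field quoting.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch reproducing the per-column rules listed in DRILL-7001.
// It is NOT Drill's HeaderBuilder implementation.
public class HeaderNameSketch {

  // Cleans a single header name. 'position' is only used for the column_N
  // fallback; whether it counts from 0 or 1 is left to the caller (assumption).
  static String cleanHeader(String raw, int position) {
    // Trim leading and trailing whitespace.
    String name = raw.trim();
    // Replace every character outside [a-zA-Z0-9_] with '_'.
    name = name.replaceAll("[^a-zA-Z0-9_]", "_");

    // Single-character names (an empty name is handled the same way, an assumption).
    if (name.length() <= 1) {
      if (name.matches("[a-zA-Z]")) {
        return name;                  // a letter is kept as-is
      }
      if (name.matches("[0-9]")) {
        return "col_" + name;         // a digit gets the "col_" prefix
      }
      return "column_" + position;    // anything else becomes column_N
    }

    char first = name.charAt(0);
    if (first == '_') {
      return "col" + name;            // "_name" -> "col_name", "@pob" -> "col_pob"
    }
    if (first >= '0' && first <= '9') {
      return "col_" + name;           // "0_name" -> "col_0_name"
    }
    return name;                      // starts with a letter: unchanged
  }

  public static void main(String[] args) {
    String[] header = "0_name, 1_age,@pob,#other1,'other2'".split(",");
    List<String> cleaned = new ArrayList<>();
    for (int i = 0; i < header.length; i++) {
      cleaned.add(cleanHeader(header[i], i + 1));
    }
    // prints [col_0_name, col_1_age, col_pob, col_other1, col_other2_]
    System.out.println(cleaned);
  }
}
```

Running `main` on the fourth example reproduces the expected names given in the issue. Replacing illegal characters before checking the first character is what makes `@pob` and `_pob` end up with the same `col_` result.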
>
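The case-insensitive renaming of duplicate column names can be sketched the same way. This is again a hypothetical illustration (`HeaderDedupSketch` and `dedup` are made-up names, not Drill APIs), assuming the names have already been through the per-column cleanup, which is why `0_name` appears below as `col_0_name`. A set of lower-cased names already in use makes the comparison case-insensitive, and the numeric suffix restarts at 2 for every column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the duplicate-name rule described in DRILL-7001.
// It is NOT Drill's HeaderBuilder implementation.
public class HeaderDedupSketch {

  // Appends _2, _3, ... to a name that collides (case-insensitively) with a
  // name already used by an earlier column.
  static List<String> dedup(List<String> names) {
    Set<String> used = new HashSet<>();   // lower-cased names already taken
    List<String> result = new ArrayList<>();
    for (String name : names) {
      String candidate = name;
      // Try name, then name_2, name_3, ... until the lower-cased form is free.
      for (int suffix = 2; used.contains(candidate.toLowerCase(Locale.ROOT)); suffix++) {
        candidate = name + "_" + suffix;
      }
      used.add(candidate.toLowerCase(Locale.ROOT));
      result.add(candidate);
    }
    return result;
  }

  public static void main(String[] args) {
    // Names after per-column cleanup: "0_name" has already become "col_0_name".
    List<String> cleaned =
        Arrays.asList("col_0_name", "col_0_name", "colx", "COLX", "colx", "colx_2");
    // prints [col_0_name, col_0_name_2, colx, COLX_2, colx_3, colx_2_2]
    System.out.println(dedup(cleaned));
  }
}
```

Because `COLX_2` is recorded as used (in lower case), the later literal `colx_2` collides with it and becomes `colx_2_2`, which matches the last name in the issue's example.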
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)