Posted to issues@spark.apache.org by "Weichen Xu (JIRA)" <ji...@apache.org> on 2016/06/01 15:55:59 UTC

[jira] [Closed] (SPARK-15212) CSV file reader when read file with first line schema do not filter blank in schema column name

     [ https://issues.apache.org/jira/browse/SPARK-15212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weichen Xu closed SPARK-15212.
------------------------------
    Resolution: Won't Fix

> CSV file reader when read file with first line schema do not filter blank in schema column name
> -----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-15212
>                 URL: https://issues.apache.org/jira/browse/SPARK-15212
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0, 2.1.0
>            Reporter: Weichen Xu
>            Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> For example, run the following code in spark-shell:
> val sqlContext = new org.apache.spark.sql.SQLContext(sc);
> var reader = sqlContext.read
> reader.option("header", true)
> var df = reader.csv("file:///diskext/tdata/spark/d1.csv")
> where the CSV data file contains:
> ----------------------------------------------------------
> col1, col2,col3,col4,col5
> 1997,Ford,E350,"ac, abs, moon",3000.00
> ....
> ------------------------------------------------------------
> The first line contains the schema, and col2 has a blank before it,
> so the column name in the generated DataFrame's schema contains the blank.
> This can cause problems. For example,
> df.select("col2") 
> cannot find the column; it must be written as
> df.select(" col2") 
> If the DataFrame is registered as a table, a query cannot select col2 either:
> df.registerTempTable("tab1");
> sqlContext.sql("select col2 from tab1"); //will fail
> Column names should be validated when loading a CSV file whose first line is the schema.
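
Since the issue was closed as Won't Fix, one possible workaround is to trim the header names after loading. The following is only a minimal sketch for spark-shell: HeaderNames and trimAll are hypothetical names introduced here for illustration, not part of Spark, and the sample header values are copied from the d1.csv example above.

```scala
// Hypothetical helper, not part of Spark: strips leading and trailing
// whitespace from each column name taken from the CSV header line.
object HeaderNames {
  def trimAll(names: Seq[String]): Seq[String] = names.map(_.trim)
}

// Header fields as they appear in "col1, col2,col3,col4,col5";
// note the blank in front of col2.
val raw = Seq("col1", " col2", "col3", "col4", "col5")
val cleaned = HeaderNames.trimAll(raw)

// In spark-shell, with df loaded as in the report, the cleaned names
// could then be applied by renaming every column at once:
//   val fixed = df.toDF(HeaderNames.trimAll(df.columns.toSeq): _*)
//   fixed.select("col2")
```

Renaming with toDF leaves the original DataFrame untouched and returns a new one, so the trimmed names would apply only to queries made against the renamed result.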



