Posted to issues@spark.apache.org by "Weichen Xu (JIRA)" <ji...@apache.org> on 2016/06/01 15:55:59 UTC
[jira] [Closed] (SPARK-15212) CSV file reader when read file with first line schema do not filter blank in schema column name
[ https://issues.apache.org/jira/browse/SPARK-15212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Weichen Xu closed SPARK-15212.
------------------------------
Resolution: Won't Fix
> CSV file reader when read file with first line schema do not filter blank in schema column name
> -----------------------------------------------------------------------------------------------
>
> Key: SPARK-15212
> URL: https://issues.apache.org/jira/browse/SPARK-15212
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0, 2.1.0
> Reporter: Weichen Xu
> Priority: Minor
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> For example, run the following code in spark-shell:
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> val df = sqlContext.read
>   .option("header", true)
>   .csv("file:///diskext/tdata/spark/d1.csv")
> When the CSV data file contains:
> ----------------------------------------------------------
> col1, col2,col3,col4,col5
> 1997,Ford,E350,"ac, abs, moon",3000.00
> ....
> ------------------------------------------------------------
> The first line contains the schema, and " col2" has a leading blank,
> so the generated DataFrame's schema column name keeps that blank.
> This can cause subtle problems. For example,
> df.select("col2")
> cannot find the column; one must instead write
> df.select(" col2")
> Likewise, if the DataFrame is registered as a table, a query cannot select col2:
> df.registerTempTable("tab1")
> sqlContext.sql("select col2 from tab1") // will fail
> Column names should be validated (or trimmed) when loading a CSV file with a header schema.
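Since the issue was closed as Won't Fix, callers who hit this can trim the header names themselves before relying on them. A minimal sketch of the trimming step in plain Scala (no Spark required; `trimHeader` is a hypothetical helper for illustration, not a Spark API, and it ignores quoted fields containing commas):

```scala
object TrimHeaderExample {
  // Hypothetical helper: split a CSV header line on commas and strip
  // leading/trailing whitespace from each column name. Note: a naive
  // split is fine for a simple header like the one in this report,
  // but would mis-handle quoted header fields that contain commas.
  def trimHeader(headerLine: String): Seq[String] =
    headerLine.split(",").map(_.trim).toSeq

  def main(args: Array[String]): Unit = {
    val cols = trimHeader("col1, col2,col3,col4,col5")
    println(cols.mkString(","))
  }
}
```

In Spark itself, the trimmed names could then be reapplied to the loaded DataFrame with `df.toDF(cols: _*)`, so that `df.select("col2")` resolves as expected.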
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org