Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:02:22 UTC
[jira] [Updated] (SPARK-24065) Issue with the property IgnoreLeadingWhiteSpace
[ https://issues.apache.org/jira/browse/SPARK-24065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-24065:
---------------------------------
Labels: bulk-closed (was: )
> Issue with the property IgnoreLeadingWhiteSpace
> -----------------------------------------------
>
> Key: SPARK-24065
> URL: https://issues.apache.org/jira/browse/SPARK-24065
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, Spark Shell
> Affects Versions: 2.2.0
> Reporter: Varsha Chandrashekar
> Priority: Major
> Labels: bulk-closed
> Attachments: lds1.txt, spark-shell-result.PNG
>
>
> The "ignoreLeadingWhiteSpace" property does not work properly in a corner case. Consider the data below:
> ||"Col1"||"Col2"||"Col3"||
> | "A" | "Mark" | "US" |
> | "B" | "Luke" | "UK" |
> Each cell contains leading and trailing white space. When I load the dataset with "ignoreTrailingWhiteSpace" set to true, the trailing spaces are trimmed, which is correct. But when I set "ignoreLeadingWhiteSpace" to true, the leading spaces are not trimmed.
> The scenario was tested in spark-shell. Refer to the results below.
> case 1:
> scala> var df=spark.read.format("com.databricks.spark.csv").option("delimiter",",").option("qualifier","\"").option("escape","\\").option("header","true").option("inferSchema","true").option("ignoreLeadingWhiteSpace",false).option("ignoreTrailingWhiteSpace",false).load("C:\\Users\\vachandrashekar\\Desktop\\lds1.txt")
> df: org.apache.spark.sql.DataFrame = [col1: string, Col2: string ... 1 more field]
> scala> df.show()
> +-----------+--------++------------
> |Col1|Col2|Col3|
> +-----------+--------++------------
> | "A" | "Mark" | "US" |
> | "B" | "Luke" | "UK" |
> +-----------+--------++------------
> case 2:
> scala> var df=spark.read.format("com.databricks.spark.csv").option("delimiter",",").option("qualifier","\"").option("escape","\\").option("header","true").option("inferSchema","true").option("ignoreLeadingWhiteSpace",true).option("ignoreTrailingWhiteSpace",false).load("C:\\Users\\vachandrashekar\\Desktop\\lds1.txt")
> df: org.apache.spark.sql.DataFrame = [col1: string, Col2: string ... 1 more field]
> scala> df.show()
> +------+---++-----
> |Col1|Col2|Col3|
> +------+---++-----
> | A|Mark|US|
> | B| Luke| UK|
> +------+---++-----
> case 3:
> scala> var df=spark.read.format("com.databricks.spark.csv").option("delimiter",",").option("qualifier","\"").option("escape","\\").option("header","true").option("inferSchema","true").option("ignoreLeadingWhiteSpace",false).option("ignoreTrailingWhiteSpace",true).load("C:\\Users\\vachandrashekar\\Desktop\\lds1.txt")
> df: org.apache.spark.sql.DataFrame = [col1: string, Col2: string ... 1 more field]
> scala> df.show()
> +--------+-------++---------
> |col1|Col2|Col3|
> +--------+-------++---------
> | "A"| "Mark"| "US"|
> | "B"| "Luke"| "UK"|
> +--------+-------++---------
>
> Analysis:
> Case 1: Works as expected. With "ignoreLeadingWhiteSpace" and "ignoreTrailingWhiteSpace" both set to false, the data is shown as it appears in the file.
>
> Case 2: Does not work. With "ignoreLeadingWhiteSpace" set to true and "ignoreTrailingWhiteSpace" set to false, trailing white space is trimmed and leading white space is retained.
> Leading white space is trimmed only for the second and third columns of the first row; the first column of that row is not trimmed.
>
> Case 3: Works as expected. With "ignoreLeadingWhiteSpace" set to false and "ignoreTrailingWhiteSpace" set to true, only trailing white space is trimmed and leading white space is retained.
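
The three cases in the analysis amount to a per-cell expectation: each flag should strip white space on its own side only, independently of the other flag. The following is a minimal sketch of that expectation in plain Scala. It is not Spark's CSV parser; `ExpectedTrimming`, `trimCell`, and the sample cell `" Mark "` are hypothetical names and values chosen for illustration, since the exact white space in lds1.txt is not reproduced in this message.

```scala
// Hypothetical model of the trimming behaviour the report expects.
// This does NOT reproduce Spark's CSV parser; it only states the expectation.
object ExpectedTrimming {

  // Strip white space on each side only when the matching flag is set.
  def trimCell(cell: String, ignoreLeading: Boolean, ignoreTrailing: Boolean): String = {
    val afterLeading =
      if (ignoreLeading) cell.dropWhile(_.isWhitespace) else cell
    if (ignoreTrailing) afterLeading.reverse.dropWhile(_.isWhitespace).reverse
    else afterLeading
  }

  def main(args: Array[String]): Unit = {
    val cell = " Mark " // assumed sample value with one space on each side

    // Case 1: both flags false, cell unchanged.
    println(s"[${trimCell(cell, ignoreLeading = false, ignoreTrailing = false)}]") // prints "[ Mark ]"
    // Case 2: only leading white space removed.
    println(s"[${trimCell(cell, ignoreLeading = true, ignoreTrailing = false)}]")  // prints "[Mark ]"
    // Case 3: only trailing white space removed.
    println(s"[${trimCell(cell, ignoreLeading = false, ignoreTrailing = true)}]")  // prints "[ Mark]"
  }
}
```

Under this model, the Case 2 output reported above would show every cell with its leading white space removed and its trailing white space kept, which is the opposite of what the reporter observed.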