Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:02:22 UTC

[jira] [Updated] (SPARK-24065) Issue with the property IgnoreLeadingWhiteSpace

     [ https://issues.apache.org/jira/browse/SPARK-24065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-24065:
---------------------------------
    Labels: bulk-closed  (was: )

> Issue with the property IgnoreLeadingWhiteSpace
> -----------------------------------------------
>
>                 Key: SPARK-24065
>                 URL: https://issues.apache.org/jira/browse/SPARK-24065
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Spark Shell
>    Affects Versions: 2.2.0
>            Reporter: Varsha Chandrashekar
>            Priority: Major
>              Labels: bulk-closed
>         Attachments: lds1.txt, spark-shell-result.PNG
>
>
> The "ignoreLeadingWhiteSpace" property does not work properly in a corner case. Consider the data below:
> ||"Col1"||"Col2"||"Col3"||
> |   "A"  |   "Mark"   |   "US"   |
> |   "B"   |   "Luke"   |   "UK"   |
> Each cell contains leading and trailing white space. When I load the dataset with "ignoreTrailingWhiteSpace" set to true, the trailing spaces are trimmed, which is correct. But when I set "ignoreLeadingWhiteSpace" to true, the leading spaces are not trimmed.
> The scenario was tested in spark-shell. Refer to the results below.
> case 1: scala> var df=spark.read.format("com.databricks.spark.csv").option("delimiter",",").option("qualifier","\"").option("escape","\\").option("header","true").option("inferSchema","true").option("ignoreLeadingWhiteSpace",false).option("ignoreTrailingWhiteSpace",false).load("C:\\Users\\vachandrashekar\\Desktop\\lds1.txt")
>  df: org.apache.spark.sql.DataFrame = [col1: string, Col2: string ... 1 more field]
> scala> df.show()
>  +-----------+--------++------------
> |Col1|Col2|Col3|
> +-----------+--------++------------
> |  "A"  |  "Mark"  |  "US"  |
> |  "B"  |  "Luke" |  "UK" |
> +-----------+--------++------------
> case 2: scala> var df=spark.read.format("com.databricks.spark.csv").option("delimiter",",").option("qualifier","\"").option("escape","\\").option("header","true").option("inferSchema","true").option("ignoreLeadingWhiteSpace",true).option("ignoreTrailingWhiteSpace",false).load("C:\\Users\\vachandrashekar\\Desktop\\lds1.txt")
>  df: org.apache.spark.sql.DataFrame = [col1: string, Col2: string ... 1 more field]
> scala> df.show()
>  +------+---++-----
> |Col1|Col2|Col3|
> +------+---++-----
> |    A|Mark|US|
> |    B|   Luke|   UK|
> +------+---++-----
> case 3: scala> var df=spark.read.format("com.databricks.spark.csv").option("delimiter",",").option("qualifier","\"").option("escape","\\").option("header","true").option("inferSchema","true").option("ignoreLeadingWhiteSpace",false).option("ignoreTrailingWhiteSpace",true).load("C:\\Users\\vachandrashekar\\Desktop\\lds1.txt")
>  df: org.apache.spark.sql.DataFrame = [col1: string, Col2: string ... 1 more field]
> scala> df.show()
>  +--------+-------++---------
> |col1|Col2|Col3|
> +--------+-------++---------
> |    "A"|   "Mark"|   "US"|
> |    "B"|   "Luke"|   "UK"|
> +--------+-------++---------
>  
> Analysis:
> Case 1: Works as expected. With "ignoreLeadingWhiteSpace" and "ignoreTrailingWhiteSpace" both false, the data is shown as it appears in the file.
>  
> Case 2: Does not work. With "ignoreLeadingWhiteSpace" as true and "ignoreTrailingWhiteSpace" as false, the trailing white space is trimmed and the leading white space is retained.
> Leading white space is trimmed only for the second and third columns of the first row; the first column of that row and every column of the second row keep it.
>  
> Case 3: Works as expected. With "ignoreLeadingWhiteSpace" as false and "ignoreTrailingWhiteSpace" as true, only the trailing white space is trimmed and the leading white space is retained.
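
For reference, the per-cell behaviour the report expects from the two flags can be sketched in plain Scala. This is only an illustration under stated assumptions: `TrimExpectation`, `trimCell` and the sample cell are hypothetical names and data introduced here, not part of Spark and not taken from lds1.txt, and the sketch does not model how the CSV reader handles the quote characters.

```scala
// Hypothetical helper, not Spark's implementation: models the trimming the
// report expects from ignoreLeadingWhiteSpace / ignoreTrailingWhiteSpace.
object TrimExpectation {

  // Strip leading and/or trailing white space from one raw cell.
  def trimCell(raw: String, ignoreLeading: Boolean, ignoreTrailing: Boolean): String = {
    val afterLeading =
      if (ignoreLeading) raw.dropWhile(_.isWhitespace) else raw
    if (ignoreTrailing) afterLeading.reverse.dropWhile(_.isWhitespace).reverse
    else afterLeading
  }

  // Assumed sample cell, shaped like the cells in the report's table.
  val cell: String = "   \"Mark\"   "

  def main(args: Array[String]): Unit = {
    // (ignoreLeading, ignoreTrailing) for cases 1, 2 and 3 of the report.
    val cases = Seq((false, false), (true, false), (false, true))
    for ((lead, trail) <- cases) {
      // Brackets make the remaining white space visible.
      println(s"leading=$lead trailing=$trail -> [${trimCell(cell, lead, trail)}]")
    }
  }
}
```

Under this model, case 2 would remove only the leading spaces from every cell in every row, which is not what the spark-shell output in the report shows.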



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
