Posted to issues@spark.apache.org by "Raj (JIRA)" <ji...@apache.org> on 2019/02/04 17:23:00 UTC

[jira] [Comment Edited] (SPARK-26804) Spark sql carries newline char from last csv column when imported

    [ https://issues.apache.org/jira/browse/SPARK-26804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16760036#comment-16760036 ] 

Raj edited comment on SPARK-26804 at 2/4/19 5:22 PM:
-----------------------------------------------------

Hi Hyukjin,

    I have attached a sample file so that you can reproduce the issue at your end. The screenshot below also shows the commands I run in Databricks to recreate it. In the third command box, Col3 contains an extra character, highlighted in blue, which appears when you double-click the column.

To analyse further, you can download this as a CSV file and see the extra character.
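
One way to check what the extra character is, before involving Spark at all, is to count the line terminators in the raw bytes of the file. The sketch below is only an illustration: the helper name `count_line_endings` and the inline sample bytes are made up for this example, and it assumes (the thread does not confirm this) that the stray character is a carriage return left over from Windows-style CRLF line endings.

```python
from collections import Counter


def count_line_endings(data: bytes) -> Counter:
    """Count CRLF, bare LF and bare CR sequences in raw file bytes.

    Newlines inside quoted multi-line fields are counted too, so a file
    can legitimately show a mix of CRLF and LF.
    """
    counts = Counter()
    i = 0
    while i < len(data):
        if data[i:i + 2] == b"\r\n":
            counts["CRLF"] += 1
            i += 2
        elif data[i:i + 1] == b"\n":
            counts["LF"] += 1
            i += 1
        elif data[i:i + 1] == b"\r":
            counts["CR"] += 1
            i += 1
        else:
            i += 1
    return counts


# Hypothetical sample shaped like a three-column file with one multi-line value.
sample = b'Col1,Col2,Col3\r\n1,"line one\nline two",abc\r\n'
print(count_line_endings(sample))

# For a real file, read it in binary mode so no newline translation happens:
# with open("TestFile.csv", "rb") as f:
#     print(count_line_endings(f.read()))
```

If the counts show CRLF terminators, a trailing carriage return in the last column is a plausible explanation for the highlighted character.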

Note: If I remove the multiLine = true option, the columns load correctly because the extra character is dropped from my last column. However, my data contains multi-line values, so I need to keep this option set.
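
To illustrate the kind of behaviour that could produce this, here is a toy CSV parser in plain Python. It is not Spark's parser and makes no claim about how Spark works internally; it only assumes, for the sake of the example, a reader that lets quoted fields span lines and treats a bare LF as the record terminator. Under that assumption a CRLF file leaves a carriage return on the last column of every row. The names `parse_lf_records` and `strip_trailing_eol` are hypothetical.

```python
def parse_lf_records(text: str) -> list:
    """Toy parser: quoted fields may span lines; only LF ends a record."""
    rows, row, field = [], [], []
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_quotes:
            if ch == '"':
                if text[i + 1:i + 2] == '"':
                    # A doubled quote inside a quoted field is a literal quote.
                    field.append('"')
                    i += 1
                else:
                    in_quotes = False
            else:
                field.append(ch)
        elif ch == '"':
            in_quotes = True
        elif ch == ",":
            row.append("".join(field))
            field = []
        elif ch == "\n":
            row.append("".join(field))
            field = []
            rows.append(row)
            row = []
        else:
            # A "\r" that precedes "\n" is kept as ordinary data here.
            field.append(ch)
        i += 1
    if field or row:
        # Final record without a trailing LF.
        row.append("".join(field))
        rows.append(row)
    return rows


def strip_trailing_eol(rows: list) -> list:
    """Remove trailing CR/LF characters from the last column of each row."""
    return [(r[:-1] + [r[-1].rstrip("\r\n")]) if r else r for r in rows]


raw = 'Col1,Col2,Col3\r\n1,"line one\nline two",abc\r\n'
parsed = parse_lf_records(raw)
print(repr(parsed[1][2]))                      # last column before cleanup
print(repr(strip_trailing_eol(parsed)[1][2]))  # last column after cleanup
```

The cleanup step only touches the end of the last column, so multi-line values in the other columns keep their embedded newlines.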

 

!image-2019-02-04-12-09-19-210.png!

[^TestFile.csv]

 

^Hope this helps^

^Thanks,^

^Raj^

 


was (Author: hipruthvi):
Hi Hyukjin,

    I have attached a sample file so that you can reproduce the issue at your end. The screenshot below also shows the commands I run in Databricks to recreate it. In the third command box, Col3 contains an extra character, highlighted in blue, which appears when you double-click the column.

To analyse further, you can download these column names as an Excel file and see the extra character.

Note: If I remove the multiLine = true option, the columns load correctly because the extra character is dropped from my last column. However, my data contains multi-line values, so I need to keep this option set.

 

!image-2019-02-04-12-09-19-210.png!

[^TestFile.csv]

 

^Hope this helps^

^Thanks,^

^Raj^

 

> Spark sql carries newline char from last csv column when imported
> -----------------------------------------------------------------
>
>                 Key: SPARK-26804
>                 URL: https://issues.apache.org/jira/browse/SPARK-26804
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Raj
>            Priority: Major
>         Attachments: TestFile.csv, image-2019-02-04-12-09-19-210.png
>
>
> I am trying to create external SQL tables in Databricks using a Spark SQL query, shown below. The query reads a CSV file and creates the external table, but the last column carries the newline character. Is there a way to resolve this issue?
>  
> %sql
> create table if not exists <<My table name>>
> using CSV
> options ("header"="true", "inferschema"="true","multiLine"="true", "escape"='"')
> location <my csv path>


