Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/04/05 18:02:20 UTC

[GitHub] [spark] thadeusb commented on a change in pull request #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasource

URL: https://github.com/apache/spark/pull/23080#discussion_r272690095
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala
 ##########
 @@ -192,6 +192,20 @@ class CSVOptions(
    */
   val emptyValueInWrite = emptyValue.getOrElse("\"\"")
 
+  /**
+   * A string between two consecutive JSON records.
+   */
+  val lineSeparator: Option[String] = parameters.get("lineSep").map { sep =>
+    require(sep.nonEmpty, "'lineSep' cannot be an empty string.")
+    require(sep.length == 1, "'lineSep' can contain only 1 character.")
 
 Review comment:
   I am currently working on a project that imports CSV files with Windows (CRLF) line endings.
   
   I backported these changes but ran into an issue with this check: to parse Windows CSV files properly, I must be able to set lineSep to "\r\n" in the options.
   
   It appears that the reason this require was added no longer applies, since the code in asReaderSettings/asWriterSettings no longer calls that function.
   
   After removing this require, I am able to import the Windows-newline CSV files into DataFrames properly.
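
   As a rough illustration of the relaxed check described above, here is a minimal sketch in plain Scala. It is not the actual CSVOptions code: the object name LineSepCheck, the standalone Map parameter, and returning sep unchanged are assumptions made for the example, since the diff above is cut off after the second require.

```scala
object LineSepCheck {
  // Hypothetical stand-in for the option parsing shown in the diff;
  // `parameters` is a plain Map here rather than Spark's own options type.
  def lineSeparator(parameters: Map[String, String]): Option[String] =
    parameters.get("lineSep").map { sep =>
      // Keep the non-empty check from the diff.
      require(sep.nonEmpty, "'lineSep' cannot be an empty string.")
      // The single-character require is left out so that "\r\n" is accepted.
      // Assumption: the separator is returned unchanged.
      sep
    }
}
```

   With this version, Map("lineSep" -> "\r\n") yields Some("\r\n"), an empty string is still rejected, and a missing option yields None.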
   
   Before this, I had another issue: the name of the very last column always ended up with a trailing "\r", so a column like "TEXT" became "TEXT\r" and could no longer be queried as TEXT. Setting lineSep to "\r\n" solved this issue as well.
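
   The trailing "\r" symptom can be reproduced without Spark. The sketch below is only an illustration in plain Scala: the parse helper and the sample input are hypothetical, and it uses a naive comma split rather than the CSV parser Spark relies on.

```scala
import java.util.regex.Pattern

object CrlfHeaderDemo {
  // Hypothetical helper: split raw text into rows on `lineSep`,
  // then split each row into columns on commas (no quoting support).
  def parse(raw: String, lineSep: String): Vector[Vector[String]] =
    raw.split(Pattern.quote(lineSep)).toVector.map(_.split(",", -1).toVector)

  // Sample input with Windows (CRLF) line endings.
  val raw = "ID,TEXT\r\n1,hello\r\n"

  // Splitting on "\n" alone leaves the "\r" attached to the last column.
  val withLf: Vector[Vector[String]] = parse(raw, "\n")

  // Splitting on the full "\r\n" separator gives clean column names.
  val withCrlf: Vector[Vector[String]] = parse(raw, "\r\n")
}
```

   Here withLf.head.last is "TEXT\r", while withCrlf.head.last is "TEXT", which matches the behavior described in the comment.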
