Posted to reviews@spark.apache.org by MaxGekk <gi...@git.apache.org> on 2018/06/24 15:42:04 UTC

[GitHub] spark pull request #20949: [SPARK-19018][SQL] Add support for custom encodin...

Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20949#discussion_r197643948
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala ---
    @@ -512,6 +512,43 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils with Te
         }
       }
     
    +  test("Save csv with custom charset") {
    +    Seq("iso-8859-1", "utf-8", "windows-1250").foreach { encoding =>
    --- End diff --
    
    Could you check the `UTF-16` and `UTF-32` encodings too? The written csv files must contain [BOMs](https://en.wikipedia.org/wiki/Byte_order_mark) for such encodings. I am not sure that Spark's CSV datasource is able to read them back in per-line mode (`multiLine` set to `false`), since line splitting happens on raw bytes and `\n` is a multi-byte sequence in those encodings. You probably need to switch to multiLine mode or read the files back with Scala's standard library, as in JsonSuite: https://github.com/apache/spark/blob/c7e2742f9bce2fcb7c717df80761939272beff54/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala#L2322-L2338
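    
    A minimal sketch of what such a check could look like, assuming the `encoding` write option this PR adds and the `withTempPath` / `testImplicits` helpers available in this suite (the test data here is illustrative, not from the diff):
    
    ```scala
    Seq("UTF-16", "UTF-32").foreach { encoding =>
      withTempPath { path =>
        import testImplicits._
        val df = Seq(("a", 1), ("b", 2)).toDF("name", "id")
        df.write.option("encoding", encoding).csv(path.getCanonicalPath)
    
        // Read the part files back with Scala's standard library: the JVM's
        // UTF-16/UTF-32 decoders consume a leading BOM if present, so this
        // sidesteps the datasource's per-line (byte-oriented) reader entirely.
        val written = path.listFiles()
          .filter(_.getName.endsWith(".csv"))
          .flatMap(f => scala.io.Source.fromFile(f, encoding).getLines())
          .toSet
    
        assert(written === Set("a,1", "b,2"))
      }
    }
    ```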


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org