You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by HyukjinKwon <gi...@git.apache.org> on 2016/02/02 05:02:28 UTC

[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/11016

    [SPARK-13108][SQL] Validate ascii compatible encodings

    https://issues.apache.org/jira/browse/SPARK-13108
    
    CSV datasource currently does not support non-ascii compatible encodings.
    
    I tested this in Max OS. I converted `cars_iso-8859-1.csv` into `cars_utf-16.csv` as below:
    
    ```bash
    iconv -f iso-8859-1 -t utf-16 < cars_iso-8859-1.csv > cars_utf-16.csv
    ```
    
    and run the codes below:
    
    ```scala
    val cars = "cars_utf-16.csv"
    sqlContext.read
      .format("csv")
      .option("charset", "utf-16")
      .option("delimiter", 'þ')
      .load(cars)
      .show()
    ```
    
    This produces a wrong results below:
    ```
    +----+-----+-----+--------------------+------+
    |year| make|model|             comment|blank�|
    +----+-----+-----+--------------------+------+
    |2012|Tesla|    S|          No comment|     �|
    |   �| null| null|                null|  null|
    |1997| Ford| E350|Go get one now th...|     �|
    |2015|Chevy|Volt�|                null|  null|
    |   �| null| null|                null|  null|
    +----+-----+-----+--------------------+------+
    ```
    (For more details, please check the Jira issue ticket).
    
    So, this PR let this datasource checks the encodings to accept only ascii-compatible encodings..


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-13108-non-ascii-encodings

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11016.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11016
    
----
commit d3cddb228f20b738b76c0fd74f6891dc9beaedb8
Author: hyukjinkwon <gu...@gmail.com>
Date:   2016-02-02T03:52:20Z

    Validate ascii compatible encodings

commit 9f3735ca31f323fa460fd2f386fcef99725d890b
Author: hyukjinkwon <gu...@gmail.com>
Date:   2016-02-02T03:53:33Z

    Remove a heading extra whitespace

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-188587938
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51919/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-179712755
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-188587366
  
    **[Test build #51919 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51919/consoleFull)** for PR 11016 at commit [`4fa2ed3`](https://github.com/apache/spark/commit/4fa2ed30ff4a560d74b7f4858d30377c2cccfcd6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Support for ascii compatibl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-193656980
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52635/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-190008830
  
    **[Test build #52153 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52153/consoleFull)** for PR 11016 at commit [`a6f6023`](https://github.com/apache/spark/commit/a6f60234bc82834490271e57842df0775a0acf8b).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-178385441
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-179706183
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-179712757
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50741/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-188556078
  
    **[Test build #51919 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51919/consoleFull)** for PR 11016 at commit [`4fa2ed3`](https://github.com/apache/spark/commit/4fa2ed30ff4a560d74b7f4858d30377c2cccfcd6).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-190040446
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-190040449
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52157/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Support for ascii compatibl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-193656756
  
    **[Test build #52635 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52635/consoleFull)** for PR 11016 at commit [`264a1dc`](https://github.com/apache/spark/commit/264a1dc603164bd264e0c084608f31ffb8ad5f69).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-190020402
  
    **[Test build #52157 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52157/consoleFull)** for PR 11016 at commit [`a6f6023`](https://github.com/apache/spark/commit/a6f60234bc82834490271e57842df0775a0acf8b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-188539066
  
    @falaki Thanks. Then, I will try to generalize this and then change the title as well with some more commits.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-179683891
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-179685723
  
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Support for ascii compatibl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-193628447
  
    **[Test build #52635 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52635/consoleFull)** for PR 11016 at commit [`264a1dc`](https://github.com/apache/spark/commit/264a1dc603164bd264e0c084608f31ffb8ad5f69).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-178385311
  
    **[Test build #50533 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50533/consoleFull)** for PR 11016 at commit [`9f3735c`](https://github.com/apache/spark/commit/9f3735ca31f323fa460fd2f386fcef99725d890b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-188550859
  
    @falaki Hm.. Do JSON and TEXT data sources support `encoding` option?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-179705864
  
    **[Test build #50739 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50739/consoleFull)** for PR 11016 at commit [`00fe23c`](https://github.com/apache/spark/commit/00fe23cd1577c1572bef6419785368b9960b7630).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-190040314
  
    **[Test build #52157 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52157/consoleFull)** for PR 11016 at commit [`a6f6023`](https://github.com/apache/spark/commit/a6f60234bc82834490271e57842df0775a0acf8b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-178362645
  
    **[Test build #50533 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50533/consoleFull)** for PR 11016 at commit [`9f3735c`](https://github.com/apache/spark/commit/9f3735ca31f323fa460fd2f386fcef99725d890b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-179712429
  
    **[Test build #50741 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50741/consoleFull)** for PR 11016 at commit [`00fe23c`](https://github.com/apache/spark/commit/00fe23cd1577c1572bef6419785368b9960b7630).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Support for ascii compatibl...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-199118697
  
    I found a similar issue with this, [SPARK-1849](https://issues.apache.org/jira/browse/SPARK-1849). 
    
    I think we might have to do not support non-ascii compatible encodings because it looks this PR will support general encodings but I cannot guarantee it supports all the encodings. I mean, this will support general encodings but there might be some encodings writing a BOM-bits-like header.
    
    Since Spark CSV is already supporting the encoding option, I cannot come up with more than three options below:
    
    - Only CSV data source supports some encodings for backward compatibility but except non-ascii compatible encodings and throws an exception when it is non-ascii compatible encodings.
    
    - CSV data source supports other encodings in this way but there are documentations to mention it does not guarantee all the encodings.
    
    - Supports all the encodings and add the tests for all the encodings (maybe with this [encoding list](https://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html) in Java)
    
    @srowen Would you maybe give some feedback please?
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-189987360
  
    **[Test build #52153 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52153/consoleFull)** for PR 11016 at commit [`a6f6023`](https://github.com/apache/spark/commit/a6f60234bc82834490271e57842df0775a0acf8b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-179683900
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50738/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-179706186
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50739/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

Posted by falaki <gi...@git.apache.org>.
Github user falaki commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-188440177
  
    I think the approach looks good if it was only an issue for csv. But if you want to fix it in a more general way, it makes more sense to move `EncodingTextInputFormat` to a general location (with its own unit tests) and then use it everywhere: JSON, TEXT, CSV.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-190009124
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52153/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-188587935
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-179687871
  
    **[Test build #50741 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50741/consoleFull)** for PR 11016 at commit [`00fe23c`](https://github.com/apache/spark/commit/00fe23cd1577c1572bef6419785368b9960b7630).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Support for ascii compatibl...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon closed the pull request at:

    https://github.com/apache/spark/pull/11016


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Support for ascii compatibl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-193656979
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-179678080
  
    cc @rxin 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-178385442
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50533/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-179681915
  
    **[Test build #50739 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50739/consoleFull)** for PR 11016 at commit [`00fe23c`](https://github.com/apache/spark/commit/00fe23cd1577c1572bef6419785368b9960b7630).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-190018739
  
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-190009121
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13108][SQL] Support for ascii compatibl...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11016#issuecomment-204346308
  
    Sorry, this will not work for Windows (`\r\n`). I am closing this. If you intend to just block non-ascii compatible encodings, then I will create a new PR or reopen this. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org