Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/11/26 14:50:31 UTC

[GitHub] [spark] LuciferYang opened a new pull request #30518: [SPARK-33566][SQL] Make unescapedQuoteHandling option configurable when read CSV

LuciferYang opened a new pull request #30518:
URL: https://github.com/apache/spark/pull/30518


   ### What changes were proposed in this pull request?
   Spark CSV, opencsv, and commons-csv parse some inputs differently. A typical case is described in SPARK-33566: when a value contains both unescaped quotes and unescaped qualifiers, the parsing results differ.
   
   The difference arises because Spark builds its `CsvParser` with `STOP_AT_DELIMITER` as the default `UnescapedQuoteHandling`, and this setting is not configurable.
   
   opencsv and commons-csv, on the other hand, use a parsing mechanism similar to `STOP_AT_CLOSING_QUOTE` by default.
   
   This PR therefore makes the `unescapedQuoteHandling` option configurable, so that Spark can produce the same parsing result as opencsv and commons-csv.
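
A minimal sketch of how the option described above might be used from Scala, assuming a Spark build that includes this change. The option name and the two mode values come from the PR text; the object name `UnescapedQuoteHandlingSketch`, the helper `readSample`, and the temporary file are hypothetical and exist only for illustration. The sample rows mirror the test data proposed in this PR.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.sql.{Row, SparkSession}

// Hypothetical helper object, not part of the PR.
object UnescapedQuoteHandlingSketch {

  // Writes a small CSV sample to a temp file and reads it back with the
  // given unescapedQuoteHandling mode.
  def readSample(spark: SparkSession, mode: String): Array[Row] = {
    // Quotes and delimiters inside the quoted values are not escaped
    // with Spark's default escape character (backslash).
    val lines = Seq(
      "c1,c2",
      "\"a,\"\"b,c\",\"xyz\"",
      "\"a,b,c\",\"x\"\"yz\"")
    val file = Files.createTempFile("unescaped-quotes", ".csv")
    Files.write(file, lines.mkString("\n").getBytes(StandardCharsets.UTF_8))

    spark.read
      .option("header", "true")
      .option("unescapedQuoteHandling", mode)
      .csv(file.toString)
      .collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("unescaped-quote-handling-sketch")
      .getOrCreate()
    try {
      // Print the parsed rows for both modes so they can be compared.
      Seq("STOP_AT_DELIMITER", "STOP_AT_CLOSING_QUOTE").foreach { mode =>
        println(s"$mode:")
        readSample(spark, mode).foreach(println)
      }
    } finally {
      spark.stop()
    }
  }
}
```

Running `main` prints the rows under each mode side by side, which makes it easy to see where the default `STOP_AT_DELIMITER` splits a quoted value that `STOP_AT_CLOSING_QUOTE` keeps whole.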
    
   ### Why are the changes needed?
   Making the `unescapedQuoteHandling` option configurable when reading CSV makes parsing more flexible.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   
   - Pass Jenkins or GitHub Actions
   
   - Add a new test case similar to the one described in SPARK-33566
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a change in pull request #30518: [SPARK-33566][CORE][SQL][SS][PYTHON] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #30518:
URL: https://github.com/apache/spark/pull/30518#discussion_r531367933



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
##########
@@ -727,6 +727,27 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
    * a record can have.</li>
    * <li>`maxCharsPerColumn` (default `-1`): defines the maximum number of characters allowed
    * for any given value being read. By default, it is -1 meaning unlimited length</li>
+   * <li>`unescapedQuoteHandling` (default `STOP_AT_DELIMITER`): defines how the CsvParser

Review comment:
       Addressed in 84c1d59, which tries to fix the Python file. I'm not familiar with Python :(








[GitHub] [spark] SparkQA removed a comment on pull request #30518: [SPARK-33566][CORE][SQL][SS][PYTHON] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30518:
URL: https://github.com/apache/spark/pull/30518#issuecomment-734625327


   **[Test build #131858 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131858/testReport)** for PR 30518 at commit [`ca79a48`](https://github.com/apache/spark/commit/ca79a488929f7777b5c2262c12f85bfa0272ca5d).




[GitHub] [spark] SparkQA commented on pull request #30518: [SPARK-33566][CORE][SQL][SS][PYTHON] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30518:
URL: https://github.com/apache/spark/pull/30518#issuecomment-734681712


   **[Test build #131857 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131857/testReport)** for PR 30518 at commit [`1770c56`](https://github.com/apache/spark/commit/1770c565aa573e6f32e404b4f775f2c12edcae2e).
    * This patch **fails SparkR unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #30518: [SPARK-33566][SQL] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30518:
URL: https://github.com/apache/spark/pull/30518#issuecomment-734351642


   **[Test build #131848 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131848/testReport)** for PR 30518 at commit [`b025271`](https://github.com/apache/spark/commit/b02527178d38323a239f651506ca609fd963f454).




[GitHub] [spark] LuciferYang commented on a change in pull request #30518: [SPARK-33566][CORE][SQL][SS][PYTHON] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #30518:
URL: https://github.com/apache/spark/pull/30518#discussion_r531383010



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
##########
@@ -727,6 +727,27 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
    * a record can have.</li>
    * <li>`maxCharsPerColumn` (default `-1`): defines the maximum number of characters allowed
    * for any given value being read. By default, it is -1 meaning unlimited length</li>
+   * <li>`unescapedQuoteHandling` (default `STOP_AT_DELIMITER`): defines how the CsvParser

Review comment:
       @HyukjinKwon do `readwriter.pyi` and `streaming.pyi` also need to be updated? Is that right?






[GitHub] [spark] SparkQA commented on pull request #30518: [SPARK-33566][SQL] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30518:
URL: https://github.com/apache/spark/pull/30518#issuecomment-734622708


   **[Test build #131857 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131857/testReport)** for PR 30518 at commit [`1770c56`](https://github.com/apache/spark/commit/1770c565aa573e6f32e404b4f775f2c12edcae2e).






[GitHub] [spark] SparkQA commented on pull request #30518: [SPARK-33566][CORE][SQL][SS][PYTHON] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30518:
URL: https://github.com/apache/spark/pull/30518#issuecomment-734635304


   **[Test build #131860 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131860/testReport)** for PR 30518 at commit [`1646adb`](https://github.com/apache/spark/commit/1646adb6f1d7e6789f30bcc4b94f92caaddb8960).




[GitHub] [spark] SparkQA commented on pull request #30518: [SPARK-33566][SQL] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30518:
URL: https://github.com/apache/spark/pull/30518#issuecomment-734449946


   **[Test build #131848 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131848/testReport)** for PR 30518 at commit [`b025271`](https://github.com/apache/spark/commit/b02527178d38323a239f651506ca609fd963f454).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] HyukjinKwon commented on pull request #30518: [SPARK-33566][CORE][SQL][SS][PYTHON] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #30518:
URL: https://github.com/apache/spark/pull/30518#issuecomment-734675416


   Thanks @LuciferYang.






[GitHub] [spark] HyukjinKwon commented on pull request #30518: [SPARK-33566][CORE][SQL][SS][PYTHON] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #30518:
URL: https://github.com/apache/spark/pull/30518#issuecomment-734674732


   Merged to master.




[GitHub] [spark] LuciferYang commented on a change in pull request #30518: [SPARK-33566][SQL] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #30518:
URL: https://github.com/apache/spark/pull/30518#discussion_r531363460



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
##########
@@ -727,6 +727,27 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
    * a record can have.</li>
    * <li>`maxCharsPerColumn` (default `-1`): defines the maximum number of characters allowed
    * for any given value being read. By default, it is -1 meaning unlimited length</li>
+   * <li>`unescapedQuoteHandling` (default `STOP_AT_DELIMITER`): defines how the CsvParser

Review comment:
       Addressed in 1770c56, which fixes this.








[GitHub] [spark] LuciferYang edited a comment on pull request #30518: [SPARK-33566][SQL] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
LuciferYang edited a comment on pull request #30518:
URL: https://github.com/apache/spark/pull/30518#issuecomment-734341347


   The original case described in SPARK-33566 is as follows:
   
   **data**:
   ```
   "h1","h2","h3"
   "one","two","three"
   "abc","^@<b><i><span style=""font-family: tahoma,sans-serif;"">Referral from Joe Smith.<A0> Fred is hard working.<A0> Super smart, though you wouldn&#39;t know it at first.<A0> 6 months, and we sold this project.<A0> Phooey he said to me!<A0> What&#39;s up with you people.<A0> You&#39;ll say anything for a sale!<A0> Until he met me of course....haar haar!</span></i></b><br><A0><br><b><i><span style=""font-family: tahoma,sans-serif;"">Internet is spotty</span></i></b><br><b><i><span style=""font-family: tahoma,sans-serif;"">Working while at home so.<A0> Will be applied this weekend. <A0></span></i></b><br><A0><br><b><i><span style=""font-family: tahoma,sans-serif;"">On Bill Recovery and 20 yr warranty added.</span></i></b><br><A0><br><b><i><span style=""font-family: tahoma,sans-serif;"">Kindness made this deal happen!</span></i></b><br><A0>","xyz"
   ```
   
   **opencsv and commons-csv parse row 2 of h2 as follows:**
   
   ```
   ^@<b><i><span style=""font-family: tahoma,sans-serif;"">Referral from Joe Smith.<A0> Fred is hard working.<A0> Super smart, though you wouldn&#39;t know it at first.<A0> 6 months, and we sold this project.<A0> Phooey he said to me!<A0> What&#39;s up with you people.<A0> You&#39;ll say anything for a sale!<A0> Until he met me of course....haar haar!</span></i></b><br><A0><br><b><i><span style=""font-family: tahoma,sans-serif;"">Internet is spotty</span></i></b><br><b><i><span style=""font-family: tahoma,sans-serif;"">Working while at home so.<A0> Will be applied this weekend. <A0></span></i></b><br><A0><br><b><i><span style=""font-family: tahoma,sans-serif;"">On Bill Recovery and 20 yr warranty added.</span></i></b><br><A0><br><b><i><span style=""font-family: tahoma,sans-serif;"">Kindness made this deal happen!</span></i></b><br><A0>
   ```
   
   **Without this PR, Spark parses row 2 of h2 as follows:**
   
   ```
   ^@<b><i><span style=""font-family: tahoma
   ```
   
   
   
   
   






[GitHub] [spark] SparkQA commented on pull request #30518: [SPARK-33566][CORE][SQL][SS][PYTHON] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30518:
URL: https://github.com/apache/spark/pull/30518#issuecomment-734693388


   **[Test build #131858 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131858/testReport)** for PR 30518 at commit [`ca79a48`](https://github.com/apache/spark/commit/ca79a488929f7777b5c2262c12f85bfa0272ca5d).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] LuciferYang commented on a change in pull request #30518: [SPARK-33566][SQL] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #30518:
URL: https://github.com/apache/spark/pull/30518#discussion_r531363545



##########
File path: sql/core/src/test/resources/test-data/unescaped-quotes-unescaped-delimiter.csv
##########
@@ -0,0 +1,3 @@
+c1,c2
+"a,""b,c","xyz"
+"a,b,c","x""yz"

Review comment:
       Addressed in ca79a48, which tries to fix this.








[GitHub] [spark] SparkQA commented on pull request #30518: [SPARK-33566][CORE][SQL][SS][PYTHON] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30518:
URL: https://github.com/apache/spark/pull/30518#issuecomment-734703671


   **[Test build #131860 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131860/testReport)** for PR 30518 at commit [`1646adb`](https://github.com/apache/spark/commit/1646adb6f1d7e6789f30bcc4b94f92caaddb8960).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #30518: [SPARK-33566][CORE][SQL][SS][PYTHON] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30518:
URL: https://github.com/apache/spark/pull/30518#issuecomment-734637172








[GitHub] [spark] AmplabJenkins commented on pull request #30518: [SPARK-33566][SQL] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30518:
URL: https://github.com/apache/spark/pull/30518#issuecomment-734388931








[GitHub] [spark] AmplabJenkins commented on pull request #30518: [SPARK-33566][CORE][SQL][SS][PYTHON] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30518:
URL: https://github.com/apache/spark/pull/30518#issuecomment-734663481








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30518: [SPARK-33566][CORE][SQL][SS][PYTHON] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30518:
URL: https://github.com/apache/spark/pull/30518#issuecomment-734730610








[GitHub] [spark] SparkQA removed a comment on pull request #30518: [SPARK-33566][SQL] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30518:
URL: https://github.com/apache/spark/pull/30518#issuecomment-734351642


   **[Test build #131848 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131848/testReport)** for PR 30518 at commit [`b025271`](https://github.com/apache/spark/commit/b02527178d38323a239f651506ca609fd963f454).




[GitHub] [spark] HyukjinKwon closed pull request #30518: [SPARK-33566][CORE][SQL][SS][PYTHON] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #30518:
URL: https://github.com/apache/spark/pull/30518


   




[GitHub] [spark] SparkQA removed a comment on pull request #30518: [SPARK-33566][CORE][SQL][SS][PYTHON] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30518:
URL: https://github.com/apache/spark/pull/30518#issuecomment-734647519


   **[Test build #131863 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131863/testReport)** for PR 30518 at commit [`ca2900d`](https://github.com/apache/spark/commit/ca2900de52497ce5bb3bad90f0f28678248c7595).




[GitHub] [spark] LuciferYang commented on pull request #30518: [SPARK-33566][SQL] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on pull request #30518:
URL: https://github.com/apache/spark/pull/30518#issuecomment-734341951


   cc @HyukjinKwon 




[GitHub] [spark] AmplabJenkins commented on pull request #30518: [SPARK-33566][CORE][SQL][SS][PYTHON] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30518:
URL: https://github.com/apache/spark/pull/30518#issuecomment-734704486








[GitHub] [spark] SparkQA commented on pull request #30518: [SPARK-33566][CORE][SQL][SS][PYTHON] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30518:
URL: https://github.com/apache/spark/pull/30518#issuecomment-734647519


   **[Test build #131863 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131863/testReport)** for PR 30518 at commit [`ca2900d`](https://github.com/apache/spark/commit/ca2900de52497ce5bb3bad90f0f28678248c7595).




[GitHub] [spark] SparkQA commented on pull request #30518: [SPARK-33566][SQL] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30518:
URL: https://github.com/apache/spark/pull/30518#issuecomment-734625327


   **[Test build #131858 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131858/testReport)** for PR 30518 at commit [`ca79a48`](https://github.com/apache/spark/commit/ca79a488929f7777b5c2262c12f85bfa0272ca5d).




[GitHub] [spark] LuciferYang commented on a change in pull request #30518: [SPARK-33566][CORE][SQL][SS][PYTHON] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #30518:
URL: https://github.com/apache/spark/pull/30518#discussion_r531363545



##########
File path: sql/core/src/test/resources/test-data/unescaped-quotes-unescaped-delimiter.csv
##########
@@ -0,0 +1,3 @@
+c1,c2
+"a,""b,c","xyz"
+"a,b,c","x""yz"

Review comment:
       Commit ca79a48 tries to address this.






[GitHub] [spark] LuciferYang commented on pull request #30518: [SPARK-33566][CORE][SQL][SS][PYTHON] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on pull request #30518:
URL: https://github.com/apache/spark/pull/30518#issuecomment-734687799


   thx @HyukjinKwon 




[GitHub] [spark] LuciferYang commented on a change in pull request #30518: [SPARK-33566][CORE][SQL][SS][PYTHON] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #30518:
URL: https://github.com/apache/spark/pull/30518#discussion_r531367654



##########
File path: sql/core/src/test/resources/test-data/unescaped-quotes-unescaped-delimiter.csv
##########
@@ -0,0 +1,3 @@
+c1,c2
+"a,""b,c","xyz"
+"a,b,c","x""yz"

Review comment:
       Commit ca79a48 tries to address this.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30518: [SPARK-33566][CORE][SQL][SS][PYTHON] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30518:
URL: https://github.com/apache/spark/pull/30518#issuecomment-734665677








[GitHub] [spark] SparkQA removed a comment on pull request #30518: [SPARK-33566][CORE][SQL][SS][PYTHON] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30518:
URL: https://github.com/apache/spark/pull/30518#issuecomment-734635304


   **[Test build #131860 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131860/testReport)** for PR 30518 at commit [`1646adb`](https://github.com/apache/spark/commit/1646adb6f1d7e6789f30bcc4b94f92caaddb8960).




[GitHub] [spark] HyukjinKwon commented on a change in pull request #30518: [SPARK-33566][SQL] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #30518:
URL: https://github.com/apache/spark/pull/30518#discussion_r531325319



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala
##########
@@ -258,7 +264,7 @@ class CSVOptions(
     settings.setNullValue(nullValue)
     settings.setEmptyValue(emptyValueInRead)
     settings.setMaxCharsPerColumn(maxCharsPerColumn)
-    settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_DELIMITER)
+    settings.setUnescapedQuoteHandling(unescapedQuoteHandling)

Review comment:
       Seems fine. Can you also update `DataFrameReader.scala`, `DataStreamReader.scala`, `readwriter.py`, and `streaming.py`?






[GitHub] [spark] SparkQA commented on pull request #30518: [SPARK-33566][CORE][SQL][SS][PYTHON] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30518:
URL: https://github.com/apache/spark/pull/30518#issuecomment-734729684


   **[Test build #131863 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131863/testReport)** for PR 30518 at commit [`ca2900d`](https://github.com/apache/spark/commit/ca2900de52497ce5bb3bad90f0f28678248c7595).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] HyukjinKwon commented on a change in pull request #30518: [SPARK-33566][SQL] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #30518:
URL: https://github.com/apache/spark/pull/30518#discussion_r531326355



##########
File path: sql/core/src/test/resources/test-data/unescaped-quotes-unescaped-delimiter.csv
##########
@@ -0,0 +1,3 @@
+c1,c2
+"a,""b,c","xyz"
+"a,b,c","x""yz"

Review comment:
       Since we're here, can we make the test self-contained instead of relying on the external file? I generally prefer to make the test self-contained to make it easier to read.
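
To illustrate the "self-contained" idea from the comment above, here is a minimal Python sketch that generates the three sample lines at run time instead of reading a checked-in resource file. The actual test in this PR would live in the Scala suite; `SAMPLE_LINES`, `write_sample`, and `read_sample` are hypothetical names, and `read_sample` assumes a Spark build that already includes the `unescapedQuoteHandling` option.

```python
import os
import tempfile

# The three lines of unescaped-quotes-unescaped-delimiter.csv from this PR.
SAMPLE_LINES = [
    'c1,c2',
    '"a,""b,c","xyz"',
    '"a,b,c","x""yz"',
]


def write_sample(directory):
    """Write the sample rows into `directory` and return the file path."""
    path = os.path.join(directory, "unescaped-quotes-unescaped-delimiter.csv")
    with open(path, "w", newline="") as handle:
        handle.write("\n".join(SAMPLE_LINES) + "\n")
    return path


def read_sample(spark, path):
    """Read the generated file; `spark` is an existing SparkSession (assumption)."""
    return (
        spark.read
        .option("header", "true")
        .option("unescapedQuoteHandling", "STOP_AT_CLOSING_QUOTE")
        .csv(path)
    )
```

A test body could then create a directory with `tempfile.TemporaryDirectory()`, call `write_sample`, and pass the returned path to `read_sample`, so the input data sits next to the assertions.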






[GitHub] [spark] SparkQA removed a comment on pull request #30518: [SPARK-33566][CORE][SQL][SS][PYTHON] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30518:
URL: https://github.com/apache/spark/pull/30518#issuecomment-734622708


   **[Test build #131857 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131857/testReport)** for PR 30518 at commit [`1770c56`](https://github.com/apache/spark/commit/1770c565aa573e6f32e404b4f775f2c12edcae2e).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30518: [SPARK-33566][CORE][SQL][SS][PYTHON] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30518:
URL: https://github.com/apache/spark/pull/30518#issuecomment-734704289








[GitHub] [spark] LuciferYang edited a comment on pull request #30518: [SPARK-33566][SQL] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
LuciferYang edited a comment on pull request #30518:
URL: https://github.com/apache/spark/pull/30518#issuecomment-734341347


   The original case described in SPARK-33566 is as follows:
   
   **data**:
   ```
   "h1","h2","h3"
   "one","two","three"
   "abc","^@<b><i><span style=""font-family: tahoma,sans-serif;"">Referral from Joe Smith.<A0> Fred is hard working.<A0> Super smart, though you wouldn&#39;t know it at first.<A0> 6 months, and we sold this project.<A0> Phooey he said to me!<A0> What&#39;s up with you people.<A0> You&#39;ll say anything for a sale!<A0> Until he met me of course....haar haar!</span></i></b><br><A0><br><b><i><span style=""font-family: tahoma,sans-serif;"">Internet is spotty</span></i></b><br><b><i><span style=""font-family: tahoma,sans-serif;"">Working while at home so.<A0> Will be applied this weekend. <A0></span></i></b><br><A0><br><b><i><span style=""font-family: tahoma,sans-serif;"">On Bill Recovery and 20 yr warranty added.</span></i></b><br><A0><br><b><i><span style=""font-family: tahoma,sans-serif;"">Kindness made this deal happen!</span></i></b><br><A0>","xyz"
   ```
   
   **opencsv and commons-csv parse the `h2` value of row 2 as follows:**
   
   ```
   ^@<b><i><span style=""font-family: tahoma,sans-serif;"">Referral from Joe Smith.<A0> Fred is hard working.<A0> Super smart, though you wouldn&#39;t know it at first.<A0> 6 months, and we sold this project.<A0> Phooey he said to me!<A0> What&#39;s up with you people.<A0> You&#39;ll say anything for a sale!<A0> Until he met me of course....haar haar!</span></i></b><br><A0><br><b><i><span style=""font-family: tahoma,sans-serif;"">Internet is spotty</span></i></b><br><b><i><span style=""font-family: tahoma,sans-serif;"">Working while at home so.<A0> Will be applied this weekend. <A0></span></i></b><br><A0><br><b><i><span style=""font-family: tahoma,sans-serif;"">On Bill Recovery and 20 yr warranty added.</span></i></b><br><A0><br><b><i><span style=""font-family: tahoma,sans-serif;"">Kindness made this deal happen!</span></i></b><br><A0>
   ```
   
   **Without this PR, Spark parses the `h2` value of row 2 as follows:**
   
   ```
   ^@<b><i><span style=""font-family: tahoma
   ```
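
As a point of comparison outside this PR, Python's standard `csv` module also reads a doubled quote inside a quoted field as a literal quote and keeps going until the closing quote, so the whole value survives. The sketch below uses a shortened stand-in for the SPARK-33566 row (the `SAMPLE` string is an assumption made for brevity, not the original data).

```python
import csv
import io

# Shortened stand-in for the SPARK-33566 row: the quoted h2 value contains
# doubled quotes and commas.
SAMPLE = (
    '"h1","h2","h3"\n'
    '"abc","<span style=""font-family: tahoma,sans-serif;"">Hi</span>","xyz"\n'
)


def parse(text):
    """Parse CSV text with the csv module's default dialect."""
    return list(csv.reader(io.StringIO(text)))


rows = parse(SAMPLE)
for row in rows:
    print(row)
```

With the default dialect the second row still has three fields, and the `h2` field is not cut at the comma after `tahoma`. Note that the `csv` module collapses each doubled quote into a single `"` in the parsed value.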
   
   
   
   
   




[GitHub] [spark] AmplabJenkins commented on pull request #30518: [SPARK-33566][CORE][SQL][SS][PYTHON] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30518:
URL: https://github.com/apache/spark/pull/30518#issuecomment-734665677








[GitHub] [spark] HyukjinKwon commented on a change in pull request #30518: [SPARK-33566][SQL] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #30518:
URL: https://github.com/apache/spark/pull/30518#discussion_r531325550



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
##########
@@ -727,6 +727,27 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
    * a record can have.</li>
    * <li>`maxCharsPerColumn` (default `-1`): defines the maximum number of characters allowed
    * for any given value being read. By default, it is -1 meaning unlimited length</li>
+   * <li>`unescapedQuoteHandling` (default `STOP_AT_DELIMITER`): defines how the CsvParser

Review comment:
       Seems fine. Can you also update `DataStreamReader.scala`, `readwriter.py`, and `streaming.py`?
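
For readers following the thread, here is a hedged sketch of how the option documented above might be passed from Python once it is exposed there. `read_csv_with_quote_handling` and `VALID_MODES` are hypothetical names, not part of the PR. The mode names are the constants of univocity's `UnescapedQuoteHandling` enum, and the default mirrors the `STOP_AT_DELIMITER` default quoted in the diff.

```python
# Constants of univocity's UnescapedQuoteHandling enum.
VALID_MODES = {
    "STOP_AT_CLOSING_QUOTE",
    "BACK_TO_DELIMITER",
    "STOP_AT_DELIMITER",
    "SKIP_VALUE",
    "RAISE_ERROR",
}


def read_csv_with_quote_handling(spark, path, mode="STOP_AT_DELIMITER"):
    """Read a CSV file with an explicit unescaped-quote strategy.

    `spark` is an existing SparkSession; the `unescapedQuoteHandling`
    option is assumed to be available in the Spark build in use.
    """
    normalized = mode.upper()
    if normalized not in VALID_MODES:
        raise ValueError(f"unknown unescapedQuoteHandling mode: {mode!r}")
    return (
        spark.read
        .option("header", "true")
        .option("unescapedQuoteHandling", normalized)
        .csv(path)
    )
```

Validating the mode in the helper is a design choice of this sketch only; it gives an early, readable error instead of deferring to whatever the reader does with an unknown value.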






[GitHub] [spark] LuciferYang commented on a change in pull request #30518: [SPARK-33566][CORE][SQL][SS][PYTHON] Make unescapedQuoteHandling option configurable when read CSV

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #30518:
URL: https://github.com/apache/spark/pull/30518#discussion_r531363460



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
##########
@@ -727,6 +727,27 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
    * a record can have.</li>
    * <li>`maxCharsPerColumn` (default `-1`): defines the maximum number of characters allowed
    * for any given value being read. By default, it is -1 meaning unlimited length</li>
+   * <li>`unescapedQuoteHandling` (default `STOP_AT_DELIMITER`): defines how the CsvParser

Review comment:
       Commit 1770c56 tries to address this.



