Posted to issues@spark.apache.org by "Christopher Auston (Jira)" <ji...@apache.org> on 2022/02/25 16:02:00 UTC

[jira] [Created] (SPARK-38331) csv parser exception when quote and escape are both double-quote and a value is just "," and column pruning enabled

Christopher Auston created SPARK-38331:
------------------------------------------

             Summary: csv parser exception when quote and escape are both double-quote and a value is just "," and column pruning enabled
                 Key: SPARK-38331
                 URL: https://issues.apache.org/jira/browse/SPARK-38331
             Project: Spark
          Issue Type: Bug
          Components: Input/Output
    Affects Versions: 3.2.1, 3.1.2
            Reporter: Christopher Auston


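Workaround: disable CSV column pruning (set spark.sql.csv.parser.columnPruning.enabled to false), as in the last step of the example below.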
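A minimal sketch of scoping this workaround to individual queries instead of the whole session. csv_pruning_disabled and PRUNING_KEY are hypothetical names, not part of Spark. The helper assumes a RuntimeConfig-like object such as spark.conf, uses only its get(key, default), set(key, value) and unset(key) methods, and assumes get returns None for a key that was never set explicitly. Because Spark evaluates lazily, the action (take, count, collect) has to run inside the with block. This matches the example below, where disabling pruning after the DataFrame was created was enough.

```python
from contextlib import contextmanager

# Config key taken from the report; the helper name below is hypothetical.
PRUNING_KEY = 'spark.sql.csv.parser.columnPruning.enabled'


@contextmanager
def csv_pruning_disabled(conf):
    """Temporarily set CSV column pruning to 'false' on a RuntimeConfig-like object.

    Assumes `conf` behaves like `spark.conf`: get(key, default), set(key, value), unset(key).
    The previous value is restored on exit, even if the body raises.
    """
    previous = conf.get(PRUNING_KEY, None)  # assumed None when never set explicitly
    conf.set(PRUNING_KEY, 'false')
    try:
        yield
    finally:
        if previous is None:
            conf.unset(PRUNING_KEY)  # fall back to Spark's default again
        else:
            conf.set(PRUNING_KEY, previous)


# Hypothetical usage with the DataFrame from the example (the action runs inside the block):
# with csv_pruning_disabled(spark.conf):
#     df.select(df.col1, df.comma).take(1)
```

Restoring the previous value (or unsetting it) keeps the rest of the session on the default pruning behaviour, so only the queries that hit this bug pay for reading every column.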

Example PySpark code (run in a Databricks notebook, where spark and dbutils are predefined):
{noformat}
import pyspark
print(pyspark.version.__version__)

# enable column pruning (reset to the default value)
spark.conf.set('spark.sql.csv.parser.columnPruning.enabled', 'true')

# write a 47-byte CSV whose data row holds the value "," in column "comma"
dbutils.fs.put(file='/tmp/example.csv', contents='''"col1","b4_comma","comma","col4"
"","",",","x"
''', overwrite=True)

df = spark.read.csv(
    path='/tmp/example.csv'
    ,inferSchema=True
    ,header=True
    ,escape='"'
    ,multiLine=True
    ,unescapedQuoteHandling='RAISE_ERROR'
    ,mode='FAILFAST'
    )
ex = None
try:
    df.select(df.col1,df.comma).take(1)
except Exception as e:
    ex = e

if ex:
    print('[pruning] Exception is raised if b4_comma is NOT selected')

df.select(df.b4_comma, df.comma).take(1)
print('[pruning] No exception if b4_comma is selected')

ex = None
try:
    df.count()
except Exception as e:
    ex = e

if ex:
    print('[pruning] Exception raised by count')
print('\ndisabling pruning\n')

# disable column pruning
spark.conf.set('spark.sql.csv.parser.columnPruning.enabled', 'false')
df.select(df.col1,df.comma).take(1)
print('[no prune] No exception if b4_comma is NOT selected')
{noformat}
 

Output:
{noformat}
3.1.2
Wrote 47 bytes.
[pruning] Exception is raised if b4_comma is NOT selected
[pruning] No exception if b4_comma is selected
[pruning] Exception raised by count

disabling pruning

[no prune] No exception if b4_comma is NOT selected
{noformat}
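To reproduce the input without Databricks (dbutils is not available elsewhere), the same file can be written with plain Python. The sketch below writes into a temporary directory rather than /tmp/example.csv (an assumption made for portability) and checks that the file is the 47 bytes reported above. It then parses the file with Python's standard csv module. That module is not Spark's CSV parser, so this only shows the values a correct parse should return (the comma column of the data row is the single character ","). It does not reproduce the bug.

```python
import csv
import os
import tempfile

# Same content the Databricks example writes with dbutils.fs.put
CONTENTS = '"col1","b4_comma","comma","col4"\n"","",",","x"\n'


def write_example(directory):
    """Write the example CSV into `directory` and return its path."""
    path = os.path.join(directory, 'example.csv')
    # newline='' stops Python from translating '\n', so the size is the same on every OS
    with open(path, 'w', newline='', encoding='utf-8') as f:
        f.write(CONTENTS)
    return path


def read_example(path):
    """Parse with the stdlib reader.

    The default 'excel' dialect uses '"' as quotechar and doublequote=True,
    i.e. the same quote/escape pairing as quote='"', escape='"' in the report.
    """
    with open(path, newline='', encoding='utf-8') as f:
        return list(csv.reader(f))


with tempfile.TemporaryDirectory() as tmp:
    path = write_example(tmp)
    size = os.path.getsize(path)
    rows = read_example(path)

print(size)  # 47
print(rows)  # [['col1', 'b4_comma', 'comma', 'col4'], ['', '', ',', 'x']]
```

The quoted "," in the third field is kept as a literal comma, and the two leading "" fields parse as empty strings; these are the values the pruned Spark read should also produce for col1 and comma.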


