Posted to reviews@spark.apache.org by vishnusram <gi...@git.apache.org> on 2017/11/16 17:49:48 UTC
[GitHub] spark issue #16976: [SPARK-19610][SQL] Support parsing multiline CSV files
Github user vishnusram commented on the issue:
https://github.com/apache/spark/pull/16976
The wholeFile option doesn't seem to be working.
**Test file content:**
"num_col1","txt_col","num_col2"
10001,"regular string",20001
10002,"string with
newline",20002
**Command and result:**
>>> dfu = sqlContext.read.format('com.databricks.spark.csv').option("header","true").option("inferschema","true").option("delimiter",",").option("quote",'"').option("parserLib","univocity").option("wholeFile","true").load('new_line.csv')
>>> dfu.show(3,False)
+--------+--------------+--------+
|num_col1|txt_col       |num_col2|
+--------+--------------+--------+
|10001   |regular string|20001   |
|10002   |string with   |null    |
|newline"|20002         |null    |
+--------+--------------+--------+
**Expected result:**
+--------+--------------------+--------+
|num_col1|txt_col             |num_col2|
+--------+--------------------+--------+
|10001   |regular string      |20001   |
|10002   |string with\nnewline|20002   |
+--------+--------------------+--------+
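As a cross-check of the expected result, here is a minimal sketch that parses the same test content with Python's standard `csv` module instead of Spark. It only illustrates how a quote-aware parser treats the embedded newline; it says nothing about how Spark's reader is implemented. The names `SAMPLE` and `parse_records` are made up for this example, and it assumes Python 3.

```python
import csv
import io

# The test file content from above, reproduced verbatim.
SAMPLE = (
    '"num_col1","txt_col","num_col2"\n'
    '10001,"regular string",20001\n'
    '10002,"string with\nnewline",20002\n'
)


def parse_records(text):
    """Parse CSV text, keeping newlines that occur inside quoted fields."""
    # newline="" leaves line endings untouched so the csv module can
    # decide which ones terminate a record and which are field content.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=",", quotechar='"')
    header = next(reader)
    rows = list(reader)
    return header, rows


header, rows = parse_records(SAMPLE)

# A naive line-by-line split sees one header line plus three data lines,
# which matches the broken output; the quote-aware parse sees two records.
naive_data_lines = len(SAMPLE.splitlines()) - 1

print(header)
for row in rows:
    print(row)
print("naive data lines:", naive_data_lines, "| parsed records:", len(rows))
```

The second parsed record keeps `string with\nnewline` as a single field, which is the behaviour I expected from the wholeFile option.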
**Spark version used 2.2**
17/11/16 17:15:37 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/
Using Python version 2.7.5 (default, May 3 2017 07:55:04)
SparkSession available as 'spark'.
Please let me know if I am missing something here.