
Posted to reviews@spark.apache.org by vishnusram <gi...@git.apache.org> on 2017/11/16 17:49:48 UTC

[GitHub] spark issue #16976: [SPARK-19610][SQL] Support parsing multiline CSV files

Github user vishnusram commented on the issue:

    https://github.com/apache/spark/pull/16976
  
    The wholeFile option doesn't seem to be working.
    **Test file content:**
    "num_col1","txt_col","num_col2"
    10001,"regular string",20001
    10002,"string with
    newline",20002
    
    **Command and result:**
    >>> dfu = sqlContext.read.format('com.databricks.spark.csv').option("header","true").option("inferschema","true").option("delimiter",",").option("quote",'"').option("parserLib","univocity").option("wholeFile","true").load('new_line.csv')
    >>> dfu.show(3,False)
    +--------+--------------+--------+
    |num_col1|txt_col       |num_col2|
    +--------+--------------+--------+
    |10001   |regular string|20001   |
    |10002   |string with   |null    |
    |newline"|20002         |null    |
    +--------+--------------+--------+
    
    **Expected result:**
    +--------+--------------------+--------+
    |num_col1|txt_col             |num_col2|
    +--------+--------------------+--------+
    |10001   |regular string      |20001   |
    |10002   |string with\nnewline|20002   |
    +--------+--------------------+--------+
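For comparison, a CSV parser that handles quoted newlines should split the test file into two data records, matching the expected result above. The minimal sketch below uses Python's standard `csv` module only as a neutral reference point; it is not what Spark uses internally, and the file content is inlined as a string rather than read from `new_line.csv`.

```python
import csv
import io

# Same content as the test file above (new_line.csv), inlined for illustration.
# The txt_col value of the second data record is quoted and spans two lines.
TEST_FILE = (
    '"num_col1","txt_col","num_col2"\n'
    '10001,"regular string",20001\n'
    '10002,"string with\n'
    'newline",20002\n'
)


def parse_csv(text):
    """Parse CSV text, keeping newlines that appear inside quoted fields.

    Returns (header, rows), where every value is a string.
    """
    # newline='' leaves line endings untouched so the csv module itself
    # decides where a record ends (a quoted field may contain '\n').
    reader = csv.reader(io.StringIO(text, newline=''), delimiter=',', quotechar='"')
    header, *rows = list(reader)
    return header, rows


header, rows = parse_csv(TEST_FILE)
print(header)  # ['num_col1', 'txt_col', 'num_col2']
for row in rows:
    print(row)
# ['10001', 'regular string', '20001']
# ['10002', 'string with\nnewline', '20002']
```

Separately, as far as I can tell, the option added by this PR was renamed from `wholeFile` to `multiLine` before the 2.2.0 release (SPARK-20980), so on 2.2.0 the reader may need `.option("multiLine", "true")` instead of `wholeFile`.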
    
    **Spark version used: 2.2.0**
    17/11/16 17:15:37 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /__ / .__/\_,_/_/ /_/\_\   version 2.2.0
          /_/
    
    Using Python version 2.7.5 (default, May  3 2017 07:55:04)
    SparkSession available as 'spark'.
    
    Please let me know if I am missing something here.

