Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2018/11/05 12:55:00 UTC
[jira] [Resolved] (SPARK-25890) Null rows are ignored with Ctrl-A as a delimiter when reading a CSV file.
[ https://issues.apache.org/jira/browse/SPARK-25890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-25890.
----------------------------------
Resolution: Cannot Reproduce
Ah, I asked because I was a bit confused about whether you had not tried to reproduce this or could not reproduce it on master. Resolving this.
> Null rows are ignored with Ctrl-A as a delimiter when reading a CSV file.
> -------------------------------------------------------------------------
>
> Key: SPARK-25890
> URL: https://issues.apache.org/jira/browse/SPARK-25890
> Project: Spark
> Issue Type: Bug
> Components: Spark Shell, SQL
> Affects Versions: 2.3.2
> Reporter: Lakshminarayan Kamath
> Priority: Major
>
> Reading a Ctrl-A delimited CSV file ignores rows in which every value is null. Reading a comma-delimited CSV file does not.
> *Reproduction in spark-shell:*
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
> val l = List(List(1, 2), List(null,null), List(2,3))
> val datasetSchema = StructType(List(StructField("colA", IntegerType, true), StructField("colB", IntegerType, true)))
> val rdd = sc.parallelize(l).map(item => Row.fromSeq(item.toSeq))
> val df = spark.createDataFrame(rdd, datasetSchema)
> df.show()
> |colA|colB|
> |1 |2 |
> |null|null|
> |2 |3 |
> df.write.option("delimiter", "\u0001").option("header", "true").csv("/ctrl-a-separated.csv")
> df.write.option("delimiter", ",").option("header", "true").csv("/comma-separated.csv")
> val commaDf = spark.read.option("header", "true").option("delimiter", ",").csv("/comma-separated.csv")
> commaDf.show
> |colA|colB|
> |1 |2 |
> |2 |3 |
> |null|null|
> val ctrlaDf = spark.read.option("header", "true").option("delimiter", "\u0001").csv("/ctrl-a-separated.csv")
> ctrlaDf.show
> |colA|colB|
> |1 |2 |
> |2 |3 |
>
> As seen above, for Ctrl-A delimited CSV, rows containing only null values are ignored.
>
>
>
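One possible explanation for the behaviour reported against 2.3.2, offered here as an assumption that nothing in this thread confirms: with nulls written as empty fields, an all-null row becomes a line holding only the delimiter. If the reader skips lines that are empty after `String.trim`, a lone Ctrl-A line would be skipped, because `trim` strips every character at or below U+0020, while a lone comma would survive. The sketch below only demonstrates that `trim` behaviour; `looksBlank` and `allNullLine` are hypothetical helpers, not Spark APIs.

```scala
// Sketch only: both helpers are hypothetical and are not part of Spark.
object TrimSketch {
  // Mimics a "skip blank lines" check built on String.trim.
  // Whether the 2.3.2 CSV reader does exactly this is an assumption.
  def looksBlank(line: String): Boolean = line.trim.isEmpty

  // Two null columns written as empty fields leave only the delimiter on the line.
  def allNullLine(delimiter: String): String = Seq("", "").mkString(delimiter)

  def main(args: Array[String]): Unit = {
    // Ctrl-A is a control character below U+0020, so trim removes it.
    println(looksBlank(allNullLine("\u0001"))) // true
    // A comma is not whitespace for trim, so the line is kept.
    println(looksBlank(allNullLine(",")))      // false
  }
}
```

Running `TrimSketch.main(Array.empty)` in spark-shell or a plain Scala REPL prints the two results; neither call touches Spark itself.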