Posted to issues@spark.apache.org by "Andrew (JIRA)" <ji...@apache.org> on 2017/03/20 17:58:42 UTC
[jira] [Updated] (SPARK-20035) Spark 2.0.2 writes empty file if no record is in the dataset
[ https://issues.apache.org/jira/browse/SPARK-20035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew updated SPARK-20035:
---------------------------
Description:
When a dataset contains no records, writing it with spark-csv creates an empty file (i.e. one with no header line):
```
dataset.write().format("com.databricks.spark.csv").option("header", "true").save("... file name here ...");
// or
dataset.write().option("header", "true").csv("... file name here ...");
```
The resulting file then cannot be read back with the same format (i.e. spark-csv), because it is empty; the read calls below fail on it. The same call works if the dataset has at least one record.
```
sqlCtx.read().format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("... file name here ...");
// or
sparkSession.read().option("header", "true").option("inferSchema", "true").csv("... file name here ...");
```
This is not right: you should always be able to read back a file that you wrote.
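The round-trip expectation can be illustrated without Spark. What follows is a minimal plain-Java sketch, not Spark's implementation: the class `HeaderRoundTrip` and its methods `write` and `readHeader` are hypothetical names introduced only for this illustration, and the CSV handling is deliberately naive (no quoting or escaping). It shows that a writer which emits the header line even for zero records produces output that a header-expecting reader can parse, whereas completely empty output cannot be read back.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not Spark code.
public class HeaderRoundTrip {

    // Serialize rows as CSV text. The header line is always written,
    // even when there are no records (naive: no quoting or escaping).
    static String write(List<String> header, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append(String.join(",", header)).append("\n");
        for (List<String> row : rows) {
            sb.append(String.join(",", row)).append("\n");
        }
        return sb.toString();
    }

    // Recover the column names from CSV text that is expected to start
    // with a header line. Empty input has no header, so it is rejected.
    static List<String> readHeader(String csv) {
        if (csv.isEmpty()) {
            throw new IllegalArgumentException("empty input: no header line to read");
        }
        String firstLine = csv.split("\n", 2)[0];
        return Arrays.asList(firstLine.split(","));
    }

    public static void main(String[] args) {
        List<String> header = Arrays.asList("id", "name");
        List<List<String>> noRows = new ArrayList<>();

        // Zero records, but the header survives the round trip.
        String csv = write(header, noRows);
        System.out.println(readHeader(csv)); // prints [id, name]

        // A completely empty file cannot be read back.
        try {
            readHeader("");
        } catch (IllegalArgumentException e) {
            System.out.println("read failed: " + e.getMessage());
        }
    }
}
```

In this sketch the header doubles as a minimal schema, so an empty dataset still yields a file that describes its own columns.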
was:
When a dataset contains no records, writing it with spark-csv creates an empty file (i.e. one with no header line):
```
dataset.write().format("com.databricks.spark.csv").option("header", "true").save("... file name here ...");
```
The resulting file then cannot be read back with the same format (i.e. spark-csv), because it is empty; the read call below fails on it. The same call works if the dataset has at least one record.
```
sqlCtx.read().format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("... file name here ...");
```
This is not right: you should always be able to read back a file that you wrote.
Summary: Spark 2.0.2 writes empty file if no record is in the dataset (was: spark-csv writes empty file if no record is in the dataset)
> Spark 2.0.2 writes empty file if no record is in the dataset
> ------------------------------------------------------------
>
> Key: SPARK-20035
> URL: https://issues.apache.org/jira/browse/SPARK-20035
> Project: Spark
> Issue Type: Bug
> Components: Input/Output
> Affects Versions: 2.0.2
> Environment: Spark 2.0.2
> Linux/Windows
> Reporter: Andrew