
Posted to issues@spark.apache.org by "Carlos M. Casas (JIRA)" <ji...@apache.org> on 2017/06/27 11:57:00 UTC

[jira] [Commented] (SPARK-21226) Save empty dataframe in pyspark prints nothing

    [ https://issues.apache.org/jira/browse/SPARK-21226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16064706#comment-16064706 ] 

Carlos M. Casas commented on SPARK-21226:
-----------------------------------------

The bug is that two apparently equivalent (empty) dataframes are written to disk in different ways.

The problem is that, depending on how the empty dataframe was created, the dataframe's write.parquet method either creates the actual .parquet files or doesn't. In both cases it writes the _SUCCESS file. When the .parquet files are written, the directory can be read back; when they are not, it can't. Why does the behavior of this method depend on how the dataframe was created?

Although it is not central to the issue, the exception I see when reading the parquet folder that contains only the _SUCCESS file is:
pyspark.sql.utils.AnalysisException: u'Unable to infer schema for Parquet. It must be specified manually.;'
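
To tell the two outcomes apart without going through Spark, one option is to look at the output directory itself. The following is only a minimal sketch using the Python standard library; the helper name has_parquet_files is hypothetical, and it assumes that the data files Spark writes end in ".parquet" (for example part-*.snappy.parquet), which may differ with other configurations.

```python
import os


def has_parquet_files(path):
    """Return True if `path` contains at least one file ending in ".parquet".

    Hypothetical helper, not part of Spark. It walks the directory
    recursively so that partitioned output (sub-directories) is covered,
    and it ignores marker files such as _SUCCESS.
    """
    for _root, _dirs, files in os.walk(path):
        if any(name.endswith(".parquet") for name in files):
            return True
    return False
```

With the directories from the report, has_parquet_files("as1") would be expected to return False (only _SUCCESS is present) and has_parquet_files("as2") to return True, assuming the files in as2 carry the ".parquet" suffix.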



> Save empty dataframe in pyspark prints nothing
> ----------------------------------------------
>
>                 Key: SPARK-21226
>                 URL: https://issues.apache.org/jira/browse/SPARK-21226
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.0.0, 2.1.0
>            Reporter: Carlos M. Casas
>            Priority: Minor
>
> I try the following:
> schema = whatever schema you want
> df1 = sqlContext.createDataFrame(sc.emptyRDD(), schema)
> df1.write.parquet("as1")
> and I just get a directory as1 containing only a _SUCCESS file. If I try to read that directory, I get an exception.
> On the other hand, if I run:
> schema = whatever schema you want
> df2 = sqlContext.createDataFrame([], schema)
> df2.write.parquet("as2")
> I get a directory as2 with some files in it (representing field type information?). If I try to read it, it works: it reads an empty df with the proper schema.


