Posted to user@spark.apache.org by Petr Novak <os...@gmail.com> on 2015/08/25 17:02:55 UTC

DataFrame Parquet Writer doesn't keep schema

Hi all,
when I read Parquet files with "required" fields, aka nullable=false, they
are read correctly. But when I save them (df.write.parquet) and read them
back, all my fields are saved and read as optional, aka nullable=true.
This means I suddenly have files with incompatible schemas. This happens
on 1.3.0-1.4.1 and even on 1.5.1-rc1.
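
For illustration, roughly what I'm doing in spark-shell (the paths and
the field name are made up, but the behaviour is as described):

// Read a Parquet file whose schema declares "id" as required.
val df = sqlContext.read.parquet("/tmp/required.parquet")
df.printSchema()
// root
//  |-- id: long (nullable = false)   <-- required is preserved on read

// Round-trip: write it out and read the result back.
df.write.parquet("/tmp/roundtrip.parquet")
sqlContext.read.parquet("/tmp/roundtrip.parquet").printSchema()
// root
//  |-- id: long (nullable = true)    <-- now optional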

Should I set some write option to keep nullability? Is there a specific
reason why nullability is always overridden to true?
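
I haven't found such a write option. The only workaround I can think of
is an untested sketch, assuming the user-specified schema passed to
DataFrameReader.schema is honored when reading Parquet, which re-applies
the expected nullability at read time:

import org.apache.spark.sql.types._

// The schema the files are supposed to have, with nullable = false
// where the Parquet fields were originally "required".
val expected = StructType(Seq(
  StructField("id", LongType, nullable = false)))

// Supplying the schema explicitly at read time should restore the
// intended nullability instead of the optional fields from the footer.
val df = sqlContext.read.schema(expected).parquet("/tmp/roundtrip.parquet")
df.printSchema()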

Many thanks,
Peter

Re: DataFrame Parquet Writer doesn't keep schema

Posted by Petr Novak <os...@gmail.com>.
This is the same issue as
https://mail.google.com/mail/#label/Spark%2Fuser/14f64c75c15f5ccd

Please follow the discussion there.
