Posted to dev@spark.apache.org by Sourabh Badhya <sb...@cloudera.com.INVALID> on 2022/05/09 11:46:19 UTC

Behaviour of Append & Overwrite modes when table is not present when using df.write in Spark 3

Hi team,

I would like to know the behaviour of the Append and Overwrite modes when the
target table is not present, and whether automatic table creation is
supported or unsupported, when df.write is used in Spark 3 and the underlying
custom data source implements SupportsCatalogOptions.
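
For concreteness, this is the kind of call I am asking about. A minimal
sketch; the format name "com.example.mysource" and the "table" option are
hypothetical stand-ins for a custom data source that implements
SupportsCatalogOptions:

    val df = spark.range(10).toDF("id")

    df.write
      .format("com.example.mysource") // hypothetical custom V2 data source
      .option("table", "my_table")    // hypothetical option resolved to an identifier
      .mode("append")                 // or "overwrite"
      .save()                         // should this create my_table if it is missing?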

As far as I understand, in the current implementation on master, df.write in
Append and Overwrite modes tries to load the table and read its schema. The
table is therefore expected to exist beforehand, and it will not be created
automatically. Attaching the code link below for reference.

Code link -
https://github.com/apache/spark/blob/b065c945fe27dd5869b39bfeaad8e2b23a8835b5/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L287
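
Paraphrasing the relevant path at that line (simplified and not verbatim;
the surrounding match arms and error handling are omitted), the identifier
is extracted from the options and the table is loaded from the catalog
before any write is planned:

    // provider implements SupportsCatalogOptions
    val ident = supportsExtract.extractIdentifier(dsOptions)
    val catalog = CatalogV2Util.getTableProviderCatalog(
      supportsExtract, catalogManager, dsOptions)
    val table = catalog.loadTable(ident) // NoSuchTableException if the table is absent

If my reading is right, neither Append nor Overwrite reaches a
table-creation path here.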

This is slightly different from the behaviour we observed in Spark 2 -
https://lists.apache.org/thread/y468ngqhfhxhv0fygvwvy8r3g4sw9v7n

Please confirm if I am correct and if this is the intended behaviour in
Spark 3.

Thanks and regards,
Sourabh Badhya

Re: Behaviour of Append & Overwrite modes when table is not present when using df.write in Spark 3

Posted by Sourabh Badhya <sb...@cloudera.com.INVALID>.
Requesting some suggestions on this.

Thanks in advance,
Sourabh Badhya
