Posted to dev@spark.apache.org by Pralabh Kumar <pr...@gmail.com> on 2022/01/10 13:41:16 UTC
Difference in behavior for Spark 3.0 vs Spark 3.1 "create database "
Hi Spark Team
When creating a database via Spark 3.0 on Hive:
1) spark.sql("create database test location '/user/hive'") creates the
database location on HDFS, as expected.
2) When running the same command on 3.1, the database is created on the
local file system by default. I have to prefix the location with hdfs:// to
create the database on HDFS.
Why is there a difference in behavior? Can you please point me to the
JIRA that caused this change?
Note: spark.sql.warehouse.dir and hive.metastore.warehouse.dir both have
their default values (neither is explicitly set).
Regards
Pralabh Kumar
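[Editor's note: the behavior difference described above can be sketched outside Spark. The snippet below is a minimal, illustrative model of Hadoop-style path qualification — the function name and defaults are mine, not Spark's or Hadoop's API: a LOCATION without a scheme inherits the scheme and authority of the default filesystem, so with an out-of-the-box default of file:/// it lands on local disk, while an explicit hdfs:// prefix pins it to HDFS.]

```python
from urllib.parse import urlparse

def qualify_location(location: str, default_fs: str = "file:///") -> str:
    """Illustrative model of path qualification: a location without a
    scheme inherits the scheme/authority of the default filesystem."""
    if urlparse(location).scheme:
        # Already fully qualified, e.g. hdfs://namenode:8020/user/hive
        return location
    default = urlparse(default_fs)
    return f"{default.scheme}://{default.netloc}{location}"

# With a default filesystem of file:///, an unqualified LOCATION
# resolves to the local file system:
print(qualify_location("/user/hive"))
# -> file:///user/hive

# Pointing the default filesystem at HDFS restores the old outcome:
print(qualify_location("/user/hive", "hdfs://namenode:8020"))
# -> hdfs://namenode:8020/user/hive
```

[The hostname "namenode:8020" is a placeholder; substitute your cluster's NameNode address.]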
Re: Difference in behavior for Spark 3.0 vs Spark 3.1 "create database "
Posted by Wenchen Fan <cl...@gmail.com>.
Hopefully, this StackOverflow answer can solve your problem:
https://stackoverflow.com/questions/47523037/how-do-i-configure-pyspark-to-write-to-hdfs-by-default
Spark doesn't control how paths are qualified; that's decided by the
Hadoop configuration (config files such as core-site.xml).
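[Editor's note: the configuration in question typically boils down to Hadoop's fs.defaultFS property. A sketch of the relevant core-site.xml fragment follows; the NameNode host and port are placeholders.]

```xml
<!-- core-site.xml: fs.defaultFS determines the filesystem against which
     unqualified paths such as '/user/hive' are resolved -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode:8020</value>
</property>
```

[The same setting can also be passed per application via spark.hadoop.fs.defaultFS, since Spark forwards spark.hadoop.*-prefixed properties to the Hadoop configuration.]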
On Tue, Jan 11, 2022 at 3:03 AM Pablo Langa Blanco <so...@gmail.com> wrote:
> Hi Pralabh,
>
> If it helps, it is probably related to this change
> https://github.com/apache/spark/pull/28527
>
> Regards
>
> On Mon, Jan 10, 2022 at 10:42 AM Pralabh Kumar <pr...@gmail.com>
> wrote:
>
>> Hi Spark Team
>>
>> When creating a database via Spark 3.0 on Hive
>>
>> 1) spark.sql("create database test location '/user/hive'"). It creates
>> the database location on hdfs . As expected
>>
>> 2) When running the same command on 3.1 the database is created on the
>> local file system by default. I have to prefix with hdfs to create db on
>> hdfs.
>>
>> Why is there a difference in the behavior, Can you please point me to the
>> jira which causes this change.
>>
>> Note : spark.sql.warehouse.dir and hive.metastore.warehouse.dir both are
>> having default values(not explicitly set)
>>
>> Regards
>> Pralabh Kumar
>>
>
Re: Difference in behavior for Spark 3.0 vs Spark 3.1 "create database "
Posted by Pablo Langa Blanco <so...@gmail.com>.
Hi Pralabh,
> If it helps, it is probably related to this change:
https://github.com/apache/spark/pull/28527
Regards
On Mon, Jan 10, 2022 at 10:42 AM Pralabh Kumar <pr...@gmail.com>
wrote:
> Hi Spark Team
>
> When creating a database via Spark 3.0 on Hive
>
> 1) spark.sql("create database test location '/user/hive'"). It creates
> the database location on hdfs . As expected
>
> 2) When running the same command on 3.1 the database is created on the
> local file system by default. I have to prefix with hdfs to create db on
> hdfs.
>
> Why is there a difference in the behavior, Can you please point me to the
> jira which causes this change.
>
> Note : spark.sql.warehouse.dir and hive.metastore.warehouse.dir both are
> having default values(not explicitly set)
>
> Regards
> Pralabh Kumar
>