Posted to dev@spark.apache.org by Ankit Gupta <in...@gmail.com> on 2023/04/12 02:52:43 UTC

Spark Multiple Hive Metastore Catalog Support

Hi All

My question concerns support for multiple remote Hive Metastore catalogs in
Spark. Multiple catalog support was added in Spark 3, but has any
CatalogPlugin been implemented that lets us configure multiple remote Hive
Metastore catalogs? If so, could anyone share the fully qualified class name
that I can use to configure a Hive Metastore catalog? If not, I would like
to work on implementing a CatalogPlugin that can be used to configure
multiple Hive Metastore servers.

Thanks and Regards.

Ankit Prakash Gupta
+91 8750101321
info.ankitp@gmail.com

Re: Spark Multiple Hive Metastore Catalog Support

Posted by Cheng Pan <ch...@apache.org>.
There is a DSv2-based Hive connector in Apache Kyuubi[1] that supports
connecting to multiple HMS instances in a single Spark application.

Some limitations:

- currently only supports Spark 3.3
- has a known issue when used with `spark-sql`, but works with spark-shell
  and normal jar-based Spark applications.

[1]
https://github.com/apache/kyuubi/tree/master/extensions/spark/kyuubi-spark-connector-hive
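
As a rough illustration of what "multiple HMS in a single Spark application" could look like in configuration terms, here is a minimal sketch. Spark registers a catalog through `spark.sql.catalog.<name>=<class>` and hands any `spark.sql.catalog.<name>.<key>` entries to that catalog as options. Everything else below is an assumption to verify against the connector's own documentation: the catalog class name, the `hive.metastore.uris` option key, and the catalog names and hosts, which are hypothetical.

```python
# Hypothetical catalog names and metastore URIs; replace with real ones.
HMS_CATALOGS = {
    "hive_a": "thrift://metastore-a.example.com:9083",
    "hive_b": "thrift://metastore-b.example.com:9083",
}

# Assumed class name of the connector's catalog implementation;
# check the connector documentation before relying on it.
CATALOG_CLASS = "org.apache.kyuubi.spark.connector.hive.HiveTableCatalog"


def build_catalog_conf(catalogs, catalog_class=CATALOG_CLASS):
    """Return Spark conf entries that register one catalog per metastore."""
    conf = {}
    for name, uri in catalogs.items():
        prefix = f"spark.sql.catalog.{name}"
        # Registers the catalog plugin under the given catalog name.
        conf[prefix] = catalog_class
        # Per-catalog option (assumed key) pointing at that metastore.
        conf[f"{prefix}.hive.metastore.uris"] = uri
    return conf


def to_submit_args(conf):
    """Flatten conf entries into `--conf key=value` command-line arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args
```

The resulting pairs could be passed to `spark-submit` via `--conf`, or set one by one with `SparkSession.builder.config(key, value)`. Tables would then be addressed with three-part names such as `hive_a.db.table`.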

Thanks,
Cheng Pan


On Apr 18, 2023 at 00:38:23, Elliot West <te...@gmail.com> wrote:

> Hi Ankit,
>
> While not a part of Spark, there is a project called 'WaggleDance' that
> can federate multiple Hive metastores so that they are accessible via a
> single URI: https://github.com/ExpediaGroup/waggle-dance
>
> This may be useful or perhaps serve as inspiration.
>
> Thanks,
>
> Elliot.
>
> On Mon, 17 Apr 2023 at 16:38, Ankit Gupta <in...@gmail.com> wrote:
>
>> ++
>> User Mailing List
>>
>> Just a reminder: can anyone help with this?
>>
>> Thanks a lot!
>>
>> Ankit Prakash Gupta
>>
>> On Wed, Apr 12, 2023 at 8:22 AM Ankit Gupta <in...@gmail.com>
>> wrote:
>>
>>> Hi All
>>>
>>> My question concerns support for multiple remote Hive Metastore catalogs in
>>> Spark. Multiple catalog support was added in Spark 3, but has any
>>> CatalogPlugin been implemented that lets us configure multiple remote Hive
>>> Metastore catalogs? If so, could anyone share the fully qualified class name
>>> that I can use to configure a Hive Metastore catalog? If not, I would like
>>> to work on implementing a CatalogPlugin that can be used to configure
>>> multiple Hive Metastore servers.
>>>
>>> Thanks and Regards.
>>>
>>> Ankit Prakash Gupta
>>> +91 8750101321
>>> info.ankitp@gmail.com
>>>
>>>

Re: Spark Multiple Hive Metastore Catalog Support

Posted by Ankit Gupta <in...@gmail.com>.
Thanks, Elliot! Let me check it out!

On Mon, 17 Apr, 2023, 10:08 pm Elliot West, <te...@gmail.com> wrote:

> Hi Ankit,
>
> While not a part of Spark, there is a project called 'WaggleDance' that
> can federate multiple Hive metastores so that they are accessible via a
> single URI: https://github.com/ExpediaGroup/waggle-dance
>
> This may be useful or perhaps serve as inspiration.
>
> Thanks,
>
> Elliot.
>
> On Mon, 17 Apr 2023 at 16:38, Ankit Gupta <in...@gmail.com> wrote:
>
>> ++
>> User Mailing List
>>
>> Just a reminder: can anyone help with this?
>>
>> Thanks a lot!
>>
>> Ankit Prakash Gupta
>>
>> On Wed, Apr 12, 2023 at 8:22 AM Ankit Gupta <in...@gmail.com>
>> wrote:
>>
>>> Hi All
>>>
>>> My question concerns support for multiple remote Hive Metastore catalogs in
>>> Spark. Multiple catalog support was added in Spark 3, but has any
>>> CatalogPlugin been implemented that lets us configure multiple remote Hive
>>> Metastore catalogs? If so, could anyone share the fully qualified class name
>>> that I can use to configure a Hive Metastore catalog? If not, I would like
>>> to work on implementing a CatalogPlugin that can be used to configure
>>> multiple Hive Metastore servers.
>>>
>>> Thanks and Regards.
>>>
>>> Ankit Prakash Gupta
>>> +91 8750101321
>>> info.ankitp@gmail.com
>>>
>>>

Re: Spark Multiple Hive Metastore Catalog Support

Posted by Elliot West <te...@gmail.com>.
Hi Ankit,

While not a part of Spark, there is a project called 'WaggleDance' that can
federate multiple Hive metastores so that they are accessible via a single
URI: https://github.com/ExpediaGroup/waggle-dance

This may be useful or perhaps serve as inspiration.
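
To make the "single URI" idea concrete, here is a minimal sketch of the Spark side, assuming a federation proxy that speaks the Hive metastore Thrift protocol. Spark's built-in Hive catalog is simply pointed at the proxy instead of at an individual metastore. The host and port below are hypothetical, and how databases from each underlying metastore are named or prefixed is decided by the proxy's own configuration, which is not shown here.

```python
# Hypothetical address of the federating proxy; replace with a real one.
FEDERATION_PROXY_URI = "thrift://waggle-dance.example.com:48869"


def federated_metastore_conf(proxy_uri):
    """Return Spark conf entries that route metastore calls to one proxy URI."""
    return {
        # Use Spark's Hive-backed session catalog.
        "spark.sql.catalogImplementation": "hive",
        # Hadoop/Hive property forwarded by Spark through the spark.hadoop. prefix.
        "spark.hadoop.hive.metastore.uris": proxy_uri,
    }


proxy_conf = federated_metastore_conf(FEDERATION_PROXY_URI)
```

With this approach Spark still sees a single catalog, so the federation is transparent to the application, in contrast to registering one Spark catalog per metastore.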

Thanks,

Elliot.

On Mon, 17 Apr 2023 at 16:38, Ankit Gupta <in...@gmail.com> wrote:

> ++
> User Mailing List
>
> Just a reminder: can anyone help with this?
>
> Thanks a lot!
>
> Ankit Prakash Gupta
>
> On Wed, Apr 12, 2023 at 8:22 AM Ankit Gupta <in...@gmail.com> wrote:
>
>> Hi All
>>
>> My question concerns support for multiple remote Hive Metastore catalogs in
>> Spark. Multiple catalog support was added in Spark 3, but has any
>> CatalogPlugin been implemented that lets us configure multiple remote Hive
>> Metastore catalogs? If so, could anyone share the fully qualified class name
>> that I can use to configure a Hive Metastore catalog? If not, I would like
>> to work on implementing a CatalogPlugin that can be used to configure
>> multiple Hive Metastore servers.
>>
>> Thanks and Regards.
>>
>> Ankit Prakash Gupta
>> +91 8750101321
>> info.ankitp@gmail.com
>>
>>

Re: Spark Multiple Hive Metastore Catalog Support

Posted by Ankit Gupta <in...@gmail.com>.
++
User Mailing List

Just a reminder: can anyone help with this?

Thanks a lot!

Ankit Prakash Gupta

On Wed, Apr 12, 2023 at 8:22 AM Ankit Gupta <in...@gmail.com> wrote:

> Hi All
>
> My question concerns support for multiple remote Hive Metastore catalogs in
> Spark. Multiple catalog support was added in Spark 3, but has any
> CatalogPlugin been implemented that lets us configure multiple remote Hive
> Metastore catalogs? If so, could anyone share the fully qualified class name
> that I can use to configure a Hive Metastore catalog? If not, I would like
> to work on implementing a CatalogPlugin that can be used to configure
> multiple Hive Metastore servers.
>
> Thanks and Regards.
>
> Ankit Prakash Gupta
> +91 8750101321
> info.ankitp@gmail.com
>
>
