Posted to dev@spark.apache.org by Jungtaek Lim <ka...@gmail.com> on 2020/10/07 00:54:54 UTC

SQL DDL statements with replacing default catalog with custom catalog

Hi devs,

I'm not sure whether it's addressed in Spark 3.1, but at least as of Spark
3.0.1, many SQL DDL statements don't seem to go through the custom catalog
when I replace the default catalog with a custom one and provide only
'dbName.tableName' as the table identifier.
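For concreteness, here is a minimal sketch of the setup I mean (the
catalog class name is a placeholder for an actual TableCatalog
implementation):

    import org.apache.spark.sql.SparkSession

    // Replace the default session catalog ("spark_catalog") with a custom
    // implementation; com.example.MyCatalog is hypothetical.
    val spark = SparkSession.builder()
      .master("local[*]")
      .config("spark.sql.catalog.spark_catalog", "com.example.MyCatalog")
      .getOrCreate()

    // Only 'dbName.tableName' is given, so resolution should go through
    // the default catalog -- yet DDL statements like this one never reach
    // the custom catalog and take the v1 code path instead.
    spark.sql("DROP TABLE dbName.tableName")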

I'm not an expert in this area, but after skimming the code I suspect
TempViewOrV1Table is broken for this case, since an identifier it matches
can still be a v2 table. Classifying the table identifier as either a v2
table or a "temp view or v1 table" looks mandatory, as the two take
different code paths and use different catalog interfaces.

That sounds like a dead end to me, and the only "clean" approach seems to
be disallowing replacement of the default catalog with a custom one. Am I
missing something?

Thanks,
Jungtaek Lim (HeartSaVioR)

Re: SQL DDL statements with replacing default catalog with custom catalog

Posted by Jungtaek Lim <ka...@gmail.com>.
> If you just want to save typing the catalog name when writing table
names, you can set your custom catalog as the default catalog (See
SQLConf.DEFAULT_CATALOG). SQLConf.V2_SESSION_CATALOG_IMPLEMENTATION is used
to extend the v1 session catalog, not replace it.

I'm sorry, but I don't get this.

The custom session catalog I use for V2_SESSION_CATALOG_IMPLEMENTATION is
intended to go through a specific (v2) provider first and fall back to
Spark's built-in catalog if the table doesn't exist in that provider. If
that is not a design intention of V2_SESSION_CATALOG_IMPLEMENTATION then
OK (it should probably be documented somewhere), but the implementation
doesn't receive any method calls at all, so it's a no-op even if it is
only meant to extend the v1 session catalog.
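To make the pattern concrete, here's a rough sketch (the provider object
is hypothetical; DelegatingCatalogExtension is the base class Spark ships
for extending the session catalog):

    import org.apache.spark.sql.connector.catalog.{
      DelegatingCatalogExtension, Identifier, Table}

    // Hypothetical external (v2) provider standing in for the "specific
    // provider" mentioned above.
    object MyProvider {
      def loadTable(ident: Identifier): Option[Table] = None // stub
    }

    // Try the custom provider first; fall back to the built-in session
    // catalog (the delegate Spark injects) when the table isn't found.
    class FallbackSessionCatalog extends DelegatingCatalogExtension {
      override def loadTable(ident: Identifier): Table =
        MyProvider.loadTable(ident) match {
          case Some(table) => table
          case None        => super.loadTable(ident)
        }
    }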

My understanding is that v1 commands go through
sparkSession.sessionState.catalog, which doesn't seem to know about the
extended session catalog; it just uses ExternalCatalog, which sticks to
the Spark built-in implementation. In other words, the functionality is
only partially working. Is this something we should fix for Spark
3.0.2/3.1.0, or is it better to disable the feature until we ensure it
works for all commands?


Re: SQL DDL statements with replacing default catalog with custom catalog

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
I disagree that this is “by design”. An operation like DROP TABLE should
use a v2 drop plan if the table is v2.

If a v2 table is loaded or created using a v2 catalog it should also be
dropped that way. Otherwise, the v2 catalog is not notified when the table
is dropped and can’t perform other necessary updates, like invalidating
caches or dropping state outside of Hive. V2 tables should always use the
v2 API, and I’m not aware of a design where that wasn’t the case.

I’d also say that for DROP TABLE in particular, all calls could use the v2
catalog. We may not want to do this until we are confident, as Wenchen
said, but it would be the simpler solution. The v2 catalog can delegate to
the old session catalog, after all.
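A sketch of what that delegation could look like, using Spark's
DelegatingCatalogExtension (the cache-invalidation hook is made up for
illustration):

    import org.apache.spark.sql.connector.catalog.{
      DelegatingCatalogExtension, Identifier}

    // A session catalog that is notified on drops and then delegates the
    // actual work to the built-in session catalog.
    class NotifyingSessionCatalog extends DelegatingCatalogExtension {
      override def dropTable(ident: Identifier): Boolean = {
        invalidateExternalCaches(ident) // hypothetical bookkeeping hook
        super.dropTable(ident)          // delegate performs the drop
      }

      private def invalidateExternalCaches(ident: Identifier): Unit = {
        // placeholder for cache invalidation / external state cleanup
      }
    }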

-- 
Ryan Blue
Software Engineer
Netflix

Re: SQL DDL statements with replacing default catalog with custom catalog

Posted by Wenchen Fan <cl...@gmail.com>.
If you just want to save typing the catalog name when writing table names,
you can set your custom catalog as the default catalog (See
SQLConf.DEFAULT_CATALOG). SQLConf.V2_SESSION_CATALOG_IMPLEMENTATION is used
to extend the v1 session catalog, not replace it.
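For example, in spark-shell (the catalog name and class are placeholders):

    // Register the custom catalog under its own name and make it the
    // default; the built-in spark_catalog is left untouched.
    spark.conf.set("spark.sql.catalog.my_catalog", "com.example.MyCatalog")
    spark.conf.set("spark.sql.defaultCatalog", "my_catalog")
    // 'dbName.tableName' now resolves against my_catalog with no prefix.
    spark.sql("DROP TABLE dbName.tableName")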


Re: SQL DDL statements with replacing default catalog with custom catalog

Posted by Jungtaek Lim <ka...@gmail.com>.
If it's by design but not fully prepared, then IMHO replacing the default
session catalog is better restricted until things are sorted out, as it
causes quite a bit of confusion and has known bugs. Actually, there's
another bug/limitation in the default session catalog regarding the length
of the identifier, so things that work with a custom catalog no longer
work when it replaces the default session catalog.


Re: SQL DDL statements with replacing default catalog with custom catalog

Posted by Wenchen Fan <cl...@gmail.com>.
Ah, this is by design. V1 tables should still go through the v1 session
catalog. I think we can remove this restriction when we are confident about
the new v2 DDL commands that work with v2 catalog APIs.


Re: SQL DDL statements with replacing default catalog with custom catalog

Posted by Jungtaek Lim <ka...@gmail.com>.
My case is DROP TABLE, and DROP TABLE supports both v1 and v2 (it simply
works when I use a custom catalog without replacing the default catalog).

It just fails on v2 tables when the "default catalog" is replaced (say I
replace 'spark_catalog'), because TempViewOrV1Table matches even for a v2
table, and Catalyst then goes down the v1 execution path. I guess all
commands leveraging TempViewOrV1Table to determine whether a table is v1
or v2 would suffer from this issue.


Re: SQL DDL statements with replacing default catalog with custom catalog

Posted by Wenchen Fan <cl...@gmail.com>.
Not all DDL commands support the v2 catalog APIs (e.g. CREATE TABLE LIKE),
so it's possible that some commands still go through the v1 session
catalog even though you configured a custom v2 session catalog.

Can you create JIRA tickets if you hit any DDL commands that don't support
the v2 catalog? We should fix them.


Re: SQL DDL statements with replacing default catalog with custom catalog

Posted by Jungtaek Lim <ka...@gmail.com>.
The logical plan for a parsed statement is converted to either the old
(v1) plan or the v2 plan, and the former keeps using the external catalog
(Hive) - so replacing the default session catalog with a custom one and
trying to use it as if its tables lived in the external catalog doesn't
work, which defeats the purpose of replacing the default session catalog.

Btw, I see one approach: in TempViewOrV1Table, if the identifier matches
SessionCatalogAndIdentifier where the catalog is a TableCatalog, call
loadTable on the catalog and check whether the result is a v1 table or
not. Not sure it's a viable approach though, as it requires loading a
table during resolution of the table identifier.
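Roughly something like this (a sketch of the shape only, not working code:
V1Table is Spark's private[sql] wrapper that exposes a v1 CatalogTable
through the v2 Table interface, so this would have to live inside an
org.apache.spark.sql package, and error handling for missing tables is
omitted):

    import org.apache.spark.sql.connector.catalog.{
      Identifier, TableCatalog, V1Table}

    // Load the table during resolution and classify the identifier as v1
    // only when the catalog actually returns a v1 table.
    def resolvesToV1Table(catalog: TableCatalog, ident: Identifier): Boolean =
      catalog.loadTable(ident) match {
        case _: V1Table => true  // wraps a v1 CatalogTable
        case _          => false // genuine v2 table; take the v2 path
      }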


Re: SQL DDL statements with replacing default catalog with custom catalog

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
I've hit this with `DROP TABLE` commands that should be passed to a
registered v2 session catalog, but are handled by v1. I think that's the
only case we hit in our downstream test suites, but we haven't been
exploring the use of a session catalog for fallback. We use v2 for
everything now, which avoids the problem and comes with multi-catalog
support.
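In other words, something like this (names are placeholders), which never
touches spark_catalog at all:

    // Register the v2 catalog under its own name and always use
    // three-part identifiers; the session catalog is never involved.
    spark.conf.set("spark.sql.catalog.prod", "com.example.MyCatalog")
    spark.sql("DROP TABLE prod.db.tbl") // routed straight to 'prod'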


-- 
Ryan Blue
Software Engineer
Netflix