Posted to dev@spark.apache.org by Terry Kim <yu...@gmail.com> on 2019/12/02 01:12:04 UTC

[DISCUSS] Consistent relation resolution behavior in SparkSQL

Hi all,

As discussed in SPARK-29900, Spark currently has two different relation
resolution behaviors:

   1. Look up temp view first, then table/persistent view
   2. Look up table/persistent view

The first behavior is used in SELECT, INSERT, and a few commands that
support temp views, such as DESCRIBE TABLE. The second behavior is used
in most other commands. Thus, it is hard to predict which relation
resolution rule applies to a given command.
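
To make the difference concrete, here is a toy model in plain Python of the
two lookup rules when a temp view and a table share the name `t`. This is
only a sketch, not Spark's actual resolver: `ToyCatalog` and both method
names are made up for illustration.

```python
class ToyCatalog:
    """Hypothetical stand-in for a session catalog; not a Spark API."""

    def __init__(self):
        self.temp_views = {}  # name -> description of the temp view
        self.tables = {}      # name -> description of the table/persistent view

    def lookup_temp_first(self, name):
        # Behavior 1: look up temp view first, then table/persistent view.
        if name in self.temp_views:
            return self.temp_views[name]
        return self.tables.get(name)

    def lookup_table_only(self, name):
        # Behavior 2: look up table/persistent view only; temp views are ignored.
        return self.tables.get(name)


catalog = ToyCatalog()
catalog.tables["t"] = "table t"
catalog.temp_views["t"] = "temp view t"

# The same name resolves to different relations depending on the rule in use.
select_target = catalog.lookup_temp_first("t")
other_target = catalog.lookup_table_only("t")
print(select_target)
print(other_target)
```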

I want to propose a consistent relation resolution behavior in which temp
views are always looked up before tables/persistent views, as described in
more detail in this doc: consistent relation resolution proposal
<https://docs.google.com/document/d/1hvLjGA8y_W_hhilpngXVub1Ebv8RsMap986nENCFnrg/edit?usp=sharing>.

Note that this proposal is a breaking change, but the impact should be
minimal since this applies only when there are temp views and tables with
the same name.
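
As a rough illustration of why the impact is limited, the names whose
resolution could change are exactly those defined both as a temp view and as
a table. The helper below is hypothetical (it is not part of Spark) and only
models that set intersection.

```python
def names_with_changed_resolution(temp_view_names, table_names):
    # Hypothetical helper: a command that used to look up only tables can
    # resolve differently only for names that also exist as temp views.
    return sorted(set(temp_view_names) & set(table_names))


changed = names_with_changed_resolution({"t", "tmp_only"}, {"t", "sales"})
print(changed)
```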

Any feedback will be appreciated.

I also want to thank Wenchen Fan, Ryan Blue, Burak Yavuz, and Dongjoon Hyun
for their guidance and suggestions.

Regards,
Terry


<https://issues.apache.org/jira/browse/SPARK-29900>

Re: [DISCUSS] Consistent relation resolution behavior in SparkSQL

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
+1 for the proposal. The current behavior is confusing.

We also came up with another case to consider while implementing a
ViewCatalog: an unresolved relation in a permanent view (from a view
catalog) should never resolve to a temporary view. If I have a view `pview`
defined as `select * from t1` with database `db`, then `t1` should always
resolve to `db.t1` and never to a temp view `t1`. Otherwise, temp views
could unexpectedly change the behavior of stored views.
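
A minimal sketch of that rule in plain Python, under the assumption that the
resolver knows whether a name comes from the body of a permanent view. The
`resolve` function and its parameters are hypothetical, not Spark code.

```python
def resolve(name, temp_views, tables, current_database, view_database=None):
    """Toy resolver; view_database is set only for names inside a permanent view."""
    if view_database is not None:
        # Inside a permanent view: skip temp views and use the view's database.
        return tables.get((view_database, name))
    # Top-level query: temp view first, then the table in the current database.
    if name in temp_views:
        return temp_views[name]
    return tables.get((current_database, name))


temp_views = {"t1": "temp view t1"}
tables = {("db", "t1"): "table db.t1"}

# `select * from t1` typed directly by the user picks up the temp view.
from_query = resolve("t1", temp_views, tables, current_database="db")
# The same text stored in `pview` (database `db`) still resolves to db.t1.
from_pview = resolve("t1", temp_views, tables, current_database="db",
                     view_database="db")
print(from_query)
print(from_pview)
```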

On Wed, Dec 4, 2019 at 7:02 PM Wenchen Fan <cl...@gmail.com> wrote:

> +1, I think it's good for both end-users and Spark developers:
> * for end-users, when they look up a table, they don't need to care which
> command triggers it, as the behavior is consistent everywhere.
> * for Spark developers, we may simplify the code quite a bit. For now we
> have two code paths to look up tables: one for SELECT/INSERT and one for
> other commands.
>
> Thanks,
> Wenchen

-- 
Ryan Blue
Software Engineer
Netflix

Re: [DISCUSS] Consistent relation resolution behavior in SparkSQL

Posted by Wenchen Fan <cl...@gmail.com>.
+1, I think it's good for both end-users and Spark developers:
* for end-users, when they look up a table, they don't need to care which
command triggers it, as the behavior is consistent everywhere.
* for Spark developers, we may simplify the code quite a bit. For now we
have two code paths to look up tables: one for SELECT/INSERT and one for
other commands.

Thanks,
Wenchen
