You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Tony Wei <to...@gmail.com> on 2019/07/26 16:31:09 UTC

Is it possible to decide the order of where conditions in Flink SQL

Hi,

If I have multiple where conditions in my SQL, is it possible to specify
its order, so that the query
can be executed more efficiently?

For example, if I have the following SQL, it used a heavy UDF that needs to
access database.
However, if I can specify the order of conditions is executing
`!user.is_robot` first then executing
UDF, it will reduce the number of database access. Those records with
`true` in `user.is_robot` will
be dropped earlier and don't need to access database.

select *

from users

where !user.is_robot and UDF_NEED_TO_QUERY_DB(user)


Thanks,
Tony Wei

Re: Is it possible to decide the order of where conditions in Flink SQL

Posted by Tony Wei <to...@gmail.com>.
Hi Hequn,

Thank you very much. It's helpful to me.

For clarification, I think the code should look like the following snippet,
since original query was an `AND` operator. Am I right?

> CASE
> WHEN user_robot THEN false
> WHEN !UDF_NEED_TO_QUERY_DB(user) THEN false
> ELSE true
> END


Best regards,
Tony Wei


Hequn Cheng <ch...@gmail.com> 於 2019年7月28日 週日 下午3:30寫道:

> Hi Tony,
>
> There is no order guarantee for filter conditions. The conditions would be
> pushed down or merged during query optimization.
>
> However, you can use the case when[1] to achieve what you want. The code
> looks like:
> CASE
> WHEN !user.is_robot THEN true
> WHEN UDF_NEED_TO_QUERY_DB(user) THEN true
> ELSE false
> END
>
> For case when, it evaluates the conditions in order.
>
> Note: The UDF_NEED_TO_QUERY_DB must be a nonDeterministic udf, or the case
> when would also be optimized and changed to an OR by the query optimizer.
> You can override the isDeterministic method of ScalarFunction to make it
> nonDeterministic, i.e., override def isDeterministic: Boolean = false
>
> Best, Hequn
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/functions.html#conditional-functions
>
> On Sat, Jul 27, 2019 at 4:35 PM Tony Wei <to...@gmail.com> wrote:
>
>> Hi,
>>
>> Thanks for your reply. I have tried both CTE and sql subquery, but it
>> seems that sql plan
>> optimizer will do filter pushdown. Therefore, where conditions will end
>> up being together in
>> physical plan.
>>
>> However, the visualization of physical plans on Flink UI were different
>> for these three SQL
>> query on their operations' name.
>>
>> For the original SQL, it showed:
>>
>>> where: (AND(UDF_NEED_TO_QUERY_DB(user), NOT(user.is_robot))), select:
>>> (...)
>>
>>
>> For the CTE and subquery , it showed:
>>
>>> where: (AND(NOT(user.is_robot), UDF_NEED_TO_QUERY_DB(user))), select:
>>> (...)
>>
>>
>> Does this name for each operator of physical plan have any meaning to
>> represent the
>> execution order of `where` conditions?
>>
>> Best,
>> Tony Wei
>>
>> sri hari kali charan Tummala <ka...@gmail.com> 於 2019年7月27日 週六
>> 上午3:02寫道:
>>
>>> try cte common table expressions if it supports or sql subquery.
>>>
>>> On Fri, Jul 26, 2019 at 1:00 PM Fanbin Bu <fa...@coinbase.com>
>>> wrote:
>>>
>>>> how about move query db filter to the outer select.
>>>>
>>>> On Fri, Jul 26, 2019 at 9:31 AM Tony Wei <to...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> If I have multiple where conditions in my SQL, is it possible to
>>>>> specify its order, so that the query
>>>>> can be executed more efficiently?
>>>>>
>>>>> For example, if I have the following SQL, it used a heavy UDF that
>>>>> needs to access database.
>>>>> However, if I can specify the order of conditions is executing
>>>>> `!user.is_robot` first then executing
>>>>> UDF, it will reduce the number of database access. Those records with
>>>>> `true` in `user.is_robot` will
>>>>> be dropped earlier and don't need to access database.
>>>>>
>>>>> select *
>>>>>
>>>>> from users
>>>>>
>>>>> where !user.is_robot and UDF_NEED_TO_QUERY_DB(user)
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Tony Wei
>>>>>
>>>>
>>>
>>> --
>>> Thanks & Regards
>>> Sri Tummala
>>>
>>>

Re: Is it possible to decide the order of where conditions in Flink SQL

Posted by Hequn Cheng <ch...@gmail.com>.
Hi Tony,

There is no order guarantee for filter conditions. The conditions would be
pushed down or merged during query optimization.

However, you can use the case when[1] to achieve what you want. The code
looks like:
CASE
WHEN !user.is_robot THEN true
WHEN UDF_NEED_TO_QUERY_DB(user) THEN true
ELSE false
END

For case when, it evaluates the conditions in order.

Note: The UDF_NEED_TO_QUERY_DB must be a nonDeterministic udf, or the case
when would also be optimized and changed to an OR by the query optimizer.
You can override the isDeterministic method of ScalarFunction to make it
nonDeterministic, i.e., override def isDeterministic: Boolean = false

Best, Hequn

[1]
https://ci.apache.org/projects/flink/flink-docs-master/dev/table/functions.html#conditional-functions

On Sat, Jul 27, 2019 at 4:35 PM Tony Wei <to...@gmail.com> wrote:

> Hi,
>
> Thanks for your reply. I have tried both CTE and sql subquery, but it
> seems that sql plan
> optimizer will do filter pushdown. Therefore, where conditions will end up
> being together in
> physical plan.
>
> However, the visualization of physical plans on Flink UI were different
> for these three SQL
> query on their operations' name.
>
> For the original SQL, it showed:
>
>> where: (AND(UDF_NEED_TO_QUERY_DB(user), NOT(user.is_robot))), select:
>> (...)
>
>
> For the CTE and subquery , it showed:
>
>> where: (AND(NOT(user.is_robot), UDF_NEED_TO_QUERY_DB(user))), select:
>> (...)
>
>
> Does this name for each operator of physical plan have any meaning to
> represent the
> execution order of `where` conditions?
>
> Best,
> Tony Wei
>
> sri hari kali charan Tummala <ka...@gmail.com> 於 2019年7月27日 週六
> 上午3:02寫道:
>
>> try cte common table expressions if it supports or sql subquery.
>>
>> On Fri, Jul 26, 2019 at 1:00 PM Fanbin Bu <fa...@coinbase.com> wrote:
>>
>>> how about move query db filter to the outer select.
>>>
>>> On Fri, Jul 26, 2019 at 9:31 AM Tony Wei <to...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> If I have multiple where conditions in my SQL, is it possible to
>>>> specify its order, so that the query
>>>> can be executed more efficiently?
>>>>
>>>> For example, if I have the following SQL, it used a heavy UDF that
>>>> needs to access database.
>>>> However, if I can specify the order of conditions is executing
>>>> `!user.is_robot` first then executing
>>>> UDF, it will reduce the number of database access. Those records with
>>>> `true` in `user.is_robot` will
>>>> be dropped earlier and don't need to access database.
>>>>
>>>> select *
>>>>
>>>> from users
>>>>
>>>> where !user.is_robot and UDF_NEED_TO_QUERY_DB(user)
>>>>
>>>>
>>>> Thanks,
>>>> Tony Wei
>>>>
>>>
>>
>> --
>> Thanks & Regards
>> Sri Tummala
>>
>>

Re: Is it possible to decide the order of where conditions in Flink SQL

Posted by Tony Wei <to...@gmail.com>.
Hi,

Thanks for your reply. I have tried both CTE and sql subquery, but it seems
that sql plan
optimizer will do filter pushdown. Therefore, where conditions will end up
being together in
physical plan.

However, the visualization of physical plans on Flink UI were different for
these three SQL
query on their operations' name.

For the original SQL, it showed:

> where: (AND(UDF_NEED_TO_QUERY_DB(user), NOT(user.is_robot))), select: (...)


For the CTE and subquery , it showed:

> where: (AND(NOT(user.is_robot), UDF_NEED_TO_QUERY_DB(user))), select: (...)


Does this name for each operator of physical plan have any meaning to
represent the
execution order of `where` conditions?

Best,
Tony Wei

sri hari kali charan Tummala <ka...@gmail.com> 於 2019年7月27日 週六
上午3:02寫道:

> try cte common table expressions if it supports or sql subquery.
>
> On Fri, Jul 26, 2019 at 1:00 PM Fanbin Bu <fa...@coinbase.com> wrote:
>
>> how about move query db filter to the outer select.
>>
>> On Fri, Jul 26, 2019 at 9:31 AM Tony Wei <to...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> If I have multiple where conditions in my SQL, is it possible to specify
>>> its order, so that the query
>>> can be executed more efficiently?
>>>
>>> For example, if I have the following SQL, it used a heavy UDF that needs
>>> to access database.
>>> However, if I can specify the order of conditions is executing
>>> `!user.is_robot` first then executing
>>> UDF, it will reduce the number of database access. Those records with
>>> `true` in `user.is_robot` will
>>> be dropped earlier and don't need to access database.
>>>
>>> select *
>>>
>>> from users
>>>
>>> where !user.is_robot and UDF_NEED_TO_QUERY_DB(user)
>>>
>>>
>>> Thanks,
>>> Tony Wei
>>>
>>
>
> --
> Thanks & Regards
> Sri Tummala
>
>

Re: Is it possible to decide the order of where conditions in Flink SQL

Posted by sri hari kali charan Tummala <ka...@gmail.com>.
try cte common table expressions if it supports or sql subquery.

On Fri, Jul 26, 2019 at 1:00 PM Fanbin Bu <fa...@coinbase.com> wrote:

> how about move query db filter to the outer select.
>
> On Fri, Jul 26, 2019 at 9:31 AM Tony Wei <to...@gmail.com> wrote:
>
>> Hi,
>>
>> If I have multiple where conditions in my SQL, is it possible to specify
>> its order, so that the query
>> can be executed more efficiently?
>>
>> For example, if I have the following SQL, it used a heavy UDF that needs
>> to access database.
>> However, if I can specify the order of conditions is executing
>> `!user.is_robot` first then executing
>> UDF, it will reduce the number of database access. Those records with
>> `true` in `user.is_robot` will
>> be dropped earlier and don't need to access database.
>>
>> select *
>>
>> from users
>>
>> where !user.is_robot and UDF_NEED_TO_QUERY_DB(user)
>>
>>
>> Thanks,
>> Tony Wei
>>
>

-- 
Thanks & Regards
Sri Tummala

Re: Is it possible to decide the order of where conditions in Flink SQL

Posted by Fanbin Bu <fa...@coinbase.com>.
how about move query db filter to the outer select.

On Fri, Jul 26, 2019 at 9:31 AM Tony Wei <to...@gmail.com> wrote:

> Hi,
>
> If I have multiple where conditions in my SQL, is it possible to specify
> its order, so that the query
> can be executed more efficiently?
>
> For example, if I have the following SQL, it used a heavy UDF that needs
> to access database.
> However, if I can specify the order of conditions is executing
> `!user.is_robot` first then executing
> UDF, it will reduce the number of database access. Those records with
> `true` in `user.is_robot` will
> be dropped earlier and don't need to access database.
>
> select *
>
> from users
>
> where !user.is_robot and UDF_NEED_TO_QUERY_DB(user)
>
>
> Thanks,
> Tony Wei
>