You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@ignite.apache.org by "William.L" <wi...@gmail.com> on 2021/04/24 00:24:17 UTC

Understanding SQL join performance

Hi,

I am trying to understand why my colocated join between two tables/caches
are taking so long compare to the individual table filters.

----TABLE1

Returns 10000 count -- 0.13s

----TABLE2

Returns 65000 count -- 0.643s


---- JOIN TABLE1 and TABLE2

Returns 650K count -- 7s

Both analysis_input and analysis_output has index on (cohort_id, user_id,
timestamp). The affinity key is user_id. How do I analyze the performance
further?

Here's the explain which does not tell me much:



Is Ignite doing the join and filtering at each data node and then sending
the 650K total rows to the reduce before aggregation? If so, is it possible
for Ignite to do the some aggregation at the data node first and then send
the first level aggregation results to the reducer?






--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Understanding SQL join performance

Posted by Taras Ledkov <tl...@gridgain.com>.

Hi,

Unfortunately I don't understand the root of the problem totally.
Looks like the performance depends linear on rows count:
10k ~ 0.1s
65k ~ 0.65s
650k ~ 7s
I see the linear dependency on rows count...

 > Is Ignite doing the join and filtering at each data node and then 
sending
 > the 650K total rows to the reduce before aggregation?

Which aggregation do you mean?
Please provide the query plan and data schema for details.

On 24.04.2021 3:24, William.L wrote:
> Hi,
>
> I am trying to understand why my colocated join between two tables/caches
> are taking so long compare to the individual table filters.
>
> ----TABLE1
>
> Returns 10000 count -- 0.13s
>
> ----TABLE2
>
> Returns 65000 count -- 0.643s
>
>
> ---- JOIN TABLE1 and TABLE2
>
> Returns 650K count -- 7s
>
> Both analysis_input and analysis_output has index on (cohort_id, user_id,
> timestamp). The affinity key is user_id. How do I analyze the performance
> further?
>
> Here's the explain which does not tell me much:
>
>
>
> Is Ignite doing the join and filtering at each data node and then sending
> the 650K total rows to the reduce before aggregation? If so, is it possible
> for Ignite to do the some aggregation at the data node first and then send
> the first level aggregation results to the reducer?
>
>
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/

-- 
Taras Ledkov
Mail-To: tledkov@gridgain.com

Re: Understanding SQL join performance

Posted by Ilya Kasnacheev <il...@gmail.com>.

Hello!

If you had any images in your email, we are not seeing them. Please provide
links.

Regards,
-- 
Ilya Kasnacheev


сб, 24 апр. 2021 г. в 03:24, William.L <wi...@gmail.com>:

> Hi,
>
> I am trying to understand why my colocated join between two tables/caches
> are taking so long compare to the individual table filters.
>
> ----TABLE1
>
> Returns 10000 count -- 0.13s
>
> ----TABLE2
>
> Returns 65000 count -- 0.643s
>
>
> ---- JOIN TABLE1 and TABLE2
>
> Returns 650K count -- 7s
>
> Both analysis_input and analysis_output has index on (cohort_id, user_id,
> timestamp). The affinity key is user_id. How do I analyze the performance
> further?
>
> Here's the explain which does not tell me much:
>
>
>
> Is Ignite doing the join and filtering at each data node and then sending
> the 650K total rows to the reduce before aggregation? If so, is it possible
> for Ignite to do the some aggregation at the data node first and then send
> the first level aggregation results to the reducer?
>
>
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>