You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@ignite.apache.org by Айсина Роза Мунеровна <ro...@sbermarket.ru> on 2023/01/10 19:30:16 UTC

How to check that affinity key works?

Hola!

We want to optimize our SQL queries that make collocated JOINs on several tables (about ~8 tables).

Some tables have column “product_id” on which these tables are joined.
Business meaning is that the result are features for each product_id.

So based on documentation we set “product_id” as affinity key
so that all data required for join will be located on the same node.
(Column “product_id” is always part of the primary key)

But! After this we made experiments:
- put affinity key to other part of primary key (for example, if primary key is "(product_id, store_id)", then affinity key is “store_id”);
- didn't specify affinity key at all.

The problem is that all our load testing results didn’t changed!

So the question - is there any way to make more advanced EXPLAIN,
that will show partition shuffling (if it happens) or data collocation?
Some debug tool for this problem. Like query plan in Spark.


Information about our setup:
- Ignite cluster on 5 VMs;
- all tables are partitioned or replicated;
- all tables are created with DDL SQL and all interactions are made only through SQL API;
- DDL example:

CREATE TABLE IF NOT EXISTS PUBLIC.ProductFeatures
(
    product_id INT PRIMARY KEY,
    total_cnt_orders_with_sku INT
)
WITH "CACHE_NAME=PUBLIC_ProductFeatures, KEY_TYPE=io.sbmt.ProductFeaturesKey, VALUE_TYPE=io.sbmt.ProductFeaturesValue, AFFINITY_KEY=product_id, TEMPLATE=PARTITIONED, BACKUPS=1

- our main SQL query:

SELECT
    ProductFeatures.product_id,
    ProductFeatures.total_cnt_orders_with_sku,
    StoreProductFeatures.price,
    UserProductFeaturesOrder.num_prev_orders_with_sku,
    ...
FROM ProductFeatures
LEFT JOIN StoreProductFeatures
  ON ProductFeatures.product_id = StoreProductFeatures.product_id
  AND StoreProductFeatures.store_id = {store_id}
... (more joins)
CROSS JOIN UserFeaturesDiscount
WHERE UserFeaturesDiscount.user_id = {user_id}
  AND ProductFeatures.product_id IN {skus}
  …

Looking forward for some help.

--

Роза Айсина

Старший разработчик ПО

СберМаркет | Доставка из любимых магазинов



Email: roza.aysina@sbermarket.ru<ma...@sbermarket.ru>

Mob:

Web: sbermarket.ru<https://sbermarket.ru/>

App: iOS<https://apps.apple.com/ru/app/%D1%81%D0%B1%D0%B5%D1%80%D0%BC%D0%B0%D1%80%D0%BA%D0%B5%D1%82-%D0%B4%D0%BE%D1%81%D1%82%D0%B0%D0%B2%D0%BA%D0%B0-%D0%BF%D1%80%D0%BE%D0%B4%D1%83%D0%BA%D1%82%D0%BE%D0%B2/id1166642457> и Android<https://play.google.com/store/apps/details?id=ru.instamart&hl=en&gl=ru>



УВЕДОМЛЕНИЕ О КОНФИДЕНЦИАЛЬНОСТИ: это электронное сообщение и любые документы, приложенные к нему, содержат конфиденциальную информацию. Настоящим уведомляем Вас о том, что, если это сообщение не предназначено Вам, использование, копирование, распространение информации, содержащейся в настоящем сообщении, а также осуществление любых действий на основе этой информации, строго запрещено. Если Вы получили это сообщение по ошибке, пожалуйста, сообщите об этом отправителю по электронной почте и удалите это сообщение.
CONFIDENTIALITY NOTICE: This email and any files attached to it are confidential. If you are not the intended recipient you are notified that using, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error please notify the sender and delete this email.

Re: How to check that affinity key works?

Posted by Jeremy McMillan <je...@gridgain.com>.
Using distributed join will set a baseline for correct result set. If you
get different results without distributed join, then you know there's a
mismatch between join conditions and affinity. If you get the same results,
then data is distributed well for this join.

On Thu, Jan 12, 2023, 02:02 Айсина Роза Мунеровна <ro...@sbermarket.ru>
wrote:

> Hi Jeremy!
>
> Thank you for your reply.
>
> Am I right that if JOIN is collocated then absence of affinity key will
> lead to incorrect results as data will not be fetched from other nodes?
>
> So the correct way to check influence of affinity key is to enable
> distributed JOIN?
>
> On 10 Jan 2023, at 10:48 PM, Jeremy McMillan <je...@gridgain.com>
> wrote:
>
> Внимание: Внешний отправитель!
> Если вы не знаете отправителя - не открывайте вложения, не переходите по
> ссылкам, не пересылайте письмо!
>
> If you are only doing colocated joins, then there will be no runtime
> overhead incurred by collecting distributed rows (colocated joins mean do
> not try to join data that might be distributed across nodes), so there
> might not be much difference in runtimes.
>
> The difference between different affinity keys, if any, will be seen in
> the results returned, and unless there's a significant difference in row
> count, it will be difficult to affect runtime performance using this
> strategy.
>
> On Tue, Jan 10, 2023, 13:32 Айсина Роза Мунеровна <
> roza.aysina@sbermarket.ru> wrote:
>
>> Hola!
>>
>> We want to optimize our SQL queries that make collocated JOINs on several
>> tables (about ~8 tables).
>>
>> Some tables have column “product_id” on which these tables are joined.
>> Business meaning is that the result are features for each product_id.
>>
>> So based on documentation we set “*product_id*” as affinity key
>> so that all data required for join will be located on the same node.
>> (Column “*product_id*” is always part of the primary key)
>>
>> But! After this we made experiments:
>> - put affinity key to other part of primary key (for example, if primary
>> key is "(product_id, store_id)", then affinity key is “store_id”);
>> - didn't specify affinity key at all.
>>
>> The problem is that all our load testing results didn’t changed!
>>
>> So the question - is there any way to make more advanced *EXPLAIN*,
>> that will show partition shuffling (if it happens) or data collocation?
>> Some debug tool for this problem. Like query plan in Spark.
>>
>>
>> Information about our setup:
>> - Ignite cluster on 5 VMs;
>> - all tables are partitioned or replicated;
>> - all tables are created with DDL SQL and all interactions are made *
>> only* through SQL API;
>> - DDL example:
>>
>> CREATE TABLE IF NOT EXISTS PUBLIC.ProductFeatures
>> (
>>     product_id INT PRIMARY KEY,
>>     total_cnt_orders_with_sku INT
>> )
>> WITH "CACHE_NAME=PUBLIC_ProductFeatures,
>> KEY_TYPE=io.sbmt.ProductFeaturesKey,
>> VALUE_TYPE=io.sbmt.ProductFeaturesValue, AFFINITY_KEY=product_id,
>> TEMPLATE=PARTITIONED, BACKUPS=1
>>
>> - our main SQL query:
>>
>> SELECT
>>     ProductFeatures.product_id,
>>     ProductFeatures.total_cnt_orders_with_sku,
>>     StoreProductFeatures.price,
>>     UserProductFeaturesOrder.num_prev_orders_with_sku,
>>     ...
>> FROM ProductFeatures
>> LEFT JOIN StoreProductFeatures
>>   ON ProductFeatures.product_id = StoreProductFeatures.product_id
>>   AND StoreProductFeatures.store_id = {store_id}
>> ... (more joins)
>> CROSS JOIN UserFeaturesDiscount
>> WHERE UserFeaturesDiscount.user_id = {user_id}
>>   AND ProductFeatures.product_id IN {skus}
>>   …
>>
>> Looking forward for some help.
>> *--*
>>
>> *Роза Айсина*
>> Старший разработчик ПО
>> *СберМаркет* | Доставка из любимых магазинов
>>
>>
>> Email: roza.aysina@sbermarket.ru
>> Mob:
>> Web: sbermarket.ru
>> App: iOS
>> <https://apps.apple.com/ru/app/%D1%81%D0%B1%D0%B5%D1%80%D0%BC%D0%B0%D1%80%D0%BA%D0%B5%D1%82-%D0%B4%D0%BE%D1%81%D1%82%D0%B0%D0%B2%D0%BA%D0%B0-%D0%BF%D1%80%D0%BE%D0%B4%D1%83%D0%BA%D1%82%D0%BE%D0%B2/id1166642457>
>> и Android
>> <https://play.google.com/store/apps/details?id=ru.instamart&hl=en&gl=ru>
>>
>>
>>
>>
>>
>>
>> *УВЕДОМЛЕНИЕ О КОНФИДЕНЦИАЛЬНОСТИ:* это электронное сообщение и любые
>> документы, приложенные к нему, содержат конфиденциальную информацию.
>> Настоящим уведомляем Вас о том, что, если это сообщение не предназначено
>> Вам, использование, копирование, распространение информации, содержащейся в
>> настоящем сообщении, а также осуществление любых действий на основе этой
>> информации, строго запрещено. Если Вы получили это сообщение по ошибке,
>> пожалуйста, сообщите об этом отправителю по электронной почте и удалите это
>> сообщение.
>> *CONFIDENTIALITY NOTICE:* This email and any files attached to it are
>> confidential. If you are not the intended recipient you are notified that
>> using, copying, distributing or taking any action in reliance on the
>> contents of this information is strictly prohibited. If you have received
>> this email in error please notify the sender and delete this email.
>>
>
> *--*
>
> *Роза Айсина*
>
> Старший разработчик ПО
>
> *СберМаркет* | Доставка из любимых магазинов
>
>
>
> Email: roza.aysina@sbermarket.ru
>
> Mob:
>
> Web: sbermarket.ru
>
> App: iOS
> <https://apps.apple.com/ru/app/%D1%81%D0%B1%D0%B5%D1%80%D0%BC%D0%B0%D1%80%D0%BA%D0%B5%D1%82-%D0%B4%D0%BE%D1%81%D1%82%D0%B0%D0%B2%D0%BA%D0%B0-%D0%BF%D1%80%D0%BE%D0%B4%D1%83%D0%BA%D1%82%D0%BE%D0%B2/id1166642457>
> и Android
> <https://play.google.com/store/apps/details?id=ru.instamart&hl=en&gl=ru>
>
>
>
> *УВЕДОМЛЕНИЕ О КОНФИДЕНЦИАЛЬНОСТИ:* это электронное сообщение и любые
> документы, приложенные к нему, содержат конфиденциальную информацию.
> Настоящим уведомляем Вас о том, что, если это сообщение не предназначено
> Вам, использование, копирование, распространение информации, содержащейся в
> настоящем сообщении, а также осуществление любых действий на основе этой
> информации, строго запрещено. Если Вы получили это сообщение по ошибке,
> пожалуйста, сообщите об этом отправителю по электронной почте и удалите это
> сообщение.
> *CONFIDENTIALITY NOTICE:* This email and any files attached to it are
> confidential. If you are not the intended recipient you are notified that
> using, copying, distributing or taking any action in reliance on the
> contents of this information is strictly prohibited. If you have received
> this email in error please notify the sender and delete this email.
>

Re: How to check that affinity key works?

Posted by Айсина Роза Мунеровна <ro...@sbermarket.ru>.
Hi Jeremy!

Thank you for your reply.

Am I right that if JOIN is collocated then absence of affinity key will lead to incorrect results as data will not be fetched from other nodes?

So the correct way to check influence of affinity key is to enable distributed JOIN?

On 10 Jan 2023, at 10:48 PM, Jeremy McMillan <je...@gridgain.com> wrote:

Внимание: Внешний отправитель!
Если вы не знаете отправителя - не открывайте вложения, не переходите по ссылкам, не пересылайте письмо!

If you are only doing colocated joins, then there will be no runtime overhead incurred by collecting distributed rows (colocated joins mean do not try to join data that might be distributed across nodes), so there might not be much difference in runtimes.

The difference between different affinity keys, if any, will be seen in the results returned, and unless there's a significant difference in row count, it will be difficult to affect runtime performance using this strategy.

On Tue, Jan 10, 2023, 13:32 Айсина Роза Мунеровна <ro...@sbermarket.ru>> wrote:
Hola!

We want to optimize our SQL queries that make collocated JOINs on several tables (about ~8 tables).

Some tables have column “product_id” on which these tables are joined.
Business meaning is that the result are features for each product_id.

So based on documentation we set “product_id” as affinity key
so that all data required for join will be located on the same node.
(Column “product_id” is always part of the primary key)

But! After this we made experiments:
- put affinity key to other part of primary key (for example, if primary key is "(product_id, store_id)", then affinity key is “store_id”);
- didn't specify affinity key at all.

The problem is that all our load testing results didn’t changed!

So the question - is there any way to make more advanced EXPLAIN,
that will show partition shuffling (if it happens) or data collocation?
Some debug tool for this problem. Like query plan in Spark.


Information about our setup:
- Ignite cluster on 5 VMs;
- all tables are partitioned or replicated;
- all tables are created with DDL SQL and all interactions are made only through SQL API;
- DDL example:

CREATE TABLE IF NOT EXISTS PUBLIC.ProductFeatures
(
    product_id INT PRIMARY KEY,
    total_cnt_orders_with_sku INT
)
WITH "CACHE_NAME=PUBLIC_ProductFeatures, KEY_TYPE=io.sbmt.ProductFeaturesKey, VALUE_TYPE=io.sbmt.ProductFeaturesValue, AFFINITY_KEY=product_id, TEMPLATE=PARTITIONED, BACKUPS=1

- our main SQL query:

SELECT
    ProductFeatures.product_id,
    ProductFeatures.total_cnt_orders_with_sku,
    StoreProductFeatures.price,
    UserProductFeaturesOrder.num_prev_orders_with_sku,
    ...
FROM ProductFeatures
LEFT JOIN StoreProductFeatures
  ON ProductFeatures.product_id = StoreProductFeatures.product_id
  AND StoreProductFeatures.store_id = {store_id}
... (more joins)
CROSS JOIN UserFeaturesDiscount
WHERE UserFeaturesDiscount.user_id = {user_id}
  AND ProductFeatures.product_id IN {skus}
  …

Looking forward for some help.
--

Роза Айсина
Старший разработчик ПО
СберМаркет | Доставка из любимых магазинов



Email: roza.aysina@sbermarket.ru<ma...@sbermarket.ru>
Mob:
Web: sbermarket.ru<https://sbermarket.ru/>
App: iOS<https://apps.apple.com/ru/app/%D1%81%D0%B1%D0%B5%D1%80%D0%BC%D0%B0%D1%80%D0%BA%D0%B5%D1%82-%D0%B4%D0%BE%D1%81%D1%82%D0%B0%D0%B2%D0%BA%D0%B0-%D0%BF%D1%80%D0%BE%D0%B4%D1%83%D0%BA%D1%82%D0%BE%D0%B2/id1166642457> и Android<https://play.google.com/store/apps/details?id=ru.instamart&hl=en&gl=ru>








УВЕДОМЛЕНИЕ О КОНФИДЕНЦИАЛЬНОСТИ: это электронное сообщение и любые документы, приложенные к нему, содержат конфиденциальную информацию. Настоящим уведомляем Вас о том, что, если это сообщение не предназначено Вам, использование, копирование, распространение информации, содержащейся в настоящем сообщении, а также осуществление любых действий на основе этой информации, строго запрещено. Если Вы получили это сообщение по ошибке, пожалуйста, сообщите об этом отправителю по электронной почте и удалите это сообщение.
CONFIDENTIALITY NOTICE: This email and any files attached to it are confidential. If you are not the intended recipient you are notified that using, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error please notify the sender and delete this email.


--

Роза Айсина

Старший разработчик ПО

СберМаркет | Доставка из любимых магазинов



Email: roza.aysina@sbermarket.ru<ma...@sbermarket.ru>

Mob:

Web: sbermarket.ru<https://sbermarket.ru/>

App: iOS<https://apps.apple.com/ru/app/%D1%81%D0%B1%D0%B5%D1%80%D0%BC%D0%B0%D1%80%D0%BA%D0%B5%D1%82-%D0%B4%D0%BE%D1%81%D1%82%D0%B0%D0%B2%D0%BA%D0%B0-%D0%BF%D1%80%D0%BE%D0%B4%D1%83%D0%BA%D1%82%D0%BE%D0%B2/id1166642457> и Android<https://play.google.com/store/apps/details?id=ru.instamart&hl=en&gl=ru>



УВЕДОМЛЕНИЕ О КОНФИДЕНЦИАЛЬНОСТИ: это электронное сообщение и любые документы, приложенные к нему, содержат конфиденциальную информацию. Настоящим уведомляем Вас о том, что, если это сообщение не предназначено Вам, использование, копирование, распространение информации, содержащейся в настоящем сообщении, а также осуществление любых действий на основе этой информации, строго запрещено. Если Вы получили это сообщение по ошибке, пожалуйста, сообщите об этом отправителю по электронной почте и удалите это сообщение.
CONFIDENTIALITY NOTICE: This email and any files attached to it are confidential. If you are not the intended recipient you are notified that using, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error please notify the sender and delete this email.

Re: How to check that affinity key works?

Posted by Jeremy McMillan <je...@gridgain.com>.
If you are only doing colocated joins, then there will be no runtime
overhead incurred by collecting distributed rows (colocated joins mean do
not try to join data that might be distributed across nodes), so there
might not be much difference in runtimes.

The difference between different affinity keys, if any, will be seen in the
results returned, and unless there's a significant difference in row count,
it will be difficult to affect runtime performance using this strategy.

On Tue, Jan 10, 2023, 13:32 Айсина Роза Мунеровна <ro...@sbermarket.ru>
wrote:

> Hola!
>
> We want to optimize our SQL queries that make collocated JOINs on several
> tables (about ~8 tables).
>
> Some tables have column “product_id” on which these tables are joined.
> Business meaning is that the result are features for each product_id.
>
> So based on documentation we set “*product_id*” as affinity key
> so that all data required for join will be located on the same node.
> (Column “*product_id*” is always part of the primary key)
>
> But! After this we made experiments:
> - put affinity key to other part of primary key (for example, if primary
> key is "(product_id, store_id)", then affinity key is “store_id”);
> - didn't specify affinity key at all.
>
> The problem is that all our load testing results didn’t changed!
>
> So the question - is there any way to make more advanced *EXPLAIN*,
> that will show partition shuffling (if it happens) or data collocation?
> Some debug tool for this problem. Like query plan in Spark.
>
>
> Information about our setup:
> - Ignite cluster on 5 VMs;
> - all tables are partitioned or replicated;
> - all tables are created with DDL SQL and all interactions are made *only*
> through SQL API;
> - DDL example:
>
> CREATE TABLE IF NOT EXISTS PUBLIC.ProductFeatures
> (
>     product_id INT PRIMARY KEY,
>     total_cnt_orders_with_sku INT
> )
> WITH "CACHE_NAME=PUBLIC_ProductFeatures,
> KEY_TYPE=io.sbmt.ProductFeaturesKey,
> VALUE_TYPE=io.sbmt.ProductFeaturesValue, AFFINITY_KEY=product_id,
> TEMPLATE=PARTITIONED, BACKUPS=1
>
> - our main SQL query:
>
> SELECT
>     ProductFeatures.product_id,
>     ProductFeatures.total_cnt_orders_with_sku,
>     StoreProductFeatures.price,
>     UserProductFeaturesOrder.num_prev_orders_with_sku,
>     ...
> FROM ProductFeatures
> LEFT JOIN StoreProductFeatures
>   ON ProductFeatures.product_id = StoreProductFeatures.product_id
>   AND StoreProductFeatures.store_id = {store_id}
> ... (more joins)
> CROSS JOIN UserFeaturesDiscount
> WHERE UserFeaturesDiscount.user_id = {user_id}
>   AND ProductFeatures.product_id IN {skus}
>   …
>
> Looking forward for some help.
>
> *--*
>
> *Роза Айсина*
>
> Старший разработчик ПО
>
> *СберМаркет* | Доставка из любимых магазинов
>
>
>
> Email: roza.aysina@sbermarket.ru
>
> Mob:
>
> Web: sbermarket.ru
>
> App: iOS
> <https://apps.apple.com/ru/app/%D1%81%D0%B1%D0%B5%D1%80%D0%BC%D0%B0%D1%80%D0%BA%D0%B5%D1%82-%D0%B4%D0%BE%D1%81%D1%82%D0%B0%D0%B2%D0%BA%D0%B0-%D0%BF%D1%80%D0%BE%D0%B4%D1%83%D0%BA%D1%82%D0%BE%D0%B2/id1166642457>
> и Android
> <https://play.google.com/store/apps/details?id=ru.instamart&hl=en&gl=ru>
>
>
>
> *УВЕДОМЛЕНИЕ О КОНФИДЕНЦИАЛЬНОСТИ:* это электронное сообщение и любые
> документы, приложенные к нему, содержат конфиденциальную информацию.
> Настоящим уведомляем Вас о том, что, если это сообщение не предназначено
> Вам, использование, копирование, распространение информации, содержащейся в
> настоящем сообщении, а также осуществление любых действий на основе этой
> информации, строго запрещено. Если Вы получили это сообщение по ошибке,
> пожалуйста, сообщите об этом отправителю по электронной почте и удалите это
> сообщение.
> *CONFIDENTIALITY NOTICE:* This email and any files attached to it are
> confidential. If you are not the intended recipient you are notified that
> using, copying, distributing or taking any action in reliance on the
> contents of this information is strictly prohibited. If you have received
> this email in error please notify the sender and delete this email.
>