You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@ignite.apache.org by Courtney Robinson <co...@crlog.info> on 2022/02/11 17:08:38 UTC

Ignite JOIN fails to return results using WHERE clause

I have a query like this:

SELECT
> tbl.id AS tbl_id, tbl.releaseId AS tbl_releaseId, col.type AS col_type
> FROM T0 tbl
> INNER JOIN T1 col ON tbl.id = col.tableId
> *WHERE tbl.releaseId = ? AND tbl.name <http://tbl.name> = ?*
> LIMIT 100
>

This returns no results so after investigating, I ended up changing it to
the below

SELECT
> tbl.id AS tbl_id, tbl.releaseId AS tbl_releaseId, col.type AS col_type
> FROM *(SELECT * FROM T0 t WHERE t.releaseId = ? AND t.name
> <http://t.name> = ?) *tbl
> INNER JOIN T1 col ON tbl.id = col.tableId
> LIMIT 100
>

 This returns the results expected.
Can anyone offer any insight into what is going wrong here?

The tables here look like this (I removed some columns from the tables and
the query to help make it easier on the eyes to parse):

CREATE TABLE IF NOT EXISTS T0
> (
>   id        LONG,
>   releaseId VARCHAR,
>   name      VARCHAR,
>   PRIMARY KEY (releaseId, id)
> ) WITH "template=hypi_tpl,affinity_key=releaseId";

CREATE INDEX IF NOT EXISTS VirtualTable_idx0 ON VirtualTable (releaseId,
> name);
>
> CREATE TABLE IF NOT EXISTS T1
> (
>   id      LONG,
>   t0Id LONG,
>   name    VARCHAR,
>   type    VARCHAR,
>   PRIMARY KEY (t0Id, id)
> ) WITH "template=hypi_tpl,affinity_key=t0Id";
>

Note here it is a single node locally (so I don't expect collocation issues
to be in play here) - in development so not in a production cluster yet.
Running Ignite 2.8.0

This is not the first time we've had something like this but it's the first
time I've been able to reproduce it myself and consistently.

Best,
Courtney

Re: Ignite JOIN fails to return results using WHERE clause

Posted by Maksim Timonin <ti...@apache.org>.
Hi Courtney,

> Is it the case that as long as the affinity key is in the join predicate
that it would be a colocated JOIN

This is true for cases when you have a predicate with equality by affinity
keys. If a join predicate includes affinity keys eq condition then it also
can have any other conditions.

In your case, you have 2 tables and affinity keys T0.releaseId, T1.t0Id.
Then the valid join predicate will be: "T0.releaseId = T1.t0Id and (...
anything you wish)"

> In other words, if both tables share the same affinity key is it still a
collocated join if there are other filters in the join predicate?

Yes.

You can check examples of valid and non-valid joins here:
https://github.com/timoninmaxim/ignite/blob/master/modules/indexing/src/test/java/org/apache/ignite/sqltests/CheckWarnJoinPartitionedTables.java

Since Apache Ignite 2.12 it writes a warning message for SQL queries with
non-colocated joins.


On Sat, Feb 12, 2022 at 10:04 AM Courtney Robinson <
courtney.robinson@crlog.info> wrote:

> Hi Maksim,
>
> Interesting, thanks for your reply.
> Okay, I misunderstood (I also thought being on a single node that it
> didn't matter).
>
> Is it the case that as long as the affinity key is in the join predicate
> that it would be a colocated JOIN (I'm concerned about the impact of
> setDistributedJoins(true))?
> Or is it the case that if you're joining on partitioned tables, you must
> do so with ONLY the affinity key in the join predicate?
>
> SELECT tbl.releaseId, tbl.name FROM T0 tbl
>
>  INNER JOIN T1 col ON tbl.releaseId = <What goes here?>
>
>
> In the previous tables, T1 does not have the releaseId as a column so does
> that mean it is impossible to do a co-located JOIN with this setup?
>
> If we modify T1 so that it also has releaseId and we make releaseId the
> affinity key of T1 will both of these work?
>
> SELECT tbl.releaseId, tbl.name FROM T0 tbl
>
>  INNER JOIN T1 col ON tbl.releaseId = col.releaseId
>
>
> AND
>
> SELECT tbl.releaseId, tbl.name FROM T0 tbl
>
>  INNER JOIN T1 col ON tbl.releaseId = col.releaseId AND col.tableId =
>> tbl.id AND col.x = y
>
>
> In other words, if both tables share the same affinity key is it still a
> collocated join if there are other filters in the join predicate?
>
> If the answer to this yes, does it matter if the filters in the join
> predicate are all = i.e. does it have to be an equi-join? or could the
> predicate be
>
>> ON tbl.releaseId = col.releaseId AND* col.tableId > tbl.id
>> <http://tbl.id/>* AND *col.x >= y*
>>
>
> Thanks
>
> On Fri, Feb 11, 2022 at 6:42 PM Maksim Timonin <ti...@apache.org>
> wrote:
>
>> Hi Courtney,
>>
>> > I don't expect collocation issues to be in play here
>>
>> Do you check this doc:
>> https://ignite.apache.org/docs/latest/SQL/distributed-joins ?
>>
>> It says: "A distributed join is a SQL statement with a join clause that
>> combines two or more partitioned tables. If the tables are joined on the
>> partitioning column (affinity key), the join is called a colocated join.
>> Otherwise, it is called a non-colocated join"
>>
>> You definitely have a collocation issue due to non-collocated join: T0
>> partitioned by "releaseId", T1 by "t0Id", and you make a join by columns
>> that aren't affinity columns (id = tableId).
>>
>> You should specify the flag "SqlFieldsQuery.setDistributedJoins(true)" to
>> make your join return correct results.
>>
>> Maksim
>>
>>
>> On Fri, Feb 11, 2022 at 8:09 PM Courtney Robinson <
>> courtney.robinson@crlog.info> wrote:
>>
>>>
>>> I have a query like this:
>>>
>>> SELECT
>>>> tbl.id AS tbl_id, tbl.releaseId AS tbl_releaseId, col.type AS col_type
>>>> FROM T0 tbl
>>>> INNER JOIN T1 col ON tbl.id = col.tableId
>>>> *WHERE tbl.releaseId = ? AND tbl.name <http://tbl.name> = ?*
>>>> LIMIT 100
>>>>
>>>
>>> This returns no results so after investigating, I ended up changing it
>>> to the below
>>>
>>> SELECT
>>>> tbl.id AS tbl_id, tbl.releaseId AS tbl_releaseId, col.type AS col_type
>>>> FROM *(SELECT * FROM T0 t WHERE t.releaseId = ? AND t.name
>>>> <http://t.name> = ?) *tbl
>>>> INNER JOIN T1 col ON tbl.id = col.tableId
>>>> LIMIT 100
>>>>
>>>
>>>  This returns the results expected.
>>> Can anyone offer any insight into what is going wrong here?
>>>
>>> The tables here look like this (I removed some columns from the tables
>>> and the query to help make it easier on the eyes to parse):
>>>
>>> CREATE TABLE IF NOT EXISTS T0
>>>> (
>>>>   id        LONG,
>>>>   releaseId VARCHAR,
>>>>   name      VARCHAR,
>>>>   PRIMARY KEY (releaseId, id)
>>>> ) WITH "template=hypi_tpl,affinity_key=releaseId";
>>>
>>> CREATE INDEX IF NOT EXISTS VirtualTable_idx0 ON VirtualTable (releaseId,
>>>> name);
>>>>
>>>> CREATE TABLE IF NOT EXISTS T1
>>>> (
>>>>   id      LONG,
>>>>   t0Id LONG,
>>>>   name    VARCHAR,
>>>>   type    VARCHAR,
>>>>   PRIMARY KEY (t0Id, id)
>>>> ) WITH "template=hypi_tpl,affinity_key=t0Id";
>>>>
>>>
>>> Note here it is a single node locally (so I don't expect collocation
>>> issues to be in play here) - in development so not in a production cluster
>>> yet.
>>> Running Ignite 2.8.0
>>>
>>> This is not the first time we've had something like this but it's the
>>> first time I've been able to reproduce it myself and consistently.
>>>
>>> Best,
>>> Courtney
>>>
>>

Re: Ignite JOIN fails to return results using WHERE clause

Posted by Courtney Robinson <co...@crlog.info>.
Hi Maksim,

Interesting, thanks for your reply.
Okay, I misunderstood (I also thought being on a single node that it didn't
matter).

Is it the case that as long as the affinity key is in the join predicate
that it would be a colocated JOIN (I'm concerned about the impact of
setDistributedJoins(true))?
Or is it the case that if you're joining on partitioned tables, you must do
so with ONLY the affinity key in the join predicate?

SELECT tbl.releaseId, tbl.name FROM T0 tbl

 INNER JOIN T1 col ON tbl.releaseId = <What goes here?>


In the previous tables, T1 does not have the releaseId as a column so does
that mean it is impossible to do a co-located JOIN with this setup?

If we modify T1 so that it also has releaseId and we make releaseId the
affinity key of T1 will both of these work?

SELECT tbl.releaseId, tbl.name FROM T0 tbl

 INNER JOIN T1 col ON tbl.releaseId = col.releaseId


AND

SELECT tbl.releaseId, tbl.name FROM T0 tbl

 INNER JOIN T1 col ON tbl.releaseId = col.releaseId AND col.tableId = tbl.id
> AND col.x = y


In other words, if both tables share the same affinity key is it still a
collocated join if there are other filters in the join predicate?

If the answer to this yes, does it matter if the filters in the join
predicate are all = i.e. does it have to be an equi-join? or could the
predicate be

> ON tbl.releaseId = col.releaseId AND* col.tableId > tbl.id
> <http://tbl.id/>* AND *col.x >= y*
>

Thanks

On Fri, Feb 11, 2022 at 6:42 PM Maksim Timonin <ti...@apache.org>
wrote:

> Hi Courtney,
>
> > I don't expect collocation issues to be in play here
>
> Do you check this doc:
> https://ignite.apache.org/docs/latest/SQL/distributed-joins ?
>
> It says: "A distributed join is a SQL statement with a join clause that
> combines two or more partitioned tables. If the tables are joined on the
> partitioning column (affinity key), the join is called a colocated join.
> Otherwise, it is called a non-colocated join"
>
> You definitely have a collocation issue due to non-collocated join: T0
> partitioned by "releaseId", T1 by "t0Id", and you make a join by columns
> that aren't affinity columns (id = tableId).
>
> You should specify the flag "SqlFieldsQuery.setDistributedJoins(true)" to
> make your join return correct results.
>
> Maksim
>
>
> On Fri, Feb 11, 2022 at 8:09 PM Courtney Robinson <
> courtney.robinson@crlog.info> wrote:
>
>>
>> I have a query like this:
>>
>> SELECT
>>> tbl.id AS tbl_id, tbl.releaseId AS tbl_releaseId, col.type AS col_type
>>> FROM T0 tbl
>>> INNER JOIN T1 col ON tbl.id = col.tableId
>>> *WHERE tbl.releaseId = ? AND tbl.name <http://tbl.name> = ?*
>>> LIMIT 100
>>>
>>
>> This returns no results so after investigating, I ended up changing it to
>> the below
>>
>> SELECT
>>> tbl.id AS tbl_id, tbl.releaseId AS tbl_releaseId, col.type AS col_type
>>> FROM *(SELECT * FROM T0 t WHERE t.releaseId = ? AND t.name
>>> <http://t.name> = ?) *tbl
>>> INNER JOIN T1 col ON tbl.id = col.tableId
>>> LIMIT 100
>>>
>>
>>  This returns the results expected.
>> Can anyone offer any insight into what is going wrong here?
>>
>> The tables here look like this (I removed some columns from the tables
>> and the query to help make it easier on the eyes to parse):
>>
>> CREATE TABLE IF NOT EXISTS T0
>>> (
>>>   id        LONG,
>>>   releaseId VARCHAR,
>>>   name      VARCHAR,
>>>   PRIMARY KEY (releaseId, id)
>>> ) WITH "template=hypi_tpl,affinity_key=releaseId";
>>
>> CREATE INDEX IF NOT EXISTS VirtualTable_idx0 ON VirtualTable (releaseId,
>>> name);
>>>
>>> CREATE TABLE IF NOT EXISTS T1
>>> (
>>>   id      LONG,
>>>   t0Id LONG,
>>>   name    VARCHAR,
>>>   type    VARCHAR,
>>>   PRIMARY KEY (t0Id, id)
>>> ) WITH "template=hypi_tpl,affinity_key=t0Id";
>>>
>>
>> Note here it is a single node locally (so I don't expect collocation
>> issues to be in play here) - in development so not in a production cluster
>> yet.
>> Running Ignite 2.8.0
>>
>> This is not the first time we've had something like this but it's the
>> first time I've been able to reproduce it myself and consistently.
>>
>> Best,
>> Courtney
>>
>

Re: Ignite JOIN fails to return results using WHERE clause

Posted by Maksim Timonin <ti...@apache.org>.
Hi Courtney,

> I don't expect collocation issues to be in play here

Do you check this doc:
https://ignite.apache.org/docs/latest/SQL/distributed-joins ?

It says: "A distributed join is a SQL statement with a join clause that
combines two or more partitioned tables. If the tables are joined on the
partitioning column (affinity key), the join is called a colocated join.
Otherwise, it is called a non-colocated join"

You definitely have a collocation issue due to non-collocated join: T0
partitioned by "releaseId", T1 by "t0Id", and you make a join by columns
that aren't affinity columns (id = tableId).

You should specify the flag "SqlFieldsQuery.setDistributedJoins(true)" to
make your join return correct results.

Maksim


On Fri, Feb 11, 2022 at 8:09 PM Courtney Robinson <
courtney.robinson@crlog.info> wrote:

>
> I have a query like this:
>
> SELECT
>> tbl.id AS tbl_id, tbl.releaseId AS tbl_releaseId, col.type AS col_type
>> FROM T0 tbl
>> INNER JOIN T1 col ON tbl.id = col.tableId
>> *WHERE tbl.releaseId = ? AND tbl.name <http://tbl.name> = ?*
>> LIMIT 100
>>
>
> This returns no results so after investigating, I ended up changing it to
> the below
>
> SELECT
>> tbl.id AS tbl_id, tbl.releaseId AS tbl_releaseId, col.type AS col_type
>> FROM *(SELECT * FROM T0 t WHERE t.releaseId = ? AND t.name
>> <http://t.name> = ?) *tbl
>> INNER JOIN T1 col ON tbl.id = col.tableId
>> LIMIT 100
>>
>
>  This returns the results expected.
> Can anyone offer any insight into what is going wrong here?
>
> The tables here look like this (I removed some columns from the tables and
> the query to help make it easier on the eyes to parse):
>
> CREATE TABLE IF NOT EXISTS T0
>> (
>>   id        LONG,
>>   releaseId VARCHAR,
>>   name      VARCHAR,
>>   PRIMARY KEY (releaseId, id)
>> ) WITH "template=hypi_tpl,affinity_key=releaseId";
>
> CREATE INDEX IF NOT EXISTS VirtualTable_idx0 ON VirtualTable (releaseId,
>> name);
>>
>> CREATE TABLE IF NOT EXISTS T1
>> (
>>   id      LONG,
>>   t0Id LONG,
>>   name    VARCHAR,
>>   type    VARCHAR,
>>   PRIMARY KEY (t0Id, id)
>> ) WITH "template=hypi_tpl,affinity_key=t0Id";
>>
>
> Note here it is a single node locally (so I don't expect collocation
> issues to be in play here) - in development so not in a production cluster
> yet.
> Running Ignite 2.8.0
>
> This is not the first time we've had something like this but it's the
> first time I've been able to reproduce it myself and consistently.
>
> Best,
> Courtney
>