You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@ignite.apache.org by joseheitor <jo...@heitorprojects.com> on 2018/11/27 09:06:31 UTC

SQL Query plan confusion

1. - I have a nested join query on a table of 8,000,000 records which
performs similar or better than PostreSQL (~10ms) on my small test setup (2x
nodes, 8GB, 2CPU):

SELECT
	mainTable.pk, mainTable.id, mainTable.k, mainTable.v
FROM
	public.test_data AS mainTable
		INNER JOIN (
			SELECT
				lastName.id
			FROM
				(SELECT id FROM public.test_data WHERE k = 'trans.cust.last_name' AND v
= 'Smythe-Veall') AS lastName
					INNER JOIN
						(SELECT id FROM public.test_data WHERE k = 'trans.date' AND v =
'2017-12-21') AS transDate ON transDate.id = lastName.id
					INNER JOIN
						(SELECT id FROM public.test_data WHERE k = 'trans.amount' AND cast(v
AS integer) > 90000) AS transAmount ON transAmount.id = lastName.id
		) AS subTable ON mainTable.id = subTable.id 
ORDER BY 1, 2


2. - By simply adding a WHERE clause at the end, the performance becomes
catastrophic on Ignite (~10s for subsequent queries - first query takes many
minutes). On PostgreSQL performance does not change...

SELECT
	mainTable.pk, mainTable.id, mainTable.k, mainTable.v
FROM
	public.test_data AS mainTable
		INNER JOIN (
			SELECT
				lastName.id
			FROM
				(SELECT id FROM public.test_data WHERE k = 'trans.cust.last_name' AND v
= 'Smythe-Veall') AS lastName
					INNER JOIN
						(SELECT id FROM public.test_data WHERE k = 'trans.date' AND v =
'2017-12-21') AS transDate ON transDate.id = lastName.id
					INNER JOIN
						(SELECT id FROM public.test_data WHERE k = 'trans.amount' AND cast(v
AS integer) > 90000) AS transAmount ON transAmount.id = lastName.id
		) AS subTable ON mainTable.id = subTable.id 
*WHERE
	mainTable.k = 'trans.cust.first_name'*
ORDER BY 1, 2

What can I do to optimise this query for Ignite???

(Table structure and query plans attached for reference)

Thanks,
Jose
table.sql
<http://apache-ignite-users.70518.x6.nabble.com/file/t1652/table.sql>  
good-join-query.txt
<http://apache-ignite-users.70518.x6.nabble.com/file/t1652/good-join-query.txt>  
bad-join-query.txt
<http://apache-ignite-users.70518.x6.nabble.com/file/t1652/bad-join-query.txt>  



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: SQL WHERE Query Slow

Posted by joseheitor <jo...@heitorprojects.com>.
Can anyone help with this?



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: [RESOLVED] SQL WHERE Query Slow

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

You can also try and set 'enforceJoinOrder' property to true, to force
Ignite to join tables in the same order as they are specified in FROM
clause.

Regards,
-- 
Ilya Kasnacheev


пн, 10 дек. 2018 г. в 06:54, joseheitor <jo...@heitorprojects.com>:

> Found  the reason for the problem - and a solution...
>
> Problem is that the query-optimiser performs the final WHERE clause
> *first*,
> which results in a full table scan.
>
> The solution is to structure the query so that the original query (without
> the WHERE clause) is forced to be performed first, then finally the WHERE
> clause can be run on a small set of data. This can be achieved by setting
> the whole original query (without the WHERE clause) as a 'Nested
> Sub-query'...
>
> SELECT subQuery.*
> FROM (
>         *SELECT
>                 mainTable.pk, mainTable.id, mainTable.k, mainTable.v
>         FROM
>                 public.transactions AS mainTable
>         INNER JOIN
>       (SELECT id FROM public.transactions WHERE k = 'trans.cust.last_name'
> AND v = 'Smythe-Veall') AS lastName ON lastName.id = mainTable.id
>     INNER JOIN
>       (SELECT id FROM public.transactions WHERE k = 'trans.date' AND v =
> '2017-12-21') AS transDate ON transDate.id = mainTable.id
>     INNER JOIN
>       (SELECT id FROM public.transactions WHERE k = 'trans.amount' AND
> cast(v AS integer) > 90000) AS transAmount ON transAmount.id =
> mainTable.id*
> ) AS subQuery
> WHERE subQuery.k = 'trans.cust.first_name'
> ORDER BY 1
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>

[RESOLVED] SQL WHERE Query Slow

Posted by joseheitor <jo...@heitorprojects.com>.
Found  the reason for the problem - and a solution...

Problem is that the query-optimiser performs the final WHERE clause *first*,
which results in a full table scan.

The solution is to structure the query so that the original query (without
the WHERE clause) is forced to be performed first, then finally the WHERE
clause can be run on a small set of data. This can be achieved by setting
the whole original query (without the WHERE clause) as a 'Nested
Sub-query'...

SELECT subQuery.*
FROM (
	*SELECT
		mainTable.pk, mainTable.id, mainTable.k, mainTable.v
	FROM
		public.transactions AS mainTable
	INNER JOIN
      (SELECT id FROM public.transactions WHERE k = 'trans.cust.last_name'
AND v = 'Smythe-Veall') AS lastName ON lastName.id = mainTable.id
    INNER JOIN
      (SELECT id FROM public.transactions WHERE k = 'trans.date' AND v =
'2017-12-21') AS transDate ON transDate.id = mainTable.id
    INNER JOIN
      (SELECT id FROM public.transactions WHERE k = 'trans.amount' AND
cast(v AS integer) > 90000) AS transAmount ON transAmount.id = mainTable.id*
) AS subQuery
WHERE subQuery.k = 'trans.cust.first_name'
ORDER BY 1



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: SQL Query plan confusion

Posted by Andrey Mashenkov <an...@gmail.com>.
Hi,

The difference in query plans is a 'bad' one use  PUBLIC.TEST_DATA_K_V_ID
index instead of PUBLIC.TEST_DATA_ID_K_V.
Have you tries to remove other indices and left TEST_DATA_ID_K_V only?

It looks weird for me, the 'good' query plan shows unicast requests as ID
is affinity field, but 'bad' says nothing about broadcast.
Are you run query with same flags? Do you have setDistributedJoins=true?

Another possible reason: TEST_DATA_K_V_ID can't be used efficiently as
default inline size is 10, that means only 7 first chars of composite index
will be inlined. You have at least 2 fields with 'trans.c' prefix, so
Ignite may scan a huge number of rows with using this index.
Try to remove it of use Hints to force Ignite use 'efficient' one.

In case you are really need to have 'k' as a first column in composite
index, you can try to increase inline size.
FYI: For now, fixed length columns will use 1 addition byte to inline (for
a column type prefix) and var-len column requires 3 additional bytes for
inline (1 byte for type, 2 bytes for size)


On Fri, Dec 7, 2018 at 11:19 AM Vladimir Ozerov <vo...@gridgain.com>
wrote:

> Hi Jose,
>
> I am a bit lost in two provided explains. Both "godd" and "bad" contain "k
> = 'trans.cust.first_name'" condition. Could you please confirm that they
> are correct? Specifically, I cannot understand why this condition is
> present in "good" explain.
>
> On Tue, Nov 27, 2018 at 12:06 PM joseheitor <jo...@heitorprojects.com>
> wrote:
>
>> 1. - I have a nested join query on a table of 8,000,000 records which
>> performs similar or better than PostreSQL (~10ms) on my small test setup
>> (2x
>> nodes, 8GB, 2CPU):
>>
>> SELECT
>>         mainTable.pk, mainTable.id, mainTable.k, mainTable.v
>> FROM
>>         public.test_data AS mainTable
>>                 INNER JOIN (
>>                         SELECT
>>                                 lastName.id
>>                         FROM
>>                                 (SELECT id FROM public.test_data WHERE k
>> = 'trans.cust.last_name' AND v
>> = 'Smythe-Veall') AS lastName
>>                                         INNER JOIN
>>                                                 (SELECT id FROM
>> public.test_data WHERE k = 'trans.date' AND v =
>> '2017-12-21') AS transDate ON transDate.id = lastName.id
>>                                         INNER JOIN
>>                                                 (SELECT id FROM
>> public.test_data WHERE k = 'trans.amount' AND cast(v
>> AS integer) > 90000) AS transAmount ON transAmount.id = lastName.id
>>                 ) AS subTable ON mainTable.id = subTable.id
>> ORDER BY 1, 2
>>
>>
>> 2. - By simply adding a WHERE clause at the end, the performance becomes
>> catastrophic on Ignite (~10s for subsequent queries - first query takes
>> many
>> minutes). On PostgreSQL performance does not change...
>>
>> SELECT
>>         mainTable.pk, mainTable.id, mainTable.k, mainTable.v
>> FROM
>>         public.test_data AS mainTable
>>                 INNER JOIN (
>>                         SELECT
>>                                 lastName.id
>>                         FROM
>>                                 (SELECT id FROM public.test_data WHERE k
>> = 'trans.cust.last_name' AND v
>> = 'Smythe-Veall') AS lastName
>>                                         INNER JOIN
>>                                                 (SELECT id FROM
>> public.test_data WHERE k = 'trans.date' AND v =
>> '2017-12-21') AS transDate ON transDate.id = lastName.id
>>                                         INNER JOIN
>>                                                 (SELECT id FROM
>> public.test_data WHERE k = 'trans.amount' AND cast(v
>> AS integer) > 90000) AS transAmount ON transAmount.id = lastName.id
>>                 ) AS subTable ON mainTable.id = subTable.id
>> *WHERE
>>         mainTable.k = 'trans.cust.first_name'*
>> ORDER BY 1, 2
>>
>> What can I do to optimise this query for Ignite???
>>
>> (Table structure and query plans attached for reference)
>>
>> Thanks,
>> Jose
>> table.sql
>> <http://apache-ignite-users.70518.x6.nabble.com/file/t1652/table.sql>
>> good-join-query.txt
>> <
>> http://apache-ignite-users.70518.x6.nabble.com/file/t1652/good-join-query.txt>
>>
>> bad-join-query.txt
>> <
>> http://apache-ignite-users.70518.x6.nabble.com/file/t1652/bad-join-query.txt>
>>
>>
>>
>>
>> --
>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>
>

-- 
Best regards,
Andrey V. Mashenkov

Re: SQL Query plan confusion

Posted by joseheitor <jo...@heitorprojects.com>.
My apologies, Vladimir.

The query-plans are swapped around (oops!).

The query-plan for the efficient query is in the 'bad' file. And
vice-versa...

Thanks,
Jose



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: SQL Query plan confusion

Posted by Vladimir Ozerov <vo...@gridgain.com>.
Hi Jose,

I am a bit lost in two provided explains. Both "godd" and "bad" contain "k
= 'trans.cust.first_name'" condition. Could you please confirm that they
are correct? Specifically, I cannot understand why this condition is
present in "good" explain.

On Tue, Nov 27, 2018 at 12:06 PM joseheitor <jo...@heitorprojects.com> wrote:

> 1. - I have a nested join query on a table of 8,000,000 records which
> performs similar or better than PostreSQL (~10ms) on my small test setup
> (2x
> nodes, 8GB, 2CPU):
>
> SELECT
>         mainTable.pk, mainTable.id, mainTable.k, mainTable.v
> FROM
>         public.test_data AS mainTable
>                 INNER JOIN (
>                         SELECT
>                                 lastName.id
>                         FROM
>                                 (SELECT id FROM public.test_data WHERE k =
> 'trans.cust.last_name' AND v
> = 'Smythe-Veall') AS lastName
>                                         INNER JOIN
>                                                 (SELECT id FROM
> public.test_data WHERE k = 'trans.date' AND v =
> '2017-12-21') AS transDate ON transDate.id = lastName.id
>                                         INNER JOIN
>                                                 (SELECT id FROM
> public.test_data WHERE k = 'trans.amount' AND cast(v
> AS integer) > 90000) AS transAmount ON transAmount.id = lastName.id
>                 ) AS subTable ON mainTable.id = subTable.id
> ORDER BY 1, 2
>
>
> 2. - By simply adding a WHERE clause at the end, the performance becomes
> catastrophic on Ignite (~10s for subsequent queries - first query takes
> many
> minutes). On PostgreSQL performance does not change...
>
> SELECT
>         mainTable.pk, mainTable.id, mainTable.k, mainTable.v
> FROM
>         public.test_data AS mainTable
>                 INNER JOIN (
>                         SELECT
>                                 lastName.id
>                         FROM
>                                 (SELECT id FROM public.test_data WHERE k =
> 'trans.cust.last_name' AND v
> = 'Smythe-Veall') AS lastName
>                                         INNER JOIN
>                                                 (SELECT id FROM
> public.test_data WHERE k = 'trans.date' AND v =
> '2017-12-21') AS transDate ON transDate.id = lastName.id
>                                         INNER JOIN
>                                                 (SELECT id FROM
> public.test_data WHERE k = 'trans.amount' AND cast(v
> AS integer) > 90000) AS transAmount ON transAmount.id = lastName.id
>                 ) AS subTable ON mainTable.id = subTable.id
> *WHERE
>         mainTable.k = 'trans.cust.first_name'*
> ORDER BY 1, 2
>
> What can I do to optimise this query for Ignite???
>
> (Table structure and query plans attached for reference)
>
> Thanks,
> Jose
> table.sql
> <http://apache-ignite-users.70518.x6.nabble.com/file/t1652/table.sql>
> good-join-query.txt
> <
> http://apache-ignite-users.70518.x6.nabble.com/file/t1652/good-join-query.txt>
>
> bad-join-query.txt
> <
> http://apache-ignite-users.70518.x6.nabble.com/file/t1652/bad-join-query.txt>
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>