
Posted to user@ignite.apache.org by Sanjeev <gu...@gmail.com> on 2018/06/06 17:55:41 UTC

Re: How are SQL Queries executed on Ignite Cluster

So it looks like query parallelism works at the cache level, but it would
make more sense to do it at the query level, more like a hint in a SQL query
to control how much parallelism is needed. This way it would be very dynamic
and users would have full control. The default could be 1, but OLAP queries
could set it dynamically, on a per-query basis.

Also, is the CacheConfiguration.setQueryParallelism() call the only way to
define it? Is there no way to define it after the cache has been created and
loaded? There should be a way to turn this on and off, for now at the cache
level (though I would prefer the query level), through SQL DDL or DML
statements.

Is this possible now in some way, or do we have to bring the cluster down
and reload all the data?

Thanks...



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: How are SQL Queries executed on Ignite Cluster

Posted by Andrey Mashenkov <an...@gmail.com>.

1. It is not supported for now, but we plan to fix it [1].
2. Not yet. We split the index a different way: each tree manages a certain
set of partition numbers. Note the difference.
The mapping "partition -> tree segment" is static; we use the remainder of the
division, i.e. partition_id % N.
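A minimal Java sketch of the static "partition -> tree segment" mapping described above, assuming the segment is exactly partition_id % N. The class and method names and the sample partition ids are hypothetical and chosen for illustration; this is not Ignite's internal code. It also shows that a node's primary partitions need not split evenly into M/N per tree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SegmentMapping {
    // Static "partition -> tree segment" mapping: remainder of partition id divided by N.
    static int segmentOf(int partitionId, int parallelism) {
        return partitionId % parallelism;
    }

    // Groups a node's primary partitions by the index segment (tree) that manages them.
    static Map<Integer, List<Integer>> groupBySegment(int[] primaryPartitions, int parallelism) {
        Map<Integer, List<Integer>> segments = new TreeMap<>();
        for (int p : primaryPartitions) {
            segments.computeIfAbsent(segmentOf(p, parallelism), k -> new ArrayList<>()).add(p);
        }
        return segments;
    }

    public static void main(String[] args) {
        // Hypothetical primary partitions owned by one node.
        int[] primaries = {0, 3, 5, 8, 12, 13, 17, 21};
        System.out.println(groupBySegment(primaries, 4));
    }
}
```

With N = 4 this prints {0=[0, 8, 12], 1=[5, 13, 17, 21], 3=[3]}: segment 2 manages none of this node's partitions, so the per-tree share depends on which partition ids the node happens to own.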

A query works the same way as if we had N times more nodes.

The issue here is that, if it is impossible to calculate the query affinity,
we have to "broadcast" even quite simple queries to all (nodes * N) segments.

Assume you want to query 1000 rows from a node with parallelism=32
(SELECT * FROM T ORDER BY t.c1 LIMIT 1000).
This query will actually retrieve 1k rows per index segment, which is 32k rows
per node, and then 31k rows will simply be filtered out.
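The fan-out and over-fetch arithmetic above can be sketched as a toy model in Java. It assumes each index segment independently sorts its rows and returns its own first `limit` rows, after which the node merges those partial results and keeps only `limit`; the class and method names are hypothetical, and this is not Ignite's query engine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class SegmentedLimit {
    // Result of "ORDER BY c1 LIMIT k" on one node with N index segments.
    static final class NodeResult {
        final List<Integer> rows; // rows the node returns after merging
        final int fetched;        // rows read across all segments
        NodeResult(List<Integer> rows, int fetched) { this.rows = rows; this.fetched = fetched; }
        int discarded() { return fetched - rows.size(); }
    }

    // Segment-level subqueries needed when the query cannot be routed by affinity.
    static int broadcastFanOut(int nodes, int parallelism) {
        return nodes * parallelism;
    }

    // Each segment returns its first `limit` rows; the node keeps only the overall first `limit`.
    static NodeResult limitOnNode(List<List<Integer>> segments, int limit) {
        PriorityQueue<Integer> merged = new PriorityQueue<>();
        int fetched = 0;
        for (List<Integer> segment : segments) {
            List<Integer> sorted = new ArrayList<>(segment);
            sorted.sort(null); // natural ordering
            List<Integer> top = sorted.subList(0, Math.min(limit, sorted.size()));
            fetched += top.size();
            merged.addAll(top);
        }
        List<Integer> rows = new ArrayList<>();
        while (rows.size() < limit && !merged.isEmpty()) {
            rows.add(merged.poll());
        }
        return new NodeResult(rows, fetched);
    }

    public static void main(String[] args) {
        int parallelism = 32, limit = 1000;
        List<List<Integer>> segments = new ArrayList<>();
        for (int s = 0; s < parallelism; s++) {
            List<Integer> rows = new ArrayList<>();
            for (int i = 0; i < 2000; i++) {
                rows.add(i * parallelism + s); // 2000 distinct values per segment
            }
            segments.add(rows);
        }
        NodeResult r = limitOnNode(segments, limit);
        System.out.println("fetched=" + r.fetched + " kept=" + r.rows.size() + " discarded=" + r.discarded());
        System.out.println("fan-out on 4 nodes: " + broadcastFanOut(4, parallelism));
    }
}
```

With 32 segments of 2,000 rows each and LIMIT 1000, this prints fetched=32000 kept=1000 discarded=31000, matching the 32k/31k figures above; the second line prints 128 for a hypothetical 4-node cluster.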

[1] https://issues.apache.org/jira/browse/IGNITE-6089

On Fri, Jun 8, 2018 at 4:07 AM, Sanjeev <gu...@gmail.com> wrote:

-- 
Best regards,
Andrey V. Mashenkov

Re: How are SQL Queries executed on Ignite Cluster

Posted by Sanjeev <gu...@gmail.com>.

Trying to understand this:

1) When no indexes are involved and you are doing a table scan, it
should automatically try to exploit the available CPU cores and process each
partition on a separate thread/core. At least table-scan queries should
entertain the idea of dynamic parallelism through DML hints.

2) In the case of indexes, what you are saying is that N trees are built for
the M primary partitions on a node, where N is the degree of parallelism. So
each tree manages a certain number of partitions, M/N. As the number of
partitions on a node increases or decreases, the N trees are adjusted to
reflect that.
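Point 1 proposes scanning partitions in parallel when no index is involved. A minimal Java sketch of that proposal only, assuming each partition is just an int[] of column values and using the JDK's parallel streams; the names and data are hypothetical, and this is not how Ignite executes scans.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.stream.IntStream;

public class ParallelPartitionScan {
    // Counts matching rows. Each partition is one unit of work for a parallel stream,
    // so partitions can be scanned concurrently on the common ForkJoinPool.
    static long countMatches(List<int[]> partitions, IntPredicate filter) {
        return partitions.parallelStream()
                .mapToLong(part -> IntStream.of(part).filter(filter).count())
                .sum();
    }

    public static void main(String[] args) {
        // Hypothetical data: 8 partitions with 1,000 column values each.
        List<int[]> partitions = new ArrayList<>();
        for (int p = 0; p < 8; p++) {
            partitions.add(IntStream.range(p * 1000, (p + 1) * 1000).toArray());
        }
        // Full scan with a filter and no index, like "WHERE value % 2 = 0".
        System.out.println(countMatches(partitions, v -> v % 2 == 0));
    }
}
```

The JDK decides how many threads actually run; the sketch only shows that a table scan splits naturally along partition boundaries.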

What I am wondering is: if indexes were created, could we always create N
trees? What are the performance implications of:
1) A single thread working on one large index
2) A single thread working on one or a few of the N small indexes, depending
on the query
3) N cores working on N small indexes in parallel

Option 3 should always perform well. Between 1 and 2, would one perform better
or worse than the other?
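Option 3 can be modeled as one thread per small tree, with the node merging the partial results, which is roughly what Andrey describes when he says a query works as if there were N times more nodes. A minimal Java sketch, assuming `TreeMap` stands in for an index tree and a hypothetical affinity function `key mod partitions`; all names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SegmentedIndexScan {
    // Builds N small "index trees"; a key goes to segment (partition % N).
    // key -> partition is a hypothetical stand-in for an affinity function.
    static List<TreeMap<Integer, Integer>> buildSegments(int[] keys, int partitions, int parallelism) {
        List<TreeMap<Integer, Integer>> segments = new ArrayList<>();
        for (int s = 0; s < parallelism; s++) {
            segments.add(new TreeMap<>());
        }
        for (int key : keys) {
            int partition = Math.floorMod(key, partitions);
            segments.get(partition % parallelism).put(key, partition);
        }
        return segments;
    }

    // Option 3: one thread per segment runs the range lookup [from, to) on its own tree;
    // the partial results are then combined and sorted.
    static List<Integer> rangeQuery(List<TreeMap<Integer, Integer>> segments, int from, int to) {
        ExecutorService pool = Executors.newFixedThreadPool(segments.size());
        try {
            List<Future<List<Integer>>> futures = new ArrayList<>();
            for (TreeMap<Integer, Integer> segment : segments) {
                Callable<List<Integer>> task =
                        () -> new ArrayList<>(segment.subMap(from, true, to, false).keySet());
                futures.add(pool.submit(task));
            }
            List<Integer> merged = new ArrayList<>();
            for (Future<List<Integer>> f : futures) {
                merged.addAll(f.get());
            }
            merged.sort(null); // natural ordering
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] keys = new int[10_000];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = i;
        }
        List<TreeMap<Integer, Integer>> segments = buildSegments(keys, 1024, 8);
        System.out.println(rangeQuery(segments, 100, 110));
    }
}
```

The toy does not settle 1 versus 2. As a rough guide, a balanced-tree lookup costs about log2(n) comparisons, so splitting n entries into N trees saves only about log2(N) comparisons when the right tree is known in advance, while a lookup that cannot be routed must visit all N trees.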


Re: How are SQL Queries executed on Ignite Cluster

Posted by Evgenii Zhuravlev <e....@gmail.com>.

It can't work at the query level because, internally, it divides every
index into N trees instead of one (where N equals queryParallelism). You
can't redefine it after the cache has been created, since that would lead to
a complete rebuild of all indexes.
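The need for a rebuild follows from the static partition_id % N mapping described earlier in the thread: change N and many partitions belong to a different tree. A minimal Java sketch, assuming exactly that modulo mapping; the class and method names are hypothetical.

```java
public class ParallelismChange {
    // Counts partitions whose index segment (partition % N) differs between oldN and newN.
    // Rows of those partitions would have to be re-inserted into different trees.
    static int partitionsThatMove(int partitions, int oldN, int newN) {
        int moved = 0;
        for (int p = 0; p < partitions; p++) {
            if (p % oldN != p % newN) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        // Example: 1024 partitions, parallelism raised from 4 to 8.
        System.out.println(partitionsThatMove(1024, 4, 8) + " of 1024 partitions change segment");
    }
}
```

Going from 4 to 8 moves half of the partitions (those with partition % 8 in 4..7), and the set of trees itself changes, which fits the point that the indexes would have to be rebuilt.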

Evgenii
