You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@ignite.apache.org by "Wen Bo (Bill) Li" <we...@geotab.com> on 2022/07/13 18:24:19 UTC

Traditional RDBMS PARTITION concept on Ignite

Hi,

The traditional RDBMS has the concept of partitioning a table into
different chunks, but that isn't really partitioning data to different
nodes as described in the Ignite document. Our team is trying to partition
a table based on the values of one column and query data based on these
values. For example, there are 3 different values in our partitioned
column, A, B and C, and we want to get all data that belong to C and don't
want to read anything that belong to A and B.

We have a few ideas on doing this as indicated below:

   - Create separate tables for A, B and C
   - Use index for the partitioned column
   - Use affinity key for the partitioned column (this is more related to
   if the data are on the same node)

I am curious if the above 3 approaches are valid or if there is another way
to do this? Is it possible to do the ALTER command in the RDBMS to add
partitions? Thanks.

Regards,
Bill

Re: Traditional RDBMS PARTITION concept on Ignite

Posted by "Mengliao(Mike) Wang" <me...@geotab.com>.

Hi Denis and Stephen,

Thanks a lot for the great information and suggestions. I think we are
clear on the directions we are heading, but just one more question on the
calcite engine Stephen mentioned. From what I read this is replacing the
current H2-based engine starting from Ignite 2.13, and from this official
document <https://calcite.apache.org/docs/reference.html> from Calcite I
can see that the "PARTITION BY" keyword is supported by calcite. However I
don't see it on the Ignite DDL document
<https://ignite.apache.org/docs/latest/sql-reference/ddl>even for 2.13
which has the calcite engine. Does that mean not all the features from
calcite would be enabled/supported by Ignite, or if that is simply missing
from the documentation?

Thanks
Mike


On Wed, Jul 20, 2022 at 10:25 AM Denis Magda <dm...@apache.org> wrote:

> Hey folks, let me chime in to clarify the matters. What Wen Bo and
> Mengliao are really asking for is a combination of two features
> "partitioning + sharding".
>
> In the relational world (Postgres, MySQL, Oracle) partitioning is used to
> split a large table into smaller tables (called partitions). For example,
> imagine you have table PizzaOrders and you want the database to keep
> DELIVERED orders in one partition and all the others in a different one.
> You can easily achieve this by partitioning the primary PizzaOrders table
> by the order status into PizzaOrdersDelivered and PizzaOrdersOther. With
> those partitions in place, the SQL engine can apply the partition pruning
> optimization [1] and, as Megliao highlighted, it comes with data management
> benefits [2].
>
> This has nothing to do with the partitioning in Ignite. What Ignite does
> is sharding - distributing table data across a cluster of nodes.
> Distributed databases that support both partitioning and sharding do exist
> and usually, they are built on Postgres or MySQL (and have nothing to do
> with in-memory computing). For instance, YugabyteDB can partition your
> primary PizzaOrders table first into PizzaOrdersDelivered and
> PizzaOrdersOther, and then have those partitioned tables sharded
> automatically across the cluster.
>
> As long as Ignite doesn't have the partitioning feature of relational
> databases, you have these options:
>
>    1. Use affinity keys in Ignite as you would use partition keys in
>    Postgres/MySQL. But remember that all the data that matches an affinity key
>    will be stored together on a single Ignite node. It might be a capacity
>    problem if there are way too many records that belong to the affinity key.
>    2. Implement the Postgres/MySQL-like partitioning at the application
>    layer. Create Ignite tables for each logical partition, intercept user
>    queries and, depending on the value of a partitioning column, place a
>    record in one of the Ignite tables. Then Ignite will take care of the next
>    step - sharding.
>
>
> [1]
> https://dmagda.hashnode.dev/optimizing-application-queries-with-partition-pruning
> [2]
> https://dmagda.hashnode.dev/managing-data-placement-with-table-partitioning
>
> --
> Denis
>
> On Wed, Jul 20, 2022 at 6:35 AM Stephen Darlington <
> stephen.darlington@gridgain.com> wrote:
>
>> Ignite’s SQL is ANSI 99 compliant. Windowing functions such as PARTITION
>> BY came in SQL 2003 (and later). It’s possible that the new Calcite engine (
>> sql-calcite <https://ignite.apache.org/docs/latest/SQL/sql-calcite>)
>> supports the keywords, but I have not checked.
>>
>>
>>    - While querying we can only scan a small portion of the data to
>>    improve performance
>>
>> As you suggested: indexes.
>>
>>
>>    - Quickly and safely manage data in one partition in particular. For
>>    example, in some RDBMS you can build index or compress data for only one
>>    partition, or delete one partition without locking other partitions being
>>    updated
>>
>> In general, traditional databases increase performance by grouping
>> related stuff together. Ignite increases performance by distributing the
>> data across multiple machines, which allows tasks to be parallelised. A
>> different architecture results in different solutions.
>>
>>
>>    - Partitioning on multiple columns
>>
>> That’s an affinity key. But as I noted previously, you don’t want to do
>> that if you only have three distinct values.
>>
>> Regards,
>> Stephen
>>
>> On 18 Jul 2022, at 16:53, Mengliao(Mike) Wang <me...@geotab.com>
>> wrote:
>>
>> Hi Stephen,
>>
>> What we are looking for is the table partition with SQL in particular,
>> instead of the data partition people mostly refer to in Ignite which is
>> more from the infrastructure perspective. A.k.a the "PARTITION BY" keyword
>> in traditional RDBMS. In the Ignite official document (
>> https://ignite.apache.org/docs/latest/SQL/schemas) we didn't see
>> anything like that, so not sure if there is anything in Ignite that could
>> achieve these:
>>
>>    - While querying we can only scan a small portion of the data to
>>    improve performance
>>    - Quickly and safely manage data in one partition in particular. For
>>    example, in some RDBMS you can build index or compress data for only one
>>    partition, or delete one partition without locking other partitions being
>>    updated
>>    - Partitioning on multiple columns
>>
>>
>> Thanks
>> Mike
>>
>> On Thu, Jul 14, 2022 at 9:05 AM Stephen Darlington <
>> stephen.darlington@gridgain.com> wrote:
>>
>>> As you say, partitions in Ignite are about the distribution of data. You
>>> can group together related data using affinity keys, but if you only have
>>> three distinct values that would be a really bad idea. You can’t change the
>>> number of partitions after a table has been created.
>>>
>>> Either of your other solutions would work but, to be honest, I’m not
>>> completely sure what problem you’re trying to solve.
>>>
>>> On 13 Jul 2022, at 19:24, Wen Bo (Bill) Li <we...@geotab.com> wrote:
>>>
>>> Hi,
>>>
>>> The traditional RDBMS has the concept of partitioning a table into
>>> different chunks, but that isn't really partitioning data to different
>>> nodes as described in the Ignite document. Our team is trying to partition
>>> a table based on the values of one column and query data based on these
>>> values. For example, there are 3 different values in our partitioned
>>> column, A, B and C, and we want to get all data that belong to C and don't
>>> want to read anything that belong to A and B.
>>>
>>> We have a few ideas on doing this as indicated below:
>>>
>>>    - Create separate tables for A, B and C
>>>    - Use index for the partitioned column
>>>    - Use affinity key for the partitioned column (this is more related
>>>    to if the data are on the same node)
>>>
>>> I am curious if the above 3 approaches are valid or if there is another
>>> way to do this? Is it possible to do the ALTER command in the RDBMS to add
>>> partitions? Thanks.
>>>
>>> Regards,
>>> Bill
>>>
>>>
>>>
>>

Re: Traditional RDBMS PARTITION concept on Ignite

Posted by Denis Magda <dm...@apache.org>.

Hey folks, let me chime in to clarify the matters. What Wen Bo and Mengliao
are really asking for is a combination of two features "partitioning +
sharding".

In the relational world (Postgres, MySQL, Oracle) partitioning is used to
split a large table into smaller tables (called partitions). For example,
imagine you have table PizzaOrders and you want the database to keep
DELIVERED orders in one partition and all the others in a different one.
You can easily achieve this by partitioning the primary PizzaOrders table
by the order status into PizzaOrdersDelivered and PizzaOrdersOther. With
those partitions in place, the SQL engine can apply the partition pruning
optimization [1] and, as Megliao highlighted, it comes with data management
benefits [2].

This has nothing to do with the partitioning in Ignite. What Ignite does is
sharding - distributing table data across a cluster of nodes. Distributed
databases that support both partitioning and sharding do exist and usually,
they are built on Postgres or MySQL (and have nothing to do with in-memory
computing). For instance, YugabyteDB can partition your primary PizzaOrders
table first into PizzaOrdersDelivered and PizzaOrdersOther, and then have
those partitioned tables sharded automatically across the cluster.

As long as Ignite doesn't have the partitioning feature of relational
databases, you have these options:

   1. Use affinity keys in Ignite as you would use partition keys in
   Postgres/MySQL. But remember that all the data that matches an affinity key
   will be stored together on a single Ignite node. It might be a capacity
   problem if there are way too many records that belong to the affinity key.
   2. Implement the Postgres/MySQL-like partitioning at the application
   layer. Create Ignite tables for each logical partition, intercept user
   queries and, depending on the value of a partitioning column, place a
   record in one of the Ignite tables. Then Ignite will take care of the next
   step - sharding.

[1]
https://dmagda.hashnode.dev/optimizing-application-queries-with-partition-pruning
[2]
https://dmagda.hashnode.dev/managing-data-placement-with-table-partitioning

--
Denis

On Wed, Jul 20, 2022 at 6:35 AM Stephen Darlington <
stephen.darlington@gridgain.com> wrote:

> Ignite’s SQL is ANSI 99 compliant. Windowing functions such as PARTITION
> BY came in SQL 2003 (and later). It’s possible that the new Calcite engine (
> sql-calcite <https://ignite.apache.org/docs/latest/SQL/sql-calcite>)
> supports the keywords, but I have not checked.
>
>
>    - While querying we can only scan a small portion of the data to
>    improve performance
>
> As you suggested: indexes.
>
>
>    - Quickly and safely manage data in one partition in particular. For
>    example, in some RDBMS you can build index or compress data for only one
>    partition, or delete one partition without locking other partitions being
>    updated
>
> In general, traditional databases increase performance by grouping related
> stuff together. Ignite increases performance by distributing the data
> across multiple machines, which allows tasks to be parallelised. A
> different architecture results in different solutions.
>
>
>    - Partitioning on multiple columns
>
> That’s an affinity key. But as I noted previously, you don’t want to do
> that if you only have three distinct values.
>
> Regards,
> Stephen
>
> On 18 Jul 2022, at 16:53, Mengliao(Mike) Wang <me...@geotab.com>
> wrote:
>
> Hi Stephen,
>
> What we are looking for is the table partition with SQL in particular,
> instead of the data partition people mostly refer to in Ignite which is
> more from the infrastructure perspective. A.k.a the "PARTITION BY" keyword
> in traditional RDBMS. In the Ignite official document (
> https://ignite.apache.org/docs/latest/SQL/schemas) we didn't see anything
> like that, so not sure if there is anything in Ignite that could achieve
> these:
>
>    - While querying we can only scan a small portion of the data to
>    improve performance
>    - Quickly and safely manage data in one partition in particular. For
>    example, in some RDBMS you can build index or compress data for only one
>    partition, or delete one partition without locking other partitions being
>    updated
>    - Partitioning on multiple columns
>
>
> Thanks
> Mike
>
> On Thu, Jul 14, 2022 at 9:05 AM Stephen Darlington <
> stephen.darlington@gridgain.com> wrote:
>
>> As you say, partitions in Ignite are about the distribution of data. You
>> can group together related data using affinity keys, but if you only have
>> three distinct values that would be a really bad idea. You can’t change the
>> number of partitions after a table has been created.
>>
>> Either of your other solutions would work but, to be honest, I’m not
>> completely sure what problem you’re trying to solve.
>>
>> On 13 Jul 2022, at 19:24, Wen Bo (Bill) Li <we...@geotab.com> wrote:
>>
>> Hi,
>>
>> The traditional RDBMS has the concept of partitioning a table into
>> different chunks, but that isn't really partitioning data to different
>> nodes as described in the Ignite document. Our team is trying to partition
>> a table based on the values of one column and query data based on these
>> values. For example, there are 3 different values in our partitioned
>> column, A, B and C, and we want to get all data that belong to C and don't
>> want to read anything that belong to A and B.
>>
>> We have a few ideas on doing this as indicated below:
>>
>>    - Create separate tables for A, B and C
>>    - Use index for the partitioned column
>>    - Use affinity key for the partitioned column (this is more related
>>    to if the data are on the same node)
>>
>> I am curious if the above 3 approaches are valid or if there is another
>> way to do this? Is it possible to do the ALTER command in the RDBMS to add
>> partitions? Thanks.
>>
>> Regards,
>> Bill
>>
>>
>>
>

Re: Traditional RDBMS PARTITION concept on Ignite

Posted by Stephen Darlington <st...@gridgain.com>.

Ignite’s SQL is ANSI 99 compliant. Windowing functions such as PARTITION BY came in SQL 2003 (and later). It’s possible that the new Calcite engine (sql-calcite <https://ignite.apache.org/docs/latest/SQL/sql-calcite>) supports the keywords, but I have not checked.
> While querying we can only scan a small portion of the data to improve performance
As you suggested: indexes.
> Quickly and safely manage data in one partition in particular. For example, in some RDBMS you can build index or compress data for only one partition, or delete one partition without locking other partitions being updated
In general, traditional databases increase performance by grouping related stuff together. Ignite increases performance by distributing the data across multiple machines, which allows tasks to be parallelised. A different architecture results in different solutions.
> Partitioning on multiple columns
That’s an affinity key. But as I noted previously, you don’t want to do that if you only have three distinct values.

Regards,
Stephen

> On 18 Jul 2022, at 16:53, Mengliao(Mike) Wang <me...@geotab.com> wrote:
> 
> Hi Stephen,
> 
> What we are looking for is the table partition with SQL in particular, instead of the data partition people mostly refer to in Ignite which is more from the infrastructure perspective. A.k.a the "PARTITION BY" keyword in traditional RDBMS. In the Ignite official document (https://ignite.apache.org/docs/latest/SQL/schemas <https://ignite.apache.org/docs/latest/SQL/schemas>) we didn't see anything like that, so not sure if there is anything in Ignite that could achieve these:
> While querying we can only scan a small portion of the data to improve performance
> Quickly and safely manage data in one partition in particular. For example, in some RDBMS you can build index or compress data for only one partition, or delete one partition without locking other partitions being updated
> Partitioning on multiple columns
> 
> Thanks
> Mike
> 
> On Thu, Jul 14, 2022 at 9:05 AM Stephen Darlington <stephen.darlington@gridgain.com <ma...@gridgain.com>> wrote:
> As you say, partitions in Ignite are about the distribution of data. You can group together related data using affinity keys, but if you only have three distinct values that would be a really bad idea. You can’t change the number of partitions after a table has been created.
> 
> Either of your other solutions would work but, to be honest, I’m not completely sure what problem you’re trying to solve.
> 
>> On 13 Jul 2022, at 19:24, Wen Bo (Bill) Li <wenboli@geotab.com <ma...@geotab.com>> wrote:
>> 
>> Hi,
>> 
>> The traditional RDBMS has the concept of partitioning a table into different chunks, but that isn't really partitioning data to different nodes as described in the Ignite document. Our team is trying to partition a table based on the values of one column and query data based on these values. For example, there are 3 different values in our partitioned column, A, B and C, and we want to get all data that belong to C and don't want to read anything that belong to A and B. 
>> 
>> We have a few ideas on doing this as indicated below:
>> Create separate tables for A, B and C
>> Use index for the partitioned column
>> Use affinity key for the partitioned column (this is more related to if the data are on the same node)
>> I am curious if the above 3 approaches are valid or if there is another way to do this? Is it possible to do the ALTER command in the RDBMS to add partitions? Thanks.
>> 
>> Regards,
>> Bill
>

Re: Traditional RDBMS PARTITION concept on Ignite

Posted by "Mengliao(Mike) Wang" <me...@geotab.com>.

Hi Stephen,

What we are looking for is the table partition with SQL in particular,
instead of the data partition people mostly refer to in Ignite which is
more from the infrastructure perspective. A.k.a the "PARTITION BY" keyword
in traditional RDBMS. In the Ignite official document (
https://ignite.apache.org/docs/latest/SQL/schemas) we didn't see anything
like that, so not sure if there is anything in Ignite that could achieve
these:

   - While querying we can only scan a small portion of the data to improve
   performance
   - Quickly and safely manage data in one partition in particular. For
   example, in some RDBMS you can build index or compress data for only one
   partition, or delete one partition without locking other partitions being
   updated
   - Partitioning on multiple columns


Thanks
Mike

On Thu, Jul 14, 2022 at 9:05 AM Stephen Darlington <
stephen.darlington@gridgain.com> wrote:

> As you say, partitions in Ignite are about the distribution of data. You
> can group together related data using affinity keys, but if you only have
> three distinct values that would be a really bad idea. You can’t change the
> number of partitions after a table has been created.
>
> Either of your other solutions would work but, to be honest, I’m not
> completely sure what problem you’re trying to solve.
>
> On 13 Jul 2022, at 19:24, Wen Bo (Bill) Li <we...@geotab.com> wrote:
>
> Hi,
>
> The traditional RDBMS has the concept of partitioning a table into
> different chunks, but that isn't really partitioning data to different
> nodes as described in the Ignite document. Our team is trying to partition
> a table based on the values of one column and query data based on these
> values. For example, there are 3 different values in our partitioned
> column, A, B and C, and we want to get all data that belong to C and don't
> want to read anything that belong to A and B.
>
> We have a few ideas on doing this as indicated below:
>
>    - Create separate tables for A, B and C
>    - Use index for the partitioned column
>    - Use affinity key for the partitioned column (this is more related to
>    if the data are on the same node)
>
> I am curious if the above 3 approaches are valid or if there is another
> way to do this? Is it possible to do the ALTER command in the RDBMS to add
> partitions? Thanks.
>
> Regards,
> Bill
>
>
>

Re: Traditional RDBMS PARTITION concept on Ignite

Posted by Stephen Darlington <st...@gridgain.com>.

As you say, partitions in Ignite are about the distribution of data. You can group together related data using affinity keys, but if you only have three distinct values that would be a really bad idea. You can’t change the number of partitions after a table has been created.

Either of your other solutions would work but, to be honest, I’m not completely sure what problem you’re trying to solve.

> On 13 Jul 2022, at 19:24, Wen Bo (Bill) Li <we...@geotab.com> wrote:
> 
> Hi,
> 
> The traditional RDBMS has the concept of partitioning a table into different chunks, but that isn't really partitioning data to different nodes as described in the Ignite document. Our team is trying to partition a table based on the values of one column and query data based on these values. For example, there are 3 different values in our partitioned column, A, B and C, and we want to get all data that belong to C and don't want to read anything that belong to A and B. 
> 
> We have a few ideas on doing this as indicated below:
> Create separate tables for A, B and C
> Use index for the partitioned column
> Use affinity key for the partitioned column (this is more related to if the data are on the same node)
> I am curious if the above 3 approaches are valid or if there is another way to do this? Is it possible to do the ALTER command in the RDBMS to add partitions? Thanks.
> 
> Regards,
> Bill