Posted to user@ignite.apache.org by blackfield <ch...@maxpoint.com> on 2017/10/16 16:45:39 UTC

Query performance against table with/out backup

Hello,

We have a table with the following configuration:
1. Persistence is enabled
2. Partitioned (not replicated)
3. Backups = 1 vs. 0
Everything else pretty much uses the defaults.

We run the following query against this table:
SELECT COUNT(*) FROM Table WHERE column1 > 0.75 AND column2 > 0.75 AND zone
IN (....27000 zones...);

The table has about 75000 rows and zone is the primary key.

I ran the above query with many other options and with different numbers of
client threads (1-50). With backup == 1 it is consistently about twice as slow
as with backup == 0.

The Ignite documentation mentions in many places that specifying backups
impacts performance.

I understand that write performance is impacted when backups are specified.

What I am trying to understand is why read performance appears to be heavily
impacted as well.








--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Query performance against table with/out backup

Posted by Dmitriy Setrakyan <ds...@apache.org>.
Andrey, thanks for pointing this out.

I think this is a serious usability issue. Do I understand correctly that
we currently simply ignore a change in the backup count if persistence is
enabled? If yes, we should fix it ASAP. I would also add a check for other
configuration properties, like the affinity function, cache name, etc.

D.

On Fri, Oct 27, 2017 at 7:46 AM, Andrey Mashenkov <
andrey.mashenkov@gmail.com> wrote:

> Crossposting to dev.
>
> Igniters,
>
> For now we allow the user to change the number of backups in the configuration
> between grid restarts with no error.

Fwd: Query performance against table with/out backup

Posted by Andrey Mashenkov <an...@gmail.com>.
Crossposting to dev.

Igniters,

For now we allow the user to change the number of backups in the configuration
between grid restarts with no error.
The user sees the new configuration object from code, but sees the old one in
visor.

So, if the user raises the number of backups between grid restarts, the grid
will still start successfully.
It seems the user then sees more backups in cacheConfiguration than Ignite
really has.

I slightly reworked our persistence examples to check this case, and it appears
to be true.
I see the new configuration on the client and on the servers, but backups=2
can't prevent data loss when 2 nodes fail.

Steps to reproduce (I waited a while at each step for rebalancing to finish):
1. Start the grid with backups=1 and persistence enabled.
2. Fill it with data and shut down the grid.
3. Change the config to backups=2.
4. Start the grid and wait a while.
5. Kill 2 nodes and observe data loss.
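
The reproduction steps can be read as a worst-case model. The sketch below is
only an illustration under stated assumptions, not Ignite code: it assumes each
partition has one primary plus effectiveBackups copies on distinct nodes, so
data can be lost once that many nodes fail together. The class and method names
are hypothetical.

```java
class BackupLossModel {
    /**
     * Worst case: every copy of some partition sits on a failed node.
     * A partition has 1 primary + effectiveBackups copies on distinct nodes,
     * so loss is possible only if at least that many nodes fail.
     */
    static boolean canLoseData(int effectiveBackups, int failedNodes) {
        return failedNodes >= effectiveBackups + 1;
    }

    public static void main(String[] args) {
        int configuredBackups = 2; // value in the new configuration (step 3)
        int effectiveBackups = 1;  // value restored from the persistent store
        int failedNodes = 2;       // step 5 of the reproduction

        // What the user expects from the configuration they see in code.
        System.out.println("loss expected by user: "
            + canLoseData(configuredBackups, failedNodes));
        // What can happen if the stored configuration is the one in effect.
        System.out.println("loss actually possible: "
            + canLoseData(effectiveBackups, failedNodes));
    }
}
```

With these inputs the model reports no loss for the configured value and
possible loss for the effective value, which matches the behaviour described in
step 5.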


[1] https://issues.apache.org/jira/browse/IGNITE-6781


---------- Forwarded message ----------
From: blackfield <ch...@maxpoint.com>
Date: Thu, Oct 19, 2017 at 9:29 PM
Subject: Re: Query performance against table with/out backup
To: user@ignite.apache.org


Here, I am trying to ascertain that I set backup == 2 properly since, as I
mentioned above, I do not see a query performance difference between
backup == 1 and backup == 2.

I want to make sure that I configured my cache properly.

When I set backup == 2 (to have three copies), I notice the following via
visor.

The Affinity Backups value is still equal to 1. Is this a different property
than the number of backups? If it is not, how does one see the number of
backups a cache is configured for?

Invoking "cache -a" to see the detailed cache stats with backup == 2, under the
size column, the sum of entries on all nodes is equal to the number of rows
in the table * 2. It appears this is the case for any backup >= 1?

As in, only one set of backups will be stored off-heap regardless of the
number of backups specified?







-- 
Best regards,
Andrey V. Mashenkov

Re: Query performance against table with/out backup

Posted by Vladimir Ozerov <vo...@gridgain.com>.
Hi,

We know one serious source of slowdown when backups are enabled. See [1]
and [2]. It will be fixed in 2.4.

Vladimir.

[1] https://issues.apache.org/jira/browse/IGNITE-6624
[2] https://issues.apache.org/jira/browse/IGNITE-6626

On Thu, Oct 19, 2017 at 9:29 PM, blackfield <ch...@maxpoint.com>
wrote:

> Here, I am trying to ascertain that I set backup == 2 properly since, as I
> mentioned above, I do not see a query performance difference between
> backup == 1 and backup == 2.

Re: Query performance against table with/out backup

Posted by Andrey Mashenkov <an...@gmail.com>.
Hi blackfield,

I can't reproduce the issue with changing the number of backups when the cache
is re-created.


On Fri, Oct 27, 2017 at 8:45 PM, blackfield <ch...@maxpoint.com>
wrote:

> @Andrey Mashenkov, I noticed that you opened IGNITE-6781.
> However, I actually destroyed the original cache with backup == 1, recreated
> a new cache with backup == 2, and repopulated the table.



-- 
Best regards,
Andrey V. Mashenkov

Re: Query performance against table with/out backup

Posted by blackfield <ch...@maxpoint.com>.
@Andrey Mashenkov, I noticed that you opened IGNITE-6781.
However, I actually destroyed the original cache with backup == 1, recreated
a new cache with backup == 2, and repopulated the table.




Re: Query performance against table with/out backup

Posted by Andrey Mashenkov <an...@gmail.com>.
Hi,

You wrote that you have persistence enabled.
It looks like you tried to start the cache with a changed configuration
(backups=2) without recreating the cache, so the grid restored the cache from
the store with its old configuration and did not apply the new one.

It looks weird that the new number of backups was not applied and yet no error
occurred.

Can you confirm that this is the case and that recreating the cache resolves
the issue?

On Thu, Oct 19, 2017 at 9:29 PM, blackfield <ch...@maxpoint.com>
wrote:

> When I set backup == 2 (to have three copies), I notice the following via
> visor.
>
> The Affinity Backups value is still equal to 1. Is this a different property
> than the number of backups? If it is not, how does one see the number of
> backups a cache is configured for?



-- 
Best regards,
Andrey V. Mashenkov

Re: Query performance against table with/out backup

Posted by blackfield <ch...@maxpoint.com>.
Here, I am trying to ascertain that I set backup == 2 properly since, as I
mentioned above, I do not see a query performance difference between
backup == 1 and backup == 2.

I want to make sure that I configured my cache properly.

When I set backup == 2 (to have three copies), I notice the following via
visor.

The Affinity Backups value is still equal to 1. Is this a different property
than the number of backups? If it is not, how does one see the number of
backups a cache is configured for?

Invoking "cache -a" to see the detailed cache stats with backup == 2, under the
size column, the sum of entries on all nodes is equal to the number of rows
in the table * 2. It appears this is the case for any backup >= 1?

As in, only one set of backups will be stored off-heap regardless of the
number of backups specified?
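
The size observation can be turned into a quick arithmetic check. This is a
minimal sketch, assuming the visor size column counts primary and backup
entries alike (which is what the numbers in this message suggest) and using
75000 as the approximate row count mentioned earlier in the thread. The class
and method names are hypothetical.

```java
class BackupCountCheck {
    /** Entries stored cluster-wide if every row has 1 primary + backups copies. */
    static long expectedTotalEntries(long rows, int backups) {
        return rows * (1L + backups);
    }

    /** Backup count implied by an observed cluster-wide entry total. */
    static long impliedBackups(long observedTotalEntries, long rows) {
        return observedTotalEntries / rows - 1;
    }

    public static void main(String[] args) {
        long rows = 75_000; // approximate row count from this thread

        System.out.println(expectedTotalEntries(rows, 1)); // 150000
        System.out.println(expectedTotalEntries(rows, 2)); // 225000

        // Observed: total == rows * 2 even though backup == 2 was requested.
        System.out.println(impliedBackups(rows * 2, rows)); // 1
    }
}
```

Under that assumption, a total of rows * 2 corresponds to one backup in effect,
whereas two backups would show rows * 3.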





Re: Query performance against table with/out backup

Posted by blackfield <ch...@maxpoint.com>.
Can you elaborate on why, if there is a scan, Ignite has to scan the backups
as well?

Also, it appears that adding more backups (backup == 2 instead of 1) does not
incur additional performance cost?




Re: Query performance against table with/out backup

Posted by vkulichenko <va...@gmail.com>.
Backup copies are stored along with primary copies in the same storage and
indexed by the same indexes. As a matter of fact, any backup copy can become a
primary copy at any moment due to a topology change. Therefore, if there is a
scan, the amount of data you have to go through doubles when you add a
backup.

Providing hints is not a requirement, and most likely will not be needed for
such a simple query. What I meant is that you should check the execution
plan to verify that indexes are applied as you expect:
https://apacheignite.readme.io/docs/sql-performance-and-debugging#using-explain-statement
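
One way to fetch the plan is to prefix the query with EXPLAIN and read the
result over JDBC. The following is a minimal sketch, not a definitive recipe.
It assumes the Ignite thin JDBC driver is on the classpath and a node is
listening on localhost; the table name MyTable and the shortened IN list are
placeholders, and the class and method names are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class ExplainPlan {
    /** Prefixes a query with EXPLAIN so the engine returns its plan instead of rows. */
    static String explain(String sql) {
        return "EXPLAIN " + sql.trim();
    }

    /** Prints the first column of every row of the plan for the given query. */
    static void printPlan(Connection conn, String sql) throws SQLException {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(explain(sql))) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }

    public static void main(String[] args) {
        // Assumption: thin JDBC driver available and a node reachable at this address.
        String url = args.length > 0 ? args[0] : "jdbc:ignite:thin://127.0.0.1/";
        // Placeholder table name and a shortened IN list standing in for the real query.
        String sql = "SELECT COUNT(*) FROM MyTable"
            + " WHERE column1 > 0.75 AND column2 > 0.75 AND zone IN (1, 2, 3)";

        try (Connection conn = DriverManager.getConnection(url)) {
            printPlan(conn, sql);
        } catch (SQLException e) {
            System.err.println("Could not fetch plan: " + e.getMessage());
        }
    }
}
```

If the printed plan names an index for the zone column, the index is being
used; a plan that only mentions a scan suggests the whole table is being read.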

-Val




Re: Query performance against table with/out backup

Posted by blackfield <ch...@maxpoint.com>.
The table is created programmatically; two properties (zone and userId) are
annotated with index = true.

When you ask whether the indexes are applied properly to the query, do you mean
whether I provide an index hint to the query, or something else? If the former,
the answer is no.

The performance issue with the IN clause and indexes is mentioned here:
https://apacheignite.readme.io/docs/sql-performance-and-debugging#sql-performance-and-usability-considerations
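
For reference, the workaround described on that page, as far as I recall it, is
to replace the IN list with a join against a table function that takes a single
array parameter. The sketch below only builds the two SQL strings so the shapes
can be compared; it is an assumption that this rewrite helps for this table,
and it should be verified with EXPLAIN. The table name passed in and the class
and method names are hypothetical.

```java
class InListRewrite {
    /** Original shape: one bind parameter per zone in the IN list. */
    static String inListQuery(String table, int zoneCount) {
        if (zoneCount < 1) {
            throw new IllegalArgumentException("zoneCount must be at least 1");
        }
        StringBuilder sb = new StringBuilder("SELECT COUNT(*) FROM " + table
            + " WHERE column1 > 0.75 AND column2 > 0.75 AND zone IN (");
        for (int i = 0; i < zoneCount; i++) {
            sb.append(i == 0 ? "?" : ", ?");
        }
        return sb.append(")").toString();
    }

    /** Rewritten shape: join against a table function fed by one array parameter. */
    static String joinQuery(String table) {
        return "SELECT COUNT(*) FROM " + table + " t"
            + " JOIN TABLE(zone INT = ?) z ON t.zone = z.zone"
            + " WHERE t.column1 > 0.75 AND t.column2 > 0.75";
    }

    public static void main(String[] args) {
        System.out.println(inListQuery("MyTable", 3));
        System.out.println(joinQuery("MyTable"));
    }
}
```

With the join form, the zones would be passed as one array of boxed values
(for example Integer[]), and the array should contain distinct zones, since a
duplicated value would make the join count the matching row twice.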


Due to the nature of the query (performing a count), yes, it appears that it
scans the whole cache.
Does that explain why the query is almost twice as slow with backups? If so,
can you elaborate on why?

Here is the cache config:

Cache 'SQL_PUBLIC_USER_ATTRIBUTE_SAMPLE':
+==========================================================================================================+
|                   Name                    |                            Value                             |
+==========================================================================================================+
| Group                                     | xxx                                                          |
| Dynamic Deployment ID                     | 8feacd52f51-4cbc5976-a413-488e-b74c-756539b0b54d             |
| Mode                                      | PARTITIONED                                                  |
| Atomicity Mode                            | TRANSACTIONAL                                                |
| Statistic Enabled                         | off                                                          |
| Management Enabled                        | off                                                          |
| Time To Live Eager Flag                   | true                                                         |
| Write Synchronization Mode                | FULL_SYNC                                                    |
| Invalidate                                | off                                                          |
| Affinity Function                         | o.a.i.cache.affinity.rendezvous.RendezvousAffinityFunction   |
| Affinity Backups                          | 1                                                            |
| Affinity Partitions                       | 1024                                                         |
| Affinity Exclude Neighbors                | false                                                        |
| Affinity Mapper                           | o.a.i.i.processors.cache.CacheDefaultBinaryAffinityKeyMapper |
| Rebalance Mode                            | ASYNC                                                        |
| Rebalance Batch Size                      | 524288                                                       |
| Rebalance Timeout                         | 10000                                                        |
| Rebalance Delay                           | 0                                                            |
| Time Between Rebalance Messages           | 0                                                            |
| Eviction Policy Enabled                   | off                                                          |
| Eviction Policy                           | <n/a>                                                        |
| Eviction Policy Max Size                  | <n/a>                                                        |
| Eviction Filter                           | <n/a>                                                        |
| Near Cache Enabled                        | off                                                          |
| Near Start Size                           | 0                                                            |
| Near Eviction Policy                      | <n/a>                                                        |
| Near Eviction Policy Max Size             | <n/a>                                                        |
| Default Lock Timeout                      | 0                                                            |
| Metadata type count                       | 0                                                            |
| Cache Interceptor                         | <n/a>                                                        |
| Store Enabled                             | off                                                          |
| Store Class                               | <n/a>                                                        |
| Store Factory Class                       |                                                              |
| Store Keep Binary                         | false                                                        |
| Store Read Through                        | off                                                          |
| Store Write Through                       | off                                                          |
| Write-Behind Enabled                      | off                                                          |
| Write-Behind Flush Size                   | 10240                                                        |
| Write-Behind Frequency                    | 5000                                                         |
| Write-Behind Flush Threads Count          | 1                                                            |
| Write-Behind Batch Size                   | 512                                                          |
| Concurrent Asynchronous Operations Number | 500                                                          |
| Loader Factory Class Name                 | <n/a>                                                        |
| Writer Factory Class Name                 | <n/a>                                                        |
| Expiry Policy Factory Class Name          | javax.cache.configuration.FactoryBuilder$SingletonFactory    |
| Query Execution Time Threshold            | 3000                                                         |
| Query Schema Name                         | PUBLIC                                                       |
| Query Escaped Names                       | on                                                           |
| Query SQL functions                       | <n/a>                                                        |
| Query Indexed Types                       | <n/a>                                                        |
+----------------------------------------------------------------------------------------------------------+


As for the data model, the query runs against a single table that has 75000+
rows and 5700+ columns. Most of the columns are double and float; zone is an
int and userId is a string.






Re: Query performance against table with/out backup

Posted by vkulichenko <va...@gmail.com>.
Hi,

Do you have indexes configured and, if so, are they applied properly to the
query? Did you check the execution plan?

It sounds like your query has to scan the whole cache, which gets slower
with backups. Can you provide your full cache configuration and data model?

-Val


