Posted to user@ignite.apache.org by BinaryTree <bi...@foxmail.com> on 2019/03/01 09:53:11 UTC

Backup make DataStreamer performance decreased a lot.

Hi Igniters - 

I know backups will impact the performance of the cluster:

If you use a PARTITIONED cache and the data loss is not critical for you (for example, when you have a backing cache store), consider disabling backups for the cache. When backups are enabled, the cache engine has to maintain a remote copy of each entry, which requires network exchange and is time-consuming.

Because the data is important and must not be lost, backups are necessary.

But backups decrease DataStreamer performance a lot. With backups disabled, 40 million records can be loaded in 4 minutes, but with backups = 1 the speed drops sharply after about 20 million records have been loaded; sometimes it takes more than 20 seconds to load 10 thousand records.
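
To put those figures side by side, here is a small back-of-the-envelope calculation. It only uses the numbers quoted above; the class and method names are made up for illustration, and since the slow batches take "more than 20 seconds", the second rate is an upper bound.

```java
// Rough throughput comparison based on the figures in this post.
final class StreamerThroughput {
    /** Average records per second for a load of {@code records} taking {@code seconds}. */
    static double recordsPerSecond(long records, double seconds) {
        return records / seconds;
    }

    public static void main(String[] args) {
        double noBackups = recordsPerSecond(40_000_000L, 4 * 60); // 40M records in 4 minutes
        double slowPhase = recordsPerSecond(10_000L, 20);         // 10k records in 20 seconds

        System.out.printf("no backups: %.0f records/s%n", noBackups);
        System.out.printf("slow phase: %.0f records/s%n", slowPhase);
        System.out.printf("ratio:      %.0fx%n", noBackups / slowPhase);
    }
}
```

That is roughly 167,000 records/s without backups against at most 500 records/s in the slow phase, a gap of more than 300x.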

Are there any configurations or methods that can improve the performance of DataStreamer?
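
For reference, a minimal sketch of the kind of setup being described: a PARTITIONED cache with one backup, loaded through IgniteDataStreamer. The cache name, key/value types and record count are assumptions for illustration and are not taken from the reproducer project.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class BackupStreamerSketch {
    public static void main(String[] args) {
        // Starts a node with the default configuration. With a single server
        // node there is nowhere to place a backup copy, so a meaningful test
        // needs at least two server nodes in the cluster.
        try (Ignite ignite = Ignition.start()) {
            // "records" is a hypothetical cache name.
            CacheConfiguration<Long, String> cacheCfg = new CacheConfiguration<>("records");
            cacheCfg.setCacheMode(CacheMode.PARTITIONED);
            cacheCfg.setBackups(1); // 0 disables backups

            ignite.getOrCreateCache(cacheCfg);

            try (IgniteDataStreamer<Long, String> streamer = ignite.dataStreamer("records")) {
                for (long i = 0; i < 1_000_000L; i++)
                    streamer.addData(i, "value-" + i);
            } // closing the streamer flushes any remaining buffered entries
        }
    }
}
```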

Related post:

http://apache-ignite-users.70518.x6.nabble.com/Ignite-Data-Streamer-Hung-after-a-period-tp21161.html

I attached the thread dumps in this post.

I also created a project to reproduce the problem; you can refer to:

https://github.com/RedBlackTreei/streamer.git

Re: Backup make DataStreamer performance decreased a lot.

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

So I have re-run it with backups.

2 nodes with 1 backup will load 15M entries slightly faster than 1 node
which loads 30M entries without backups.

So I would say that having backups is actually slightly faster.

However, storing 15M entries without backups on 1 node is still 4-5x faster.
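
One way to read these three measurements is to count how many entry copies each node ends up holding. The helper below is a hypothetical illustration and assumes partitions are spread evenly across nodes.

```java
// Entry copies held per node for the three runs mentioned above.
final class CopiesPerNode {
    /** Primary plus backup copies per node, assuming an even partition distribution. */
    static long copiesPerNode(long entries, int backups, int nodes) {
        return entries * (1L + backups) / nodes;
    }

    public static void main(String[] args) {
        System.out.println("2 nodes, 1 backup, 15M entries: " + copiesPerNode(15_000_000L, 1, 2));
        System.out.println("1 node, no backups, 30M entries: " + copiesPerNode(30_000_000L, 0, 1));
        System.out.println("1 node, no backups, 15M entries: " + copiesPerNode(15_000_000L, 0, 1));
    }
}
```

The first and third runs both leave 15M copies on each node, yet the third is 4-5x faster, which suggests the extra time goes into maintaining the second copy rather than into the per-node data volume alone.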

Regards,
-- 
Ilya Kasnacheev


On Mon, 4 Mar 2019 at 14:45, ilya.kasnacheev <il...@gmail.com> wrote:

> Hello!
>
> Actually, now I understand I only had one node so backups were not
> applicable.
>
> I will re-run it, but as you can see, since adding more data slows down
> superlinearly, you can expect that adding backups will also decrease
> performance superlinearly.
>
> Regards,
>

Re: Backup make DataStreamer performance decreased a lot.

Posted by "ilya.kasnacheev" <il...@gmail.com>.
Hello!

Actually, now I understand I only had one node so backups were not
applicable.

I will re-run it, but as you can see, since adding more data slows down
superlinearly, you can expect that adding backups will also decrease
performance superlinearly.

Regards,



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Backup make DataStreamer performance decreased a lot.

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

I ran it on my laptop.

I imagine that it will slow down a lot. Unfortunately I'm still limited to
recommendations already given.

Regards,
-- 
Ilya Kasnacheev


On Sat, 2 Mar 2019 at 10:13, Justin Ji <bi...@foxmail.com> wrote:

> Ilya -
>
> Thank you for your kind help.
> Do you mind sharing your server configuration? I re-ran it with your
> configuration, and it took more than 60 minutes to load 40000000 records.
>
> I also increased the data region size and the checkpoint frequency, which
> helped a bit, but it is still too slow.
>
> According to my test, the last 20000000 records take most of the time.
>
> Is this normal? And why does the last part of the records take so much time?
>

Re: Backup make DataStreamer performance decreased a lot.

Posted by Justin Ji <bi...@foxmail.com>.
Ilya - 

Thank you for your kind help.
Do you mind sharing your server configuration? I re-ran it with your
configuration, and it took more than 60 minutes to load 40000000 records.

I also increased the data region size and the checkpoint frequency, which
helped a bit, but it is still too slow.

According to my test, the last 20000000 records take most of the time.

Is this normal? And why does the last part of the records take so much time?




Re: Backup make DataStreamer performance decreased a lot.

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

I assume we're still talking about your reproducer
https://github.com/RedBlackTreei/streamer.git
With your code and a reduced data set of 25000000:

Total time:628120ms when using

cacheCfg.setSqlIndexMaxInlineSize(64);
devIdIdx.setInlineSize(96);

as opposed to Total time:820821ms with your settings.
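
For scale, the two totals can be compared with a few lines of arithmetic. This is only a sketch over the two numbers above; the class and method names are hypothetical.

```java
// Compares the two total load times reported above.
final class LoadTimeComparison {
    /** Percentage of time saved relative to the baseline run. */
    static double reductionPercent(long baselineMs, long improvedMs) {
        return 100.0 * (baselineMs - improvedMs) / baselineMs;
    }

    public static void main(String[] args) {
        long original = 820_821L; // ms, with the original settings
        long tuned = 628_120L;    // ms, with the inline sizes shown above

        System.out.printf("original: %.1f min%n", original / 60_000.0);
        System.out.printf("tuned:    %.1f min%n", tuned / 60_000.0);
        System.out.printf("saved:    %.1f%%%n", reductionPercent(original, tuned));
    }
}
```

That is about 10.5 minutes instead of 13.7, or roughly 23% less time.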

devIdIdx needs to be so large due to
https://issues.apache.org/jira/browse/IGNITE-11125 - it will include _key :(
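
As a sketch of where those two settings sit in a cache configuration: only the names devIdIdx, setInlineSize and setSqlIndexMaxInlineSize come from this thread. The cache name, the value type name and the assumption that devId is a String field are made up for illustration.

```java
import java.util.Collections;
import java.util.LinkedHashMap;

import org.apache.ignite.cache.QueryEntity;
import org.apache.ignite.cache.QueryIndex;
import org.apache.ignite.configuration.CacheConfiguration;

public class InlineSizeSketch {
    static CacheConfiguration<String, Object> cacheConfig() {
        // Index on a hypothetical "devId" field, with a per-index inline size.
        QueryIndex devIdIdx = new QueryIndex("devId");
        devIdIdx.setInlineSize(96);

        LinkedHashMap<String, String> fields = new LinkedHashMap<>();
        fields.put("devId", String.class.getName());

        // "DeviceRecord" is a hypothetical value type name.
        QueryEntity entity = new QueryEntity(String.class.getName(), "DeviceRecord");
        entity.setFields(fields);
        entity.setIndexes(Collections.singletonList(devIdIdx));

        // "records" is a hypothetical cache name.
        CacheConfiguration<String, Object> cacheCfg = new CacheConfiguration<>("records");
        cacheCfg.setSqlIndexMaxInlineSize(64); // cache-wide maximum inline size for SQL indexes
        cacheCfg.setQueryEntities(Collections.singletonList(entity));
        return cacheCfg;
    }
}
```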

This may look slow, so generic optimizations might be needed:
- Have a larger data region and/or more nodes.
- If possible, load with WAL disabled (you can do that at runtime on a
per-cache basis).
- If not, have less frequent checkpoints and a larger checkpoint page buffer
size.
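
A rough sketch of how the suggestions in this list map onto the Java API. All sizes and intervals below are placeholder values and not recommendations, and the helper names are made up. Keep in mind that data loaded while WAL is disabled is not protected against a crash during the load.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class BulkLoadTuningSketch {
    /** Larger data region, less frequent checkpoints, larger checkpoint page buffer. */
    static IgniteConfiguration storageTuning() {
        DataStorageConfiguration storage = new DataStorageConfiguration();
        storage.setCheckpointFrequency(10 * 60 * 1000L); // placeholder: checkpoint every 10 minutes

        DataRegionConfiguration region = storage.getDefaultDataRegionConfiguration();
        region.setPersistenceEnabled(true);
        region.setMaxSize(8L * 1024 * 1024 * 1024);                  // placeholder: 8 GB data region
        region.setCheckpointPageBufferSize(2L * 1024 * 1024 * 1024); // placeholder: 2 GB buffer

        return new IgniteConfiguration().setDataStorageConfiguration(storage);
    }

    /** Runs a bulk load with WAL switched off for one cache, then switches it back on. */
    static void loadWithWalDisabled(Ignite ignite, String cacheName, Runnable load) {
        ignite.cluster().disableWal(cacheName);
        try {
            load.run(); // e.g. the data streamer loop
        }
        finally {
            ignite.cluster().enableWal(cacheName);
        }
    }
}
```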

Regards,
-- 
Ilya Kasnacheev


On Fri, 1 Mar 2019 at 18:02, Justin Ji <bi...@foxmail.com> wrote:

> I have tried increasing QueryIndex.setInlineSize and
> CacheConfiguration.setSqlIndexMaxInlineSize to 128, 256 and 512, but the
> performance became worse.
>
> Am I missing some configuration?
>

Re: Backup make DataStreamer performance decreased a lot.

Posted by Justin Ji <bi...@foxmail.com>.
I have tried increasing QueryIndex.setInlineSize and
CacheConfiguration.setSqlIndexMaxInlineSize to 128, 256 and 512, but the
performance became worse.

Am I missing some configuration?




Re: Backup make DataStreamer performance decreased a lot.

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

From the shared logs it seems that you spend time building indexes (which
are possibly not-inlined as we discussed) and I can see nothing related to
backups here.

Regards,
-- 
Ilya Kasnacheev


On Fri, 1 Mar 2019 at 17:55, Justin Ji <bi...@foxmail.com> wrote:

> Thanks for your reply!
> 1. No, I did not use FULL_SYNC, because it waits for the write or commit to
> complete on all participating remote nodes (primary and backup), so it may
> reduce write performance. Am I right? But I will try it.
> 2. Yes, please refer to the attachment. I dumped the thread stacks of all
> three server nodes, four dump files per node.
> dump.zip
> <http://apache-ignite-users.70518.x6.nabble.com/file/t2000/dump.zip>
>

Re: Backup make DataStreamer performance decreased a lot.

Posted by Justin Ji <bi...@foxmail.com>.
Thanks for your reply!
1. No, I did not use FULL_SYNC, because it waits for the write or commit to
complete on all participating remote nodes (primary and backup), so it may
reduce write performance. Am I right? But I will try it.
2. Yes, please refer to the attachment. I dumped the thread stacks of all three
server nodes, four dump files per node.
dump.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/t2000/dump.zip>  
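
For context, the write synchronization mode mentioned in point 1 is a per-cache setting. A minimal sketch, with a hypothetical cache name and key/value types:

```java
import org.apache.ignite.cache.CacheWriteSynchronizationMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class SyncModeSketch {
    static CacheConfiguration<Long, String> fullSyncCache() {
        // "records" is a hypothetical cache name.
        CacheConfiguration<Long, String> cacheCfg = new CacheConfiguration<>("records");
        cacheCfg.setBackups(1);
        // FULL_SYNC waits for primary and backup copies to be written;
        // PRIMARY_SYNC (the default) waits only for the primary copy.
        cacheCfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
        return cacheCfg;
    }
}
```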




Re: Backup make DataStreamer performance decreased a lot.

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

Do you use FULL_SYNC by any chance? Can you provide thread dumps taken during
the slowdown?

Regards,
-- 
Ilya Kasnacheev


On Fri, 1 Mar 2019 at 12:53, BinaryTree <bi...@foxmail.com> wrote:

> Hi Igniters -
>
> I know backups will impact the performance of the cluster:
>
> If you use a PARTITIONED cache and the data loss is not critical for you
> (for example, when you have a backing cache store), consider disabling
> backups for the cache. When backups are enabled, the cache engine has to
> maintain a remote copy of each entry, which requires network exchange and
> is time-consuming.
>
> Because the data is important and must not be lost, backups are necessary.
>
> But backups decrease DataStreamer performance a lot. With backups
> disabled, 40 million records can be loaded in 4 minutes, but with
> backups = 1 the speed drops sharply after about 20 million records have
> been loaded; sometimes it takes more than 20 seconds to load 10 thousand
> records.
>
> Are there any configurations or methods that can improve the performance
> of DataStreamer?
>
> Related post:
>
> http://apache-ignite-users.70518.x6.nabble.com/Ignite-Data-Streamer-Hung-after-a-period-tp21161.html
>
> I attached the thread dumps in this post.
>
> I also created a project to reproduce the problem; you can refer to:
>
> https://github.com/RedBlackTreei/streamer.git
>