
Posted to user@ignite.apache.org by Muhammed Favas <fa...@expeedsoftware.com> on 2019/09/09 09:31:52 UTC

Ignite Performance - Adding new node is not improving SQL query performance

Hi,

I have an Ignite cluster of 4 nodes (each with 8 cores, 32 GB RAM and 30 GB disk) with native persistence enabled, and all 4 nodes are in the baseline topology. Two SQL tables have been created and loaded with 120 GB of data.
One of my test SQL queries takes 8 seconds with this setup. I am currently trying various options to reduce the query's execution time.

To that end, I added one more node (with the same configuration) to the cluster, without adding it to the baseline, expecting that it would reduce the execution time, but it didn't. When I checked the CPU utilization of each node, the CPUs of the 4 original nodes were fully utilized, but the CPU of the newly added node was barely used.

Can you please help me figure out why this is so, and how I can make sure the CPUs of all nodes are utilized when I run a distributed query, so that my query runs faster?
Also, what else do I need to do to make my query run faster?

Regards,
Favas

Re: Ignite Performance - Adding new node is not improving SQL query performance

Posted by Mikael <mi...@telia.com>.

Hi!

Well, as I said, do not take my word as 100% truth; I could be wrong.

Yes, nodes that are not part of the baseline will still handle
everything except persisted data, so you can still use them for the
compute grid, reading/writing KV, ML, services and so on. But in your
case you are running SQL queries on persisted data, so you get no use
out of the new node unless it is part of the baseline. That is my
understanding of it.

Hopefully someone with better knowledge than me will step in and give 
you a more detailed answer (and correct me if I am wrong).

Mikael
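
To make the explanation above concrete, here is a toy model in plain Java. It does not use Ignite and is not how Ignite assigns partitions: the round-robin assignment stands in for Ignite's real affinity function, and the class and method names (BaselineSketch, assignPartitions, queryParticipants) are invented for this illustration. It assumes, as Mikael describes, that only baseline nodes own partitions of a persisted table and that a query only does work on nodes that own data. The 1024 partitions match Ignite's usual default, but any number would show the same thing.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class BaselineSketch {
    /** Toy assignment: spread partitions round-robin over the nodes allowed to store data. */
    static Map<Integer, String> assignPartitions(int partitions, List<String> dataNodes) {
        Map<Integer, String> owners = new TreeMap<>();
        for (int p = 0; p < partitions; p++) {
            owners.put(p, dataNodes.get(p % dataNodes.size()));
        }
        return owners;
    }

    /** Nodes that would scan data for a query: those owning at least one partition. */
    static Set<String> queryParticipants(Map<Integer, String> owners) {
        return new TreeSet<>(owners.values());
    }

    public static void main(String[] args) {
        // node5 has joined the cluster but is not in the baseline,
        // so in this model it is not offered any partitions.
        List<String> baseline = Arrays.asList("node1", "node2", "node3", "node4");
        System.out.println(queryParticipants(assignPartitions(1024, baseline)));

        // After node5 is added to the baseline (and data has been rebalanced),
        // it owns partitions and takes part in the query.
        List<String> extended = Arrays.asList("node1", "node2", "node3", "node4", "node5");
        System.out.println(queryParticipants(assignPartitions(1024, extended)));
    }
}
```

In this model the first line printed lists only node1 to node4, which matches the observation that the fifth node's CPU stays idle; the second line includes node5.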


On 2019-09-09 at 12:05, Muhammed Favas wrote:
>
> Thanks, Mikael, for the response.
>
> So in that case, is it necessary to add all new nodes to the baseline
> to make efficient use of their resources? The Ignite docs do not put
> it that way; they say a subset of the nodes in a cluster can be part
> of the baseline.
>
> What I thought was that when I run a query, the data would be loaded
> into the memory of all 5 nodes and the query would use the computing
> power of all of them. But now it seems it does not work like that.
>
> Regards,
>
> Favas

RE: Ignite Performance - Adding new node is not improving SQL query performance

Posted by Muhammed Favas <fa...@expeedsoftware.com>.

Thanks Mikael for the response.

So in that case, is it necessary to add all new nodes to the baseline to make efficient use of their resources? The Ignite docs do not put it that way; they say a subset of the nodes in a cluster can be part of the baseline.

What I thought was that when I run a query, the data would be loaded into the memory of all 5 nodes and the query would use the computing power of all of them. But now it seems it does not work like that.

Regards,
Favas

From: Mikael <mi...@telia.com>
Sent: Monday, September 9, 2019 3:21 PM
To: user@ignite.apache.org
Subject: Re: Ignite Performance - Adding new node is not improving SQL query performance


Hi!

If the new node is not part of the baseline topology, it will not store any persisted data, so it is of no use to an SQL query because it holds none of the data (at least that is how I understand it; I could be wrong here).

If so, you would need to add the new node to the baseline topology to see any performance improvement, and of course wait for a complete rebalance of the data.

From docs:

"The same tools and APIs can be used to adjust the baseline topology throughout the cluster lifetime. It's required if you decide to scale out or scale in an existing topology by setting more or fewer nodes that will store the data. The sections below show how to use the APIs and tools."

Mikael

Re: Ignite Performance - Adding new node is not improving SQL query performance

Posted by Mikael <mi...@telia.com>.

Hi!

If the new node is not part of the baseline topology, it will not store
any persisted data, so it is of no use to an SQL query because it holds
none of the data (at least that is how I understand it; I could be
wrong here).

If so, you would need to add the new node to the baseline topology to
see any performance improvement, and of course wait for a complete
rebalance of the data.

From the docs:

"The same tools and APIs can be used to adjust the baseline topology 
throughout the cluster lifetime. It's required if you decide to scale 
out or scale in an existing topology by setting more or fewer nodes that 
will store the data. The sections below show how to use the APIs and tools."

Mikael
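
As a rough sanity check on what adding the fifth node to the baseline could buy, the figures from the original post can be run through a back-of-the-envelope estimate. This is only a sketch under strong assumptions that the thread does not confirm: data is spread evenly, there are no backup copies, and query time is proportional to the amount of data each node has to scan. The class and method names (ScaleOutEstimate, perNodeGb, bestCaseSeconds) are invented for the example.

```java
public class ScaleOutEstimate {
    /** Data each node holds, assuming an even spread and no backup copies. */
    static double perNodeGb(double totalGb, int dataNodes) {
        return totalGb / dataNodes;
    }

    /** Best-case query time if time scales with the per-node data volume. */
    static double bestCaseSeconds(double currentSeconds, int currentNodes, int newNodes) {
        return currentSeconds * currentNodes / newNodes;
    }

    public static void main(String[] args) {
        System.out.println(perNodeGb(120, 4));        // 30.0 GB per node with 4 baseline nodes
        System.out.println(perNodeGb(120, 5));        // 24.0 GB per node with 5 baseline nodes
        System.out.println(bestCaseSeconds(8, 4, 5)); // 6.4 s, down from 8 s, at best
    }
}
```

Under these assumptions, going from 4 to 5 data nodes cuts the per-node share from 30 GB to 24 GB, so even a perfectly linear speed-up would take the 8-second query to about 6.4 seconds. Larger gains would have to come from elsewhere, such as indexes or the query itself.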
