Posted to user@impala.apache.org by Alexander Shoshin <Al...@epam.com> on 2017/08/17 09:56:49 UTC

Impala daemon memory

Hi, team!

I have an issue working with Impala. Maybe you could help me?

My data is stored in Parquet files. I am running queries against the data in Impala through JMeter. I have 6 Impala daemons and JMeter sends queries randomly to each of them.
The problem is that only 3 of the 6 Impala daemons use all available memory. The other 3 Impala daemons use 3-4 times less memory.

Which 3 daemons these are differs each time, but there are always 3 of them. When I tried disabling 2 daemons I saw the same picture: 3 daemons used all available memory and 1 daemon did not. All Impala daemons have the same mem_limit setting.

Have you ever seen such strange behavior? Why don't all Impala daemons use all available memory?

Regards,
Alexander Shoshin


RE: Impala daemon memory

Posted by Alexander Shoshin <Al...@epam.com>.
Hi Jeszy,

I meant that each Data Node should contain not all of the table data but only a part of it. If I have 3 replicas, then each of the 6 machines will contain 1/2 of the table's data, won't it?

Moreover, each time I run a batch of queries through JMeter I can see that 3 different daemons become fully loaded, not the same daemons as the previous time. The other daemons are only partly loaded.

Best,
Alexander
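
A rough sanity check of the 1/2 estimate above: a minimal sketch, assuming HDFS places each block's replicas on distinct Data Nodes chosen uniformly at random (the real placement policy is rack-aware, so this is only an approximation). The function name, block count, and seed are illustrative and do not come from Impala or HDFS.

```python
import random


def replica_share_per_node(num_nodes, replication, num_blocks, seed=0):
    """Return, for each node, the fraction of blocks it holds a replica of.

    Assumption: replicas of a block go to distinct nodes picked uniformly
    at random; real HDFS placement is rack-aware and is not modelled here.
    """
    rng = random.Random(seed)
    counts = [0] * num_nodes
    for _ in range(num_blocks):
        # Pick `replication` distinct nodes for this block.
        for node in rng.sample(range(num_nodes), replication):
            counts[node] += 1
    return [c / num_blocks for c in counts]


shares = replica_share_per_node(num_nodes=6, replication=3, num_blocks=10_000)
print([round(s, 2) for s in shares])
```

Under this assumption each node holds on average replication / num_nodes of the blocks, i.e. 3/6 = 1/2 here, while any single block is still local to only 3 of the 6 nodes.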


Re: Impala daemon memory

Posted by Jeszy <je...@gmail.com>.
Hey,

The table itself is just Impala's metadata (indeed located in every
coordinator's catalog cache); the underlying data (read from HDFS) is
stored on only [replication factor] nodes.

Jeszy


RE: Impala daemon memory

Posted by Alexander Shoshin <Al...@epam.com>.
Hi Petter,

Thanks for the advice. The replication factor could be the reason. It is equal to 3 at the moment. I will try increasing it to 6 and see if anything changes.

But I am not sure it will have a positive effect. I have rebalanced the data, and I guess each table should be located on all Data Nodes no matter what the replication factor is.

Regards,
Alexander




Re: Impala daemon memory

Posted by "Petter von Dolwitz (Hem)" <pe...@gmail.com>.
Hi,

Could it be related to data locality, i.e. to using HDFS with replication
factor 3? You could check this by increasing the replication factor to 6
for the files you use and seeing whether anything changes.

Br,
Petter
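
One way to try this, as a minimal sketch: the standard `hdfs dfs -setrep` command changes the replication factor of existing files, and `-w` makes it wait until the target is reached. The Python wrapper functions and the table path below are hypothetical; substitute the real HDFS directory that holds the Parquet files.

```python
import subprocess


def build_setrep_command(path, replication, wait=True):
    """Build the argument list for `hdfs dfs -setrep`."""
    cmd = ["hdfs", "dfs", "-setrep"]
    if wait:
        cmd.append("-w")  # wait until the target replication is reached
    cmd += [str(replication), path]
    return cmd


def set_replication(path, replication, wait=True):
    """Run the command; requires the `hdfs` CLI to be on PATH."""
    subprocess.run(build_setrep_command(path, replication, wait), check=True)


# Hypothetical table location, used only for illustration.
TABLE_PATH = "/user/hive/warehouse/my_table"
print(" ".join(build_setrep_command(TABLE_PATH, 6)))
```

Only the command is printed here; calling `set_replication(TABLE_PATH, 6)` would actually run it against the cluster.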
