Posted to user@impala.apache.org by Prahalad kothwal <ko...@gmail.com> on 2018/05/03 09:26:08 UTC

Re: Impala Metadata cache limits

Thanks for your response. We are running 2.8.0 and are in the process of
upgrading to 2.11.0, and we have hundreds of partitioned Impala tables.

Thanks,
Prahalad

On Mon, Apr 30, 2018 at 9:35 PM, Alexander Behm <al...@cloudera.com>
wrote:

> What version of Impala are you running?
>
> On Sun, Apr 29, 2018 at 11:48 PM, Prahalad kothwal <ko...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Is there a limit to the amount of metadata Impala can cache, or is there
>> a recommendation from the Impala community? We were told not to have more
>> than 1 GB of metadata, and we have 350 GB of RAM on each host.
>>
>> Thanks,
>> Prahalad
>>
>>
>

Re: Impala Metadata cache limits

Posted by Jim Apple <jb...@cloudera.com>.
Yes, there are many plans to improve metadata scalability. I expect you can
see some of them by looking through the JIRAs:

https://issues.apache.org/jira/issues/?jql=project%20%3D%20IMPALA%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20Catalog%20ORDER%20BY%20priority%20DESC
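For readability, the JQL encoded in that URL is:

project = IMPALA AND resolution = Unresolved AND component = Catalog
ORDER BY priority DESC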

On Tue, May 8, 2018 at 2:12 AM, Prahalad kothwal <ko...@gmail.com>
wrote:

> Thanks for your response. Does the Impala community have any plans to
> overcome this limitation? It sounds like we can scale the data but not the
> metadata. Maybe make RPC calls to the Hive metastore rather than caching
> the metadata?
>
> Thanks,
> Prahalad

Re: Impala Metadata cache limits

Posted by Prahalad kothwal <ko...@gmail.com>.
Thanks for your response. Does the Impala community have any plans to
overcome this limitation? It sounds like we can scale the data but not the
metadata. Maybe make RPC calls to the Hive metastore rather than caching
the metadata?

Thanks,
Prahalad

On Thu, May 3, 2018 at 9:35 PM, Alexander Behm <al...@cloudera.com>
wrote:

> I'd recommend staying below 1GB to avoid OOMing the catalogd or impalads.
> Going up to 2GB is probably ok but is definitely approaching the danger
> zone. The main problem here is the JVM 2GB array limit. When serializing
> the metadata we write to a stream that's backed by a byte array. If that
> byte array goes beyond 2GB then the JVM will OOM and take down the process.
> You can hit this limit in various ways, and it can crash the catalogd and
> impalads.
>
> This 2GB limit applies to the uncompressed thrift-serialized size of the
> metadata.

Re: Impala Metadata cache limits

Posted by Alexander Behm <al...@cloudera.com>.
I'd recommend staying below 1GB to avoid OOMing the catalogd or impalads.
Going up to 2GB is probably ok but is definitely approaching the danger
zone. The main problem here is the JVM 2GB array limit. When serializing
the metadata we write to a stream that's backed by a byte array. If that
byte array goes beyond 2GB then the JVM will OOM and take down the process.
You can hit this limit in various ways, and it can crash the catalogd and
impalads.

This 2GB limit applies to the uncompressed thrift-serialized size of the
metadata.
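To illustrate, here is a minimal standalone sketch, not Impala code: Java
array lengths are ints, so a byte[] tops out at Integer.MAX_VALUE (2^31 - 1)
bytes, and a stream backed by one keeps growing its buffer until an
allocation fails. Run with a large heap (e.g. -Xmx6g) so the
OutOfMemoryError comes from the array limit rather than from ordinary heap
exhaustion:

import java.io.ByteArrayOutputStream;

public class TwoGBArrayLimit {
    public static void main(String[] args) {
        // A buffer backed by a single byte[] that is copied into a larger
        // array as it grows, similar to what a serializer uses under the
        // hood.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[64 * 1024 * 1024]; // 64 MB per write
        long total = 0;
        try {
            while (true) {
                buffer.write(chunk, 0, chunk.length);
                total += chunk.length;
            }
        } catch (OutOfMemoryError e) {
            // Catching OutOfMemoryError is only acceptable in a demo;
            // in catalogd or impalad this error is fatal to the process.
            System.err.println("OOM after ~" + (total >> 20) + " MB: "
                    + e.getMessage());
        }
    }
}

As far as I know, Thrift's Java TSerializer buffers into this same kind of
byte-array-backed stream, which is why the uncompressed thrift-serialized
size is the number that matters.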

On Thu, May 3, 2018 at 2:26 AM, Prahalad kothwal <ko...@gmail.com>
wrote:

> Thanks for your response. We are running 2.8.0 and are in the process of
> upgrading to 2.11.0, and we have hundreds of partitioned Impala tables.
>
> Thanks,
> Prahalad