Posted to user@hive.apache.org by Udit Mehta <um...@groupon.com> on 2016/03/30 23:28:04 UTC

Hive Metastore Bottleneck

Hi all,

We are currently running Hive in production and staging, with the metastore
connecting to a MySQL database in the backend. As expected, production sees
more metastore traffic than staging. A sudden increase in that traffic has
led to metastore operations taking a lot longer than before. The same query
on staging takes a lot less time because the staging cluster carries less
traffic.

We tried increasing the heap space for the metastore process and also
bumped up the memory for the MySQL database. Neither change seemed to help
much, and we still see delays. Is there any other config we can increase to
counter this increased traffic? I am looking at the config for max threads
as well, but I'm not sure if this is the right path ahead.

I'm wondering if the metastore is a bottleneck here or if I'm missing
something.

Looking forward to your reply,
Udit

Re: Hive Metastore Bottleneck

Posted by Mich Talebzadeh <mi...@gmail.com>.
Hi Gautam

When you stated

"Have you tried putting multiple metastores behind a load balancer"


Should read

"Have you tried putting multiple *metastore services* ....

Basically there is only one backend database, AKA the metastore. Unless you
have set up bi-directional replication on the database, all these services
can only write to one database.

Hi Udit,

Have you tried looking for wait events or their equivalent in MySQL to see
what these threads are waiting for? I don't know about MySQL; we use Oracle
DB. You may have a bottleneck on your database or concurrency issues.
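
As a rough MySQL-side equivalent of looking at wait events, one option is to sample SHOW FULL PROCESSLIST and count sessions by their State column. The sketch below only shows the counting step; the `summarize_states` helper and the sample rows are hypothetical, and fetching the rows with a MySQL client is left out.

```python
from collections import Counter

# Standard MySQL statements; run them with any MySQL client.
DIAGNOSTIC_SQL = [
    "SHOW FULL PROCESSLIST",
    "SHOW ENGINE INNODB STATUS",
]


def summarize_states(rows):
    """Count sessions per State value from SHOW FULL PROCESSLIST rows.

    Each row is assumed to be a dict keyed by the column names MySQL
    returns (Id, User, Host, db, Command, Time, State, Info).
    """
    counts = Counter()
    for row in rows:
        # Idle sessions report an empty or NULL State.
        state = row.get("State") or "(none)"
        counts[state] += 1
    return counts.most_common()


# Hypothetical sample rows, for illustration only.
sample = [
    {"Id": 1, "User": "hive", "Command": "Query", "Time": 12,
     "State": "Waiting for table metadata lock"},
    {"Id": 2, "User": "hive", "Command": "Query", "Time": 9,
     "State": "Waiting for table metadata lock"},
    {"Id": 3, "User": "hive", "Command": "Sleep", "Time": 40, "State": ""},
]
print(summarize_states(sample))
```

If one state dominates across repeated samples, that is a hint about where the metastore's database threads are stuck.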


HTH


Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 30 March 2016 at 23:20, Gautam <ga...@gmail.com> wrote:

> The metastore service is a Java process that is a Thrift server, so you
> can run multiple such Hive metastore instances with
> "javax.jdo.option.ConnectionURL" pointing to the same MySQL DB.

Re: Hive Metastore Bottleneck

Posted by Gautam <ga...@gmail.com>.
Mich,
     Thanks for clarifying. My words were probably misleading :-)

Udit,
    Fallback (or HA) is different from load balancing. A service that is
overloaded is not necessarily unreachable, so clients would never fall back
to your second service. You want equal load on both metastore services, so a
simple round-robin load balancer should do. We use http://www.haproxy.org/
for this. The LB picks one of N metastore services, which all look at the
same DB.
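
The difference between fallback and round-robin selection can be sketched in a few lines. This is a toy simulation, not HAProxy or Hive client code: the host names are placeholders, 9083 is assumed as the metastore port, and the fallback function models the "first URI unless unreachable" behaviour described in this thread.

```python
from collections import Counter
from itertools import cycle

# Hypothetical metastore endpoints; the host names are placeholders.
URIS = ["thrift://ms1:9083", "thrift://ms2:9083", "thrift://ms3:9083"]


def first_reachable(uris, reachable):
    """Fallback-style selection: use the first URI that is reachable."""
    for uri in uris:
        if uri in reachable:
            return uri
    raise RuntimeError("no metastore reachable")


def simulate_fallback(uris, n_requests):
    # Every instance is up (merely busy), so the first one takes all requests.
    return Counter(first_reachable(uris, set(uris)) for _ in range(n_requests))


def simulate_round_robin(uris, n_requests):
    # A round-robin balancer hands out backends in rotation.
    backends = cycle(uris)
    return Counter(next(backends) for _ in range(n_requests))


print(simulate_fallback(URIS, 9))
print(simulate_round_robin(URIS, 9))
```

With all instances reachable, the fallback run sends every request to the first instance, while the round-robin run spreads them evenly.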

That being said, I suggested this without really understanding whether it
will actually help you. We use multiple instances because the number of
queries being serviced at any time is higher than what a single instance
could handle. You first need to find out what's causing the load on your
metastore. It could be that a single service isn't your problem, but
something else, like:

-  How are you querying the metastore? Do your INSERT queries create a large
number of partitions in production? The inserts could be locking up the
MySQL tables for too long, making them sporadically unavailable to others.
Use any standard MySQL perf monitoring to check this or just speak to your
DB admin :-) We use VividCortex for this, which is a paid service.
-  Number of partitions picked in your SELECT queries. The Hive metastore
doesn't like it when a Hive table has too many partitions (under 30k is
recommended, I think?)
-  The metastore service is Java, so it can have the usual JVM latencies. What
do your GC pauses look like? Turn on GC logging or use a JVM profiler to
inspect. The solution could be picking better JVM tuning params.
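
For the partition-count point in the list above, a minimal sketch of one way to check it is shown here. It assumes the stock metastore schema in MySQL (tables TBLS and PARTITIONS joined on TBL_ID); verify that against your own schema version. The `over_limit` helper and the sample rows are hypothetical, and 30k is only the rough figure mentioned above, not a hard limit.

```python
# Query against the metastore's backing MySQL database. TBLS and PARTITIONS
# are assumed to match the stock Hive metastore schema.
PARTITION_COUNT_SQL = """
SELECT t.TBL_NAME, COUNT(*) AS part_count
FROM PARTITIONS p
JOIN TBLS t ON p.TBL_ID = t.TBL_ID
GROUP BY t.TBL_ID, t.TBL_NAME
ORDER BY part_count DESC
"""

PARTITION_LIMIT = 30_000  # rough figure from the thread, not a hard limit


def over_limit(rows, limit=PARTITION_LIMIT):
    """Return (table, count) pairs whose partition count exceeds the limit."""
    return [(name, count) for name, count in rows if count > limit]


# Hypothetical query results, for illustration only.
sample_rows = [("events_hourly", 52_000), ("users", 120), ("clicks", 30_000)]
print(over_limit(sample_rows))
```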


There could be other reasons, but these should give you a start.

-Gautam.


On Wed, Mar 30, 2016 at 3:33 PM, Udit Mehta <um...@groupon.com> wrote:

> But don't the clients always pick the first URI of the multiple instances
> listed in the "hive.metastore.uris" config and fall back to the others
> only if the first is unreachable? This way, we would still have a
> bottleneck, right?
> Can you give a little more information on your setup and how you enable
> load balancing?
> I think I am missing something here.
>
> Thanks,
> Udit


-- 
"If you really want something in this life, you have to work for it. Now,
quiet! They're about to announce the lottery numbers..."

Re: Hive Metastore Bottleneck

Posted by Udit Mehta <um...@groupon.com>.
But don't the clients always pick the first URI of the multiple instances
listed in the "hive.metastore.uris" config and fall back to the others only
if the first is unreachable? This way, we would still have a bottleneck,
right?
Can you give a little more information on your setup and how you enable
load balancing?
I think I am missing something here.

Thanks,
Udit

On Wed, Mar 30, 2016 at 3:20 PM, Gautam <ga...@gmail.com> wrote:

> The metastore service is a Java process that is a Thrift server, so you
> can run multiple such Hive metastore instances with
> "javax.jdo.option.ConnectionURL" pointing to the same MySQL DB.

Re: Hive Metastore Bottleneck

Posted by Gautam <ga...@gmail.com>.
The metastore service is a Java process that is a Thrift server, so you
can run multiple such Hive metastore instances with
"javax.jdo.option.ConnectionURL" pointing to the same MySQL DB.
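
To make that concrete, here is a small sketch that renders the same hive-site.xml fragment for two metastore hosts. Only the property name "javax.jdo.option.ConnectionURL" comes from this thread; the `hive_site` helper, the host names ms1/ms2, and the JDBC URL are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET


def hive_site(properties):
    """Render a dict of settings as a hive-site.xml style document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical database host and schema name.
SHARED_DB = "jdbc:mysql://db.example.com:3306/hive_metastore"

# Every metastore instance gets the same ConnectionURL.
configs = {
    host: hive_site({"javax.jdo.option.ConnectionURL": SHARED_DB})
    for host in ("ms1", "ms2")
}
print(configs["ms1"])
```

Because both instances point at one database, they serve the same metadata; only the Thrift front end is duplicated.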

On Wed, Mar 30, 2016 at 3:11 PM, Mich Talebzadeh <mi...@gmail.com>
wrote:

>
>
> Can you clarify this, please?
>
> "Have you tried putting multiple metastores behind a load balancer"
>
> Are you implying that the metastore and the backend DB are different
> entities here?
>
> As far as I know, $HIVE_HOME/bin/hive --service metastore & starts Hive
> threads to the backend database/metastore, and HiveServer2 acts as a gateway
> for remote access to the Hive metastore through beeline or other clients.
>
> There is only one metastore here, namely MySQL/Oracle or others.
>
> Thanks

Re: Hive Metastore Bottleneck

Posted by Udit Mehta <um...@groupon.com>.
In our setup, the metastore process and the MySQL database are on separate
hosts. The metastore process connects to the database using the credentials
we provide in hive-site.xml. Basically, we follow the architecture in
the attached diagram (Source:
http://www.cloudera.com/documentation/archive/cdh/4-x/4-2-0/CDH4-Installation-Guide/cdh4ig_topic_18_4.html).

I am not clear myself on how to put multiple metastores behind an LB and
would be interested to know if it's possible.

On Wed, Mar 30, 2016 at 3:11 PM, Mich Talebzadeh <mi...@gmail.com>
wrote:

>
>
> Can you clarify this, please?
>
> "Have you tried putting multiple metastores behind a load balancer"
>
> Are you implying that the metastore and the backend DB are different
> entities here?
>
> As far as I know, $HIVE_HOME/bin/hive --service metastore & starts Hive
> threads to the backend database/metastore, and HiveServer2 acts as a gateway
> for remote access to the Hive metastore through beeline or other clients.
>
> There is only one metastore here, namely MySQL/Oracle or others.
>
> Thanks

Re: Hive Metastore Bottleneck

Posted by Mich Talebzadeh <mi...@gmail.com>.
Can you clarify this, please?

"Have you tried putting multiple metastores behind a load balancer"

Are you implying that the metastore and the backend DB are different
entities here?

As far as I know, $HIVE_HOME/bin/hive --service metastore & starts Hive
threads to the backend database/metastore, and HiveServer2 acts as a gateway
for remote access to the Hive metastore through beeline or other clients.

There is only one metastore here, namely MySQL/Oracle or others.

Thanks


On 30 March 2016 at 22:53, Gautam <ga...@gmail.com> wrote:

> Can you elaborate on where you see the bottleneck? A general overview of
> your access path would be useful, for instance whether you're accessing the
> Hive metastore via HiveServer2, or from WebHCat using the embedded CLI, or
> something else.
>
> Have you tried putting multiple metastores behind a load balancer? It's
> just a Thrift service over MySQL, so you can have multiple instances
> pointing to the same backend DB.

Re: Hive Metastore Bottleneck

Posted by Gautam <ga...@gmail.com>.
Can you elaborate on where you see the bottleneck? A general overview of
your access path would be useful, for instance whether you're accessing the
Hive metastore via HiveServer2, or from WebHCat using the embedded CLI, or
something else.

Have you tried putting multiple metastores behind a load balancer? It's
just a Thrift service over MySQL, so you can have multiple instances
pointing to the same backend DB.

On Wed, Mar 30, 2016 at 2:28 PM, Udit Mehta <um...@groupon.com> wrote:

> Hi all,
>
> We are currently running Hive in production and staging, with the metastore
> connecting to a MySQL database in the backend. As expected, production sees
> more metastore traffic than staging. A sudden increase in that traffic has
> led to metastore operations taking a lot longer than before. The same query
> on staging takes a lot less time because the staging cluster carries less
> traffic.
>
> We tried increasing the heap space for the metastore process and also
> bumped up the memory for the MySQL database. Neither change seemed to help
> much, and we still see delays. Is there any other config we can increase to
> counter this increased traffic? I am looking at the config for max threads
> as well, but I'm not sure if this is the right path ahead.
>
> I'm wondering if the metastore is a bottleneck here or if I'm missing
> something.
>
> Looking forward to your reply,
> Udit

Re: Hive Metastore Bottleneck

Posted by Udit Mehta <um...@groupon.com>.
I was looking at "hive.metastore.max.server.threads", but reading more
into it tells me it's a config for the Thrift server and not the metastore.

Most of our applications accessing the metastore are Spark SQL applications
which do INSERT operations on multiple partitions on an hourly basis. This
basically implies that most of these queries don't use the Thrift server but
connect directly to the metastore from the Spark application.

Can you give me more information on how we can have multiple metastores
behind a load balancer? I read about providing multiple URLs in
"hive.metastore.uris" but saw that it would always pick the first URI and
choose from the rest only in case of a failure.

Thanks again for the replies.

On Wed, Mar 30, 2016 at 2:30 PM, Mich Talebzadeh <mi...@gmail.com>
wrote:

> Are you talking about an increase in the number of threads from the
> HiveServer2 connection to your database (MySQL)?

Re: Hive Metastore Bottleneck

Posted by Mich Talebzadeh <mi...@gmail.com>.
Are you talking about an increase in the number of threads from the
HiveServer2 connection to your database (MySQL)?

On 30 March 2016 at 22:28, Udit Mehta <um...@groupon.com> wrote:

> Hi all,
>
> We are currently running Hive in production and staging, with the metastore
> connecting to a MySQL database in the backend. As expected, production sees
> more metastore traffic than staging. A sudden increase in that traffic has
> led to metastore operations taking a lot longer than before. The same query
> on staging takes a lot less time because the staging cluster carries less
> traffic.
>
> We tried increasing the heap space for the metastore process and also
> bumped up the memory for the MySQL database. Neither change seemed to help
> much, and we still see delays. Is there any other config we can increase to
> counter this increased traffic? I am looking at the config for max threads
> as well, but I'm not sure if this is the right path ahead.
>
> I'm wondering if the metastore is a bottleneck here or if I'm missing
> something.
>
> Looking forward to your reply,
> Udit
>

Re: Hive Metastore Bottleneck

Posted by Jörn Franke <jo...@gmail.com>.
Is the MySQL database virtualized? Are there bottlenecks on the storage behind the MySQL database? Could the network be a bottleneck? Are firewalls blocking new connections when the number of connections suddenly increases?

> On 30 Mar 2016, at 23:28, Udit Mehta <um...@groupon.com> wrote:
> 
> Hi all,
> 
> We are currently running Hive in production and staging, with the metastore connecting to a MySQL database in the backend. As expected, production sees more metastore traffic than staging. A sudden increase in that traffic has led to metastore operations taking a lot longer than before. The same query on staging takes a lot less time because the staging cluster carries less traffic.
> 
> We tried increasing the heap space for the metastore process and also bumped up the memory for the MySQL database. Neither change seemed to help much, and we still see delays. Is there any other config we can increase to counter this increased traffic? I am looking at the config for max threads as well, but I'm not sure if this is the right path ahead.
> 
> I'm wondering if the metastore is a bottleneck here or if I'm missing something.
> 
> Looking forward to your reply,
> Udit