Posted to dev@hive.apache.org by Sivaramakrishnan Narayanan <ta...@gmail.com> on 2015/05/26 05:19:53 UTC

Caching metastore objects

Apologies if this has been discussed in the past - my searches did not pull
up any relevant threads. If there are better solutions available out of the
box, please let me know!

Problem statement
--------------------------

We have a setup where a single metastore DB is used by Hive, Presto and
SparkSQL. In addition, there are thousands of Hive queries submitted in
batch form from multiple machines. Oftentimes, the metastore DB ends up
being remote (in a different AWS region, etc.) and round-trip latency is
high. We've seen single thrift calls get translated into lots of small SQL
calls by DataNucleus, and the round-trip latency ends up killing
performance. Furthermore, any of these systems may create / modify a Hive
table, and this should be reflected in the other systems. For example, I
may create a table in Hive and query it using Presto, or vice versa. In
our setup, there may be multiple thrift metastore servers pointing to the
same metastore DB.

Investigation
-------------------

Basically, we've been looking at caching to solve this problem (we'll come
to invalidation in a bit). I looked briefly at DataNucleus's support for
caching - these two parameters seem to be switched off by default:

    METASTORE_CACHE_LEVEL2("datanucleus.cache.level2", false),
    METASTORE_CACHE_LEVEL2_TYPE("datanucleus.cache.level2.type", "none"),

Furthermore, my reading of
http://www.datanucleus.org/products/datanucleus/jdo/cache.html suggests
that there is no sophistication in invalidation - it seems only time-based
invalidation is supported, and it can't work across multiple PMFs (and
therefore across multiple thrift metastore servers).

Solution Outline
-----------------------

   - Every table / partition will have an additional property called
   'version'
   - Any call that modifies a table or partition will bump the version of
   the table / partition
   - A Guava-based cache of the thrift objects that come from metastore
   calls
   - We fire a single SQL query to check versions before returning from
   the cache
   - It is conceivable to have a mode wherein version-based invalidation
   happens in a background thread (for higher performance, lower fidelity)
   - Not proposing any locking (not shooting for world peace here :) )
   - We could extend the HiveMetaStore class or create a new server
   altogether
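The version-check idea above can be sketched roughly as follows. This is an illustrative sketch only: it uses a plain ConcurrentHashMap in place of Guava's cache, the class and method names (MetaCache, get, put) are hypothetical, and the "current version" is supplied by the caller standing in for the single version-matching SQL query.

```java
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of a version-checked metastore object cache.
// A real implementation would hold thrift Table/Partition objects and
// look up current versions with a single SQL query; here the current
// version is passed in by the caller.
public class MetaCache {
    private static final class Entry {
        final long version;
        final Object value;   // would be a thrift Table/Partition
        Entry(long version, Object value) {
            this.version = version;
            this.value = value;
        }
    }

    private final ConcurrentHashMap<String, Entry> cache =
        new ConcurrentHashMap<>();

    public void put(String key, long version, Object value) {
        cache.put(key, new Entry(version, value));
    }

    // Return the cached object only if its version matches the version
    // currently recorded in the metastore DB; otherwise evict the stale
    // entry and return null so the caller refetches from the metastore.
    public Object get(String key, long currentVersion) {
        Entry e = cache.get(key);
        if (e == null) {
            return null;
        }
        if (e.version != currentVersion) {
            cache.remove(key, e);  // evict only this stale entry
            return null;
        }
        return e.value;
    }
}
```

The background-invalidation mode mentioned above would simply move the version comparison out of get() into a periodic thread, trading freshness for latency.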

Is this something that would be interesting to the community? Is this
problem already solved and should I spend my time watching GoT instead?

Thanks
Siva

Re: Caching metastore objects

Posted by Sivaramakrishnan Narayanan <ta...@gmail.com>.
Awesome!!


Re: Caching metastore objects

Posted by Scott C Gray <sg...@us.ibm.com>.

Great, that is perfect (I think :)).   The only thing it appears to be
missing is the ability to chain multiple listeners together, but that
would be a relatively simple patch.

Thanks for pointing me to it!





Re: Caching metastore objects

Posted by Ashutosh Chauhan <ha...@apache.org>.
Siva / Scott,

Such a framework exists in some form:
https://issues.apache.org/jira/browse/HIVE-2038
To make it even more generic, there was a proposal:
https://issues.apache.org/jira/browse/HIVE-2147 But there was resistance
from the community. Maybe the community is ready for it now : )

Ashutosh


Re: Caching metastore objects

Posted by Sivaramakrishnan Narayanan <ta...@gmail.com>.
Thanks for the replies.

@Ashutosh - thanks for the pointer! Yes, I was running the 0.11 metastore.
Let me try with the 0.13 metastore! Maybe my woes will be gone. If they
aren't, I'll continue working along these lines.

@Alan - agreed. Caching MTables seems like a better approach if 0.13
metastore perf is not as good as I'd like.

@Scott - a pluggable hook for metastore calls would be super useful. If you
want to generate events for client-side actions, I suppose you could just
implement a dynamic proxy class over the metastore client class which does
whatever you need it to. A similar technique could work on the server side -
I believe there is already a RetryingMetaStoreClient proxy class in place.
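For what it's worth, the dynamic-proxy technique could look roughly like this. The MsClient interface below is a stand-in for the real metastore client interface, not Hive's actual API, and the "mutation detection by method name" is just a placeholder for whatever event logic you need.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.List;

// Sketch: wrap a client interface in a java.lang.reflect.Proxy so that
// mutating calls also emit an event, without touching the client class.
public class EventingProxy {
    // Hypothetical stand-in for the metastore client interface.
    public interface MsClient {
        void createTable(String name);
        String getTable(String name);
    }

    public static MsClient wrap(MsClient target, List<String> events) {
        InvocationHandler handler = (proxy, method, args) -> {
            Object result = method.invoke(target, args);
            // Record an event for anything that looks like a mutation.
            String name = method.getName();
            if (name.startsWith("create") || name.startsWith("alter")
                    || name.startsWith("drop")) {
                events.add(name + ":" + args[0]);
            }
            return result;
        };
        return (MsClient) Proxy.newProxyInstance(
            MsClient.class.getClassLoader(),
            new Class<?>[] { MsClient.class },
            handler);
    }
}
```

Callers use the wrapped client exactly as before; only the mutating calls additionally append to the event list.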



Re: Caching metastore objects

Posted by Ashutosh Chauhan <ha...@apache.org>.
Are you running pre-0.12, or with hive.metastore.try.direct.sql = false?

Work done on https://issues.apache.org/jira/browse/HIVE-4051 should
alleviate some of your problems.



Re: Caching metastore objects

Posted by Scott C Gray <sg...@us.ibm.com>.

Hi,

This may or may not be considered a separate topic, but beyond the caching
that could take place in the metastore itself, there is a need for a
general-purpose notification mechanism for metastore changes. For example,
a number of applications, such as Big SQL or Impala, currently maintain
their own caches of metastore objects.   On the Big SQL development side,
we've been thinking about the possibility of a low-level, pluggable
notification mechanism for changes (I think a partial framework for this
was attempted in HCatalog?), perhaps with the ability to concurrently
chain multiple implementations.  Thus, Hive could install its own
implementation to keep its local metastore caches in sync but, say,
another implementation could post changes to Kafka to broadcast
notifications to any interested party.
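The chaining idea could be as simple as a composite listener that fans each change event out to every registered implementation. A minimal sketch, with a hypothetical ChangeListener interface standing in for whatever listener API the metastore would expose:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of chained metastore-change notification: one composite that
// fans each event out to several pluggable listeners (e.g. a local
// cache invalidator and a Kafka publisher).
public class ChainedListener {
    // Hypothetical listener interface, not Hive's actual API.
    public interface ChangeListener {
        void onTableChanged(String db, String table);
    }

    private final List<ChangeListener> chain = new ArrayList<>();

    public void register(ChangeListener l) {
        chain.add(l);
    }

    public void fireTableChanged(String db, String table) {
        for (ChangeListener l : chain) {
            l.onTableChanged(db, table);  // every plugin sees every event
        }
    }
}
```

Each implementation stays independent; registering or removing one never requires touching the others.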

-scott




Re: Caching metastore objects

Posted by Alan Gates <al...@gmail.com>.

I think you want to do this caching at the ObjectStore level, not the 
HiveMetaStore level.  Since the Guava caching includes knowledge of how 
to fetch the item into the cache, the details of how to actually fetch 
the item will bleed into your caching layer.  You don't want to put SQL 
directly into the HiveMetaStore layer, since there are alternative, 
non-SQL implementations of that layer (see below).
There is work going on to enable storing metadata in HBase instead of an 
RDBMS.  One of the explicit goals of this work is to radically reduce 
the number of round trips between Hive and the metadata store, so 
instead of one thrift call resulting in many SQL calls, it will result 
in one or at most a few HBase calls.  This may or may not solve your 
problem, and it may be a much more radical solution than you desire.  
Also, it isn't stable and tested yet, so you would have to wait a while 
for it.  But if you are interested, it's happening in the hbase-metastore 
branch of Hive.

Alan.