You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by David <da...@gmail.com> on 2021/05/17 17:42:32 UTC

Hive HMS RawStore-ObjectStore Design

Hello Gang,

I just wanted to put out a few thoughts for anyone interested in the
Metastore, and in particular, the connection handling.

As I understand it, client requests from the Thrift server come into Hive
via the HMSHandler class.  This class lists all of the services (RPCs) that
the Hive Metastore provides.

This class's methods do some amount of validation, listener notification,
but it ultimately calls one or more RawStore/ObjectStore methods to
interact with the database.

This entire orchestration needs some work to make this code more easy to
work with and to improve error handling.

What I propose is:

1// Remove Thrift Errors from RawStore

Remove all references to
NoSuchObjectException/InvalidOperationException/MetaException from the
method signature of RawStore.  These Exceptions are generated by Thrift and
are used to communicate error conditions across the wire.  They are not
designed for use as part of the underlying stack, yet in Hive, they have
been pushed down into these data access operators.  The RawStore should not
have to be this tightly coupled to the transport layer.  My preference here
would be to remove all checked Exceptions from RawStore in favor of runtime
exceptions.  This is a popular format and is used (and therefore dovetails
nicely) with the underlying database access library DataNucleaus.  All of
the logging of un-checked Exceptions, and transforming them into Thrift
exceptions, should happen at the HMSHandler code.


2// Move Transaction Management

The ObjectStore has a pretty crazy path of handling transactions.  There
seems to be a lot of extra code around transaction tracking that was put in
probably because it's so hard to track transaction management within Hive.
All of the boiler-plate transaction management code should be removed from
ObjectStore and instead brought up into HMS handler as well.  This allows
the handler to create a single transaction per-request and call the
necessary ObjectStore methods.  This is not currently possible because each
ObjectStore handles transactions in its own special way. When you include
all of the open/commit/roll-back, and "transactional listeners," I'm not
certain all code paths are correct.  For example, I suspect some listeners
are being alerted outside of a transaction.  I also suspect some actions
are occurring in multiple transactions that should really be occurring
within a single transaction.

I have locally created some helper-code (TransactionOperations) to do this
from HMSHandler:

      TransactionOperations.newOperation(rawstore).execute(new
TransactionCallback<RawStore>() {

        // This method is called after openTransaction is called on the
RawStore
        // Runtime Exceptions are caught and cause the transaction to roll
back
        // The RawStore method commitTransaction is called if method
completes OK
        @Override
        public void doInTransaction(RawStore rawstore) throws MetaException
{

          // These RawStore method happen in one transaction
          rawstore.methodABC();
          rawstore.methodXXX();
          rawstore.methodXYZ();

          if (!transactionalListeners.isEmpty()) {
            transactionalListenersResponses =

MetaStoreListenerNotifier.notifyEvent(transactionalListeners,
                    EventType.CREATE_XXX,
                    new CreateXxxEvent(true, HMSHandler.this, xxx));
          }
        }
      });


Re-architecting the method signatures to remove the MetaExceptions is a
large-ish task, but trying to unwind all this transaction code is going to
be a bear, it's what prompted me to write this email.

Thanks.

Re: Hive HMS RawStore-ObjectStore Design

Posted by David <da...@gmail.com>.
Hello Team,


Just wanted to follow up on this.

Because the code does not allow for simple transaction management, there's
a lot of weird workarounds that add a lot of complexity, for example, these
methods are all "nonTX" because they expect a transaction to already be in
place.  This comes about because some of the other methods, that do the
same thing, always open their own transactions.

https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L2994-L2996

Just a reminder that I am also working on decoupling the ObjectStore code
from Thrift.  I could use some help on reviewing PRs that I submitted that
begin this process.

https://issues.apache.org/jira/browse/HIVE-25126

Thanks.


On Tue, May 18, 2021 at 9:54 AM David <da...@gmail.com> wrote:

> Hello  Narayanan,
>
> Thank you so much for your feedback.
>
> As I view it, and have had success in other projects, the RawStore
> interface defines the methods for pulling data from some underlying data
> source.
> I would imagine something like JdbcRawStore, FileRawStore, HBaseRawStore,
> MongoRawStore, etc.
>
> I do not believe that CachedStore should be a feature.  Given that there
> can be multiple HMS accessing the same data source (RDBMS for example),
> there is not a lot of caching within the HMS application.  I think we
> should revisit this, we should better leverage DataNucleaus Level 2 Cache,
> but that is another issue.
> The point is that each RawStore instance should be able to perform caching
> that makes sense for the underlying data source, I do not think there
> should be some sort
> of cache that is wrapping the RawStore implementation. My expectation
> would be that the HMSHandler would call a separate caching framework and
> only reach out to the underlying data source (the RawStore), and start a
> transaction, if there it needs to load something that is not already in the
> main cache, thus
> the cache is not tightly coupled to the RawStore stuff.  If you're not
> familiar with this, check out the Guava LoaderCache.  In this example, the
> loading code would
> reach out to the RawStore.
>
> https://github.com/google/guava/wiki/CachesExplained
>
> ------------
>
> InvalidOperationException is a Thrift-generated error.  We really, really,
> should not have these Thrift generated checked exceptions embedded in the
> HMS.  They lack
> some pretty important features like the ability to specify a "cause" for
> the exception. There are all kinds of weird workarounds in the code to
> account for these deficiencies.
> For example, there are places in the code that embed stack traces in the
> message of the Thrift Exceptions and then parse them to re-create a new
> Exception.  This is pretty
> bonkers and added a lot of code/time to add these capabilities instead of
> simply using standing POJO exceptions.  Also, if someday for example, we
> wanted to replace Thrift
> with another RPC engine (ProtoBuffs), we have to yank out all of these
> Exception classes before we can fully remove the dependency on Thrift.
>
> Any Thrift-Generated exceptions should only be generated in the HMSHandler
> (which should be called HMSThriftHandler) class.  Whatever the RawStore
> needs to throw,
> it should be unchecked Hive project exceptions.  Interestingly, as I've
> been examining the code, I've noticed that many of the unchecked Exceptions
> that DataNucleaus
> throws go un-caught, so the code claims that the Thrift-generated
> MetaException is the generic "database error," however, that is simply not
> true. Most of the methods
> do not catch the DataNucleaus exceptions and those bubble up.
>
> So yes, any such Thrift-generated Exceptions should be replace by
> something from Hive: "InvalidOperationException"  is fine to pass back to
> the Thrift client, but it should be
> something like:
>
> // HMSHandler
> try {
>    rawStore.doThings();
> } catch (IllegalArgumentException e) {
>   LOG.error("Error doing things", e);
>   throw new  InvalidOperationException(e.getMessage());
> }
>
> https://issues.apache.org/jira/browse/HIVE-25126
>
> ------------
>
> To your point about "rollbackAndCleanup," if you look at the RawStore
> interface as simply a class that provides persistence, then the
> "andCleanup" part is moot. Anything that
> is unrelated to data persistence needs to be moved out.  If the calls to
> RawStore have weird side effects that do not directly apply to the storage
> (like creating a warehouse directory in HDFS),
> that code needs to be removed.  Anything related to the transaction is
> reverted by the calling class with a call to RawStore#rollbackTransaction.
>
>
> Thanks.
>
> On Tue, May 18, 2021 at 9:17 AM Narayanan Venkateswaran <vn...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Thank you for the email,
>>
>> please find some replies inline,
>>
>> On Mon, May 17, 2021 at 11:12 PM David <da...@gmail.com> wrote:
>>
>> > Hello Gang,
>> >
>> > I just wanted to put out a few thoughts for anyone interested in the
>> > Metastore, and in particular, the connection handling.
>> >
>> > As I understand it, client requests from the Thrift server come into
>> Hive
>> > via the HMSHandler class.  This class lists all of the services (RPCs)
>> that
>> > the Hive Metastore provides.
>> >
>> > This class's methods do some amount of validation, listener
>> notification,
>> > but it ultimately calls one or more RawStore/ObjectStore methods to
>> > interact with the database.
>> >
>>
>> I guess you mean one of the RawStore implementations to interact with the
>> database,
>>
>> for example it could be CachedStore also.
>>
>>
>> >
>> > This entire orchestration needs some work to make this code more easy to
>> > work with and to improve error handling.
>> >
>> > What I propose is:
>> >
>> > 1// Remove Thrift Errors from RawStore
>> >
>> > Remove all references to
>> > NoSuchObjectException/InvalidOperationException/MetaException from the
>> > method signature of RawStore.  These Exceptions are generated by Thrift
>> and
>> > are used to communicate error conditions across the wire.  They are not
>> > designed for use as part of the underlying stack, yet in Hive, they have
>> > been pushed down into these data access operators.  The RawStore should
>> not
>> > have to be this tightly coupled to the transport layer.  My preference
>> here
>> > would be to remove all checked Exceptions from RawStore in favor of
>> runtime
>> > exceptions.
>>
>>
>> I guess we will have to look at what to do with InvalidOperationExceptions
>> such as the following,
>>
>> if (hasNsChange) {
>>       throw new InvalidOperationException("Cannot change ns; from " +
>> getNsOrDefault(plan.getNs())
>>           + " to " + changes.getNs());
>>  }
>>
>> So do we retain these?
>>
>>
>>
>>
>> > This is a popular format and is used (and therefore dovetails
>> > nicely) with the underlying database access library DataNucleaus.  All
>> of
>> > the logging of un-checked Exceptions, and transforming them into Thrift
>> > exceptions, should happen at the HMSHandler code.
>> >
>>
>> I am OK with this, but you might end up amalgamating a lot of RawStore
>> implementation-specific exception handling code in the HMSHandler, an
>> already cluttered class. For example, the CachedStore implementation may
>> be
>> throwing an exception different from the ObjectStore and we will probably
>> end up wrapping both of these in the HMSHandler.
>>
>> Similarly, each RawStore implementation may need to perform actions in the
>> event of an exception from the underlying database,
>> e.g. rollbackAndCleanup. I guess this is where the moving of transaction
>> management kicks in?
>>
>>
>>
>> >
>> > 2// Move Transaction Management
>> >
>> > The ObjectStore has a pretty crazy path of handling transactions.  There
>> > seems to be a lot of extra code around transaction tracking that was
>> put in
>> > probably because it's so hard to track transaction management within
>> Hive.
>> >
>>
>> Agreed!
>>
>>
>> > All of the boiler-plate transaction management code should be removed
>> from
>> > ObjectStore and instead brought up into HMS handler as well.
>>
>>
>> This will need to be thought through. I agree that refactoring transaction
>> management is a good idea, but moving it into HMSHandler is just going to
>> further clutter HMSHandler. HMSHandler is already a huge class with a lot
>> of clutter that needs to be refactored before we add more into it.
>>
>> This allows
>> > the handler to create a single transaction per-request and call the
>> > necessary ObjectStore methods.  This is not currently possible because
>> each
>> > ObjectStore handles transactions in its own special way. When you
>> include
>> > all of the open/commit/roll-back, and "transactional listeners," I'm not
>> > certain all code paths are correct.  For example, I suspect some
>> listeners
>> > are being alerted outside of a transaction.  I also suspect some actions
>> > are occurring in multiple transactions that should really be occurring
>> > within a single transaction.
>> >
>>
>> I think a CachedStore also passes down various calls down to the
>> ObjectStore. Starting a transaction in HMSHandler will keep the
>> transaction
>> open for a more than necessary duration. I agree that the refactoring
>> needs
>> to happen, but I am not convinced that moving all transaction handling to
>> HMSHandler is the way forward. Ofcourse I don't have a better idea as of
>> now.
>>
>>
>> >
>> > I have locally created some helper-code (TransactionOperations) to do
>> this
>> > from HMSHandler:
>> >
>> >       TransactionOperations.newOperation(rawstore).execute(new
>> > TransactionCallback<RawStore>() {
>> >
>> >         // This method is called after openTransaction is called on the
>> > RawStore
>> >         // Runtime Exceptions are caught and cause the transaction to
>> roll
>> > back
>> >         // The RawStore method commitTransaction is called if method
>> > completes OK
>> >         @Override
>> >         public void doInTransaction(RawStore rawstore) throws
>> MetaException
>> > {
>> >
>> >           // These RawStore method happen in one transaction
>> >           rawstore.methodABC();
>> >           rawstore.methodXXX();
>> >           rawstore.methodXYZ();
>> >
>> >           if (!transactionalListeners.isEmpty()) {
>> >             transactionalListenersResponses =
>> >
>> > MetaStoreListenerNotifier.notifyEvent(transactionalListeners,
>> >                     EventType.CREATE_XXX,
>> >                     new CreateXxxEvent(true, HMSHandler.this, xxx));
>> >           }
>> >         }
>> >       });
>> >
>> >
>> > Re-architecting the method signatures to remove the MetaExceptions is a
>> > large-ish task, but trying to unwind all this transaction code is going
>> to
>> > be a bear, it's what prompted me to write this email.
>> >
>>
>> Sure, I understand.
>>
>>
>> >
>> > Thanks.
>> >
>>
>> Thank you,
>> Narayanan
>>
>

Re: Hive HMS RawStore-ObjectStore Design

Posted by David <da...@gmail.com>.
Hello  Narayanan,

Thank you so much for your feedback.

As I view it, and have had success in other projects, the RawStore
interface defines the methods for pulling data from some underlying data
source.
I would imagine something like JdbcRawStore, FileRawStore, HBaseRawStore,
MongoRawStore, etc.

I do not believe that CachedStore should be a feature.  Given that there
can be multiple HMS accessing the same data source (RDBMS for example),
there is not a lot of caching within the HMS application.  I think we
should revisit this, we should better leverage DataNucleaus Level 2 Cache,
but that is another issue.
The point is that each RawStore instance should be able to perform caching
that makes sense for the underlying data source, I do not think there
should be some sort
of cache that is wrapping the RawStore implementation. My expectation would
be that the HMSHandler would call a separate caching framework and
only reach out to the underlying data source (the RawStore), and start a
transaction, if there it needs to load something that is not already in the
main cache, thus
the cache is not tightly coupled to the RawStore stuff.  If you're not
familiar with this, check out the Guava LoaderCache.  In this example, the
loading code would
reach out to the RawStore.

https://github.com/google/guava/wiki/CachesExplained

------------

InvalidOperationException is a Thrift-generated error.  We really, really,
should not have these Thrift generated checked exceptions embedded in the
HMS.  They lack
some pretty important features like the ability to specify a "cause" for
the exception. There are all kinds of weird workarounds in the code to
account for these deficiencies.
For example, there are places in the code that embed stack traces in the
message of the Thrift Exceptions and then parse them to re-create a new
Exception.  This is pretty
bonkers and added a lot of code/time to add these capabilities instead of
simply using standing POJO exceptions.  Also, if someday for example, we
wanted to replace Thrift
with another RPC engine (ProtoBuffs), we have to yank out all of these
Exception classes before we can fully remove the dependency on Thrift.

Any Thrift-Generated exceptions should only be generated in the HMSHandler
(which should be called HMSThriftHandler) class.  Whatever the RawStore
needs to throw,
it should be unchecked Hive project exceptions.  Interestingly, as I've
been examining the code, I've noticed that many of the unchecked Exceptions
that DataNucleaus
throws go un-caught, so the code claims that the Thrift-generated
MetaException is the generic "database error," however, that is simply not
true. Most of the methods
do not catch the DataNucleaus exceptions and those bubble up.

So yes, any such Thrift-generated Exceptions should be replace by something
from Hive: "InvalidOperationException"  is fine to pass back to the Thrift
client, but it should be
something like:

// HMSHandler
try {
   rawStore.doThings();
} catch (IllegalArgumentException e) {
  LOG.error("Error doing things", e);
  throw new  InvalidOperationException(e.getMessage());
}

https://issues.apache.org/jira/browse/HIVE-25126

------------

To your point about "rollbackAndCleanup," if you look at the RawStore
interface as simply a class that provides persistence, then the
"andCleanup" part is moot. Anything that
is unrelated to data persistence needs to be moved out.  If the calls to
RawStore have weird side effects that do not directly apply to the storage
(like creating a warehouse directory in HDFS),
that code needs to be removed.  Anything related to the transaction is
reverted by the calling class with a call to RawStore#rollbackTransaction.


Thanks.

On Tue, May 18, 2021 at 9:17 AM Narayanan Venkateswaran <vn...@gmail.com>
wrote:

> Hi,
>
> Thank you for the email,
>
> please find some replies inline,
>
> On Mon, May 17, 2021 at 11:12 PM David <da...@gmail.com> wrote:
>
> > Hello Gang,
> >
> > I just wanted to put out a few thoughts for anyone interested in the
> > Metastore, and in particular, the connection handling.
> >
> > As I understand it, client requests from the Thrift server come into Hive
> > via the HMSHandler class.  This class lists all of the services (RPCs)
> that
> > the Hive Metastore provides.
> >
> > This class's methods do some amount of validation, listener notification,
> > but it ultimately calls one or more RawStore/ObjectStore methods to
> > interact with the database.
> >
>
> I guess you mean one of the RawStore implementations to interact with the
> database,
>
> for example it could be CachedStore also.
>
>
> >
> > This entire orchestration needs some work to make this code more easy to
> > work with and to improve error handling.
> >
> > What I propose is:
> >
> > 1// Remove Thrift Errors from RawStore
> >
> > Remove all references to
> > NoSuchObjectException/InvalidOperationException/MetaException from the
> > method signature of RawStore.  These Exceptions are generated by Thrift
> and
> > are used to communicate error conditions across the wire.  They are not
> > designed for use as part of the underlying stack, yet in Hive, they have
> > been pushed down into these data access operators.  The RawStore should
> not
> > have to be this tightly coupled to the transport layer.  My preference
> here
> > would be to remove all checked Exceptions from RawStore in favor of
> runtime
> > exceptions.
>
>
> I guess we will have to look at what to do with InvalidOperationExceptions
> such as the following,
>
> if (hasNsChange) {
>       throw new InvalidOperationException("Cannot change ns; from " +
> getNsOrDefault(plan.getNs())
>           + " to " + changes.getNs());
>  }
>
> So do we retain these?
>
>
>
>
> > This is a popular format and is used (and therefore dovetails
> > nicely) with the underlying database access library DataNucleaus.  All of
> > the logging of un-checked Exceptions, and transforming them into Thrift
> > exceptions, should happen at the HMSHandler code.
> >
>
> I am OK with this, but you might end up amalgamating a lot of RawStore
> implementation-specific exception handling code in the HMSHandler, an
> already cluttered class. For example, the CachedStore implementation may be
> throwing an exception different from the ObjectStore and we will probably
> end up wrapping both of these in the HMSHandler.
>
> Similarly, each RawStore implementation may need to perform actions in the
> event of an exception from the underlying database,
> e.g. rollbackAndCleanup. I guess this is where the moving of transaction
> management kicks in?
>
>
>
> >
> > 2// Move Transaction Management
> >
> > The ObjectStore has a pretty crazy path of handling transactions.  There
> > seems to be a lot of extra code around transaction tracking that was put
> in
> > probably because it's so hard to track transaction management within
> Hive.
> >
>
> Agreed!
>
>
> > All of the boiler-plate transaction management code should be removed
> from
> > ObjectStore and instead brought up into HMS handler as well.
>
>
> This will need to be thought through. I agree that refactoring transaction
> management is a good idea, but moving it into HMSHandler is just going to
> further clutter HMSHandler. HMSHandler is already a huge class with a lot
> of clutter that needs to be refactored before we add more into it.
>
> This allows
> > the handler to create a single transaction per-request and call the
> > necessary ObjectStore methods.  This is not currently possible because
> each
> > ObjectStore handles transactions in its own special way. When you include
> > all of the open/commit/roll-back, and "transactional listeners," I'm not
> > certain all code paths are correct.  For example, I suspect some
> listeners
> > are being alerted outside of a transaction.  I also suspect some actions
> > are occurring in multiple transactions that should really be occurring
> > within a single transaction.
> >
>
> I think a CachedStore also passes down various calls down to the
> ObjectStore. Starting a transaction in HMSHandler will keep the transaction
> open for a more than necessary duration. I agree that the refactoring needs
> to happen, but I am not convinced that moving all transaction handling to
> HMSHandler is the way forward. Ofcourse I don't have a better idea as of
> now.
>
>
> >
> > I have locally created some helper-code (TransactionOperations) to do
> this
> > from HMSHandler:
> >
> >       TransactionOperations.newOperation(rawstore).execute(new
> > TransactionCallback<RawStore>() {
> >
> >         // This method is called after openTransaction is called on the
> > RawStore
> >         // Runtime Exceptions are caught and cause the transaction to
> roll
> > back
> >         // The RawStore method commitTransaction is called if method
> > completes OK
> >         @Override
> >         public void doInTransaction(RawStore rawstore) throws
> MetaException
> > {
> >
> >           // These RawStore method happen in one transaction
> >           rawstore.methodABC();
> >           rawstore.methodXXX();
> >           rawstore.methodXYZ();
> >
> >           if (!transactionalListeners.isEmpty()) {
> >             transactionalListenersResponses =
> >
> > MetaStoreListenerNotifier.notifyEvent(transactionalListeners,
> >                     EventType.CREATE_XXX,
> >                     new CreateXxxEvent(true, HMSHandler.this, xxx));
> >           }
> >         }
> >       });
> >
> >
> > Re-architecting the method signatures to remove the MetaExceptions is a
> > large-ish task, but trying to unwind all this transaction code is going
> to
> > be a bear, it's what prompted me to write this email.
> >
>
> Sure, I understand.
>
>
> >
> > Thanks.
> >
>
> Thank you,
> Narayanan
>

Re: Hive HMS RawStore-ObjectStore Design

Posted by Narayanan Venkateswaran <vn...@gmail.com>.
Hi,

Thank you for the email,

please find some replies inline,

On Mon, May 17, 2021 at 11:12 PM David <da...@gmail.com> wrote:

> Hello Gang,
>
> I just wanted to put out a few thoughts for anyone interested in the
> Metastore, and in particular, the connection handling.
>
> As I understand it, client requests from the Thrift server come into Hive
> via the HMSHandler class.  This class lists all of the services (RPCs) that
> the Hive Metastore provides.
>
> This class's methods do some amount of validation, listener notification,
> but it ultimately calls one or more RawStore/ObjectStore methods to
> interact with the database.
>

I guess you mean one of the RawStore implementations to interact with the
database,

for example it could be CachedStore also.


>
> This entire orchestration needs some work to make this code more easy to
> work with and to improve error handling.
>
> What I propose is:
>
> 1// Remove Thrift Errors from RawStore
>
> Remove all references to
> NoSuchObjectException/InvalidOperationException/MetaException from the
> method signature of RawStore.  These Exceptions are generated by Thrift and
> are used to communicate error conditions across the wire.  They are not
> designed for use as part of the underlying stack, yet in Hive, they have
> been pushed down into these data access operators.  The RawStore should not
> have to be this tightly coupled to the transport layer.  My preference here
> would be to remove all checked Exceptions from RawStore in favor of runtime
> exceptions.


I guess we will have to look at what to do with InvalidOperationExceptions
such as the following,

if (hasNsChange) {
      throw new InvalidOperationException("Cannot change ns; from " +
getNsOrDefault(plan.getNs())
          + " to " + changes.getNs());
 }

So do we retain these?




> This is a popular format and is used (and therefore dovetails
> nicely) with the underlying database access library DataNucleaus.  All of
> the logging of un-checked Exceptions, and transforming them into Thrift
> exceptions, should happen at the HMSHandler code.
>

I am OK with this, but you might end up amalgamating a lot of RawStore
implementation-specific exception handling code in the HMSHandler, an
already cluttered class. For example, the CachedStore implementation may be
throwing an exception different from the ObjectStore and we will probably
end up wrapping both of these in the HMSHandler.

Similarly, each RawStore implementation may need to perform actions in the
event of an exception from the underlying database,
e.g. rollbackAndCleanup. I guess this is where the moving of transaction
management kicks in?



>
> 2// Move Transaction Management
>
> The ObjectStore has a pretty crazy path of handling transactions.  There
> seems to be a lot of extra code around transaction tracking that was put in
> probably because it's so hard to track transaction management within Hive.
>

Agreed!


> All of the boiler-plate transaction management code should be removed from
> ObjectStore and instead brought up into HMS handler as well.


This will need to be thought through. I agree that refactoring transaction
management is a good idea, but moving it into HMSHandler is just going to
further clutter HMSHandler. HMSHandler is already a huge class with a lot
of clutter that needs to be refactored before we add more into it.

This allows
> the handler to create a single transaction per-request and call the
> necessary ObjectStore methods.  This is not currently possible because each
> ObjectStore handles transactions in its own special way. When you include
> all of the open/commit/roll-back, and "transactional listeners," I'm not
> certain all code paths are correct.  For example, I suspect some listeners
> are being alerted outside of a transaction.  I also suspect some actions
> are occurring in multiple transactions that should really be occurring
> within a single transaction.
>

I think a CachedStore also passes down various calls down to the
ObjectStore. Starting a transaction in HMSHandler will keep the transaction
open for a more than necessary duration. I agree that the refactoring needs
to happen, but I am not convinced that moving all transaction handling to
HMSHandler is the way forward. Ofcourse I don't have a better idea as of
now.


>
> I have locally created some helper-code (TransactionOperations) to do this
> from HMSHandler:
>
>       TransactionOperations.newOperation(rawstore).execute(new
> TransactionCallback<RawStore>() {
>
>         // This method is called after openTransaction is called on the
> RawStore
>         // Runtime Exceptions are caught and cause the transaction to roll
> back
>         // The RawStore method commitTransaction is called if method
> completes OK
>         @Override
>         public void doInTransaction(RawStore rawstore) throws MetaException
> {
>
>           // These RawStore method happen in one transaction
>           rawstore.methodABC();
>           rawstore.methodXXX();
>           rawstore.methodXYZ();
>
>           if (!transactionalListeners.isEmpty()) {
>             transactionalListenersResponses =
>
> MetaStoreListenerNotifier.notifyEvent(transactionalListeners,
>                     EventType.CREATE_XXX,
>                     new CreateXxxEvent(true, HMSHandler.this, xxx));
>           }
>         }
>       });
>
>
> Re-architecting the method signatures to remove the MetaExceptions is a
> large-ish task, but trying to unwind all this transaction code is going to
> be a bear, it's what prompted me to write this email.
>

Sure, I understand.


>
> Thanks.
>

Thank you,
Narayanan

Fwd: Hive HMS RawStore-ObjectStore Design

Posted by David <da...@gmail.com>.
And here is a JIRA for continued discussion:

https://issues.apache.org/jira/browse/HIVE-25126

---------- Forwarded message ---------
From: David <da...@gmail.com>
Date: Mon, May 17, 2021 at 1:42 PM
Subject: Hive HMS RawStore-ObjectStore Design
To: dev <de...@hive.apache.org>


Hello Gang,

I just wanted to put out a few thoughts for anyone interested in the
Metastore, and in particular, the connection handling.

As I understand it, client requests from the Thrift server come into Hive
via the HMSHandler class.  This class lists all of the services (RPCs) that
the Hive Metastore provides.

This class's methods do some amount of validation, listener notification,
but it ultimately calls one or more RawStore/ObjectStore methods to
interact with the database.

This entire orchestration needs some work to make this code more easy to
work with and to improve error handling.

What I propose is:

1// Remove Thrift Errors from RawStore

Remove all references to
NoSuchObjectException/InvalidOperationException/MetaException from the
method signature of RawStore.  These Exceptions are generated by Thrift and
are used to communicate error conditions across the wire.  They are not
designed for use as part of the underlying stack, yet in Hive, they have
been pushed down into these data access operators.  The RawStore should not
have to be this tightly coupled to the transport layer.  My preference here
would be to remove all checked Exceptions from RawStore in favor of runtime
exceptions.  This is a popular format and is used (and therefore dovetails
nicely) with the underlying database access library DataNucleaus.  All of
the logging of un-checked Exceptions, and transforming them into Thrift
exceptions, should happen at the HMSHandler code.


2// Move Transaction Management

The ObjectStore has a pretty crazy path of handling transactions.  There
seems to be a lot of extra code around transaction tracking that was put in
probably because it's so hard to track transaction management within Hive.
All of the boiler-plate transaction management code should be removed from
ObjectStore and instead brought up into HMS handler as well.  This allows
the handler to create a single transaction per-request and call the
necessary ObjectStore methods.  This is not currently possible because each
ObjectStore handles transactions in its own special way. When you include
all of the open/commit/roll-back, and "transactional listeners," I'm not
certain all code paths are correct.  For example, I suspect some listeners
are being alerted outside of a transaction.  I also suspect some actions
are occurring in multiple transactions that should really be occurring
within a single transaction.

I have locally created some helper-code (TransactionOperations) to do this
from HMSHandler:

      TransactionOperations.newOperation(rawstore).execute(new
TransactionCallback<RawStore>() {

        // This method is called after openTransaction is called on the
RawStore
        // Runtime Exceptions are caught and cause the transaction to roll
back
        // The RawStore method commitTransaction is called if method
completes OK
        @Override
        public void doInTransaction(RawStore rawstore) throws MetaException
{

          // These RawStore method happen in one transaction
          rawstore.methodABC();
          rawstore.methodXXX();
          rawstore.methodXYZ();

          if (!transactionalListeners.isEmpty()) {
            transactionalListenersResponses =

MetaStoreListenerNotifier.notifyEvent(transactionalListeners,
                    EventType.CREATE_XXX,
                    new CreateXxxEvent(true, HMSHandler.this, xxx));
          }
        }
      });


Re-architecting the method signatures to remove the MetaExceptions is a
large-ish task, but trying to unwind all this transaction code is going to
be a bear, it's what prompted me to write this email.

Thanks.