Posted to issues@hive.apache.org by "Peter Vary (JIRA)" <ji...@apache.org> on 2018/02/14 13:45:00 UTC

[jira] [Comment Edited] (HIVE-18685) Add catalogs to metastore

    [ https://issues.apache.org/jira/browse/HIVE-18685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16364045#comment-16364045 ] 

Peter Vary edited comment on HIVE-18685 at 2/14/18 1:44 PM:
------------------------------------------------------------

Hi [~alangates],

Thanks for the quick answers!
{quote}Are we seeing issues where the DB locks are slowing us down?
{quote}
We have definitely seen locking problems on the customer side during DDL-intensive periods. See the attached mysql.log.

The particular problem was solved by adding more resources to MySQL, but in my mind this is only a temporary solution. We have long-running transactions involving unique indexes (database, table, notification) mixed with file system operations (possibly on S3). The primary keys can easily create situations where we end up serializing these requests.
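
Just to illustrate the kind of serialization I mean, here is a minimal JDBC sketch. The table, columns, and connection details are made up for illustration and are not the real metastore schema. With InnoDB, when two transactions insert the same unique key value, the second insert blocks on the first transaction's lock until it commits or rolls back, so a slow file system call inside the first transaction stalls the second request:
{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Hypothetical illustration only. Assumes an InnoDB table such as:
//   CREATE TABLE demo_tbls (id BIGINT PRIMARY KEY,
//     db_name VARCHAR(128), tbl_name VARCHAR(128),
//     UNIQUE KEY uq_tbl (db_name, tbl_name)) ENGINE=InnoDB;
public class UniqueKeyContention {
  public static void main(String[] args) throws Exception {
    Runnable createTable = () -> {
      try (Connection conn = DriverManager.getConnection(
          "jdbc:mysql://localhost:3306/metastore_demo", "hive", "hive")) {
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(
            "INSERT INTO demo_tbls (id, db_name, tbl_name) VALUES (?, ?, ?)")) {
          ps.setLong(1, System.nanoTime());  // different ids per thread ...
          ps.setString(2, "default");
          ps.setString(3, "sales");          // ... but the same unique key
          ps.executeUpdate();                // the second thread blocks here
        }
        Thread.sleep(5000);                  // simulate a slow S3/HDFS call
        conn.commit();                       // only now is the other insert released
      } catch (Exception e) {
        e.printStackTrace();                 // the loser gets a duplicate key error
      }
    };
    Thread t1 = new Thread(createTable);
    Thread t2 = new Thread(createTable);
    t1.start(); t2.start();
    t1.join(); t2.join();
  }
}
{code}
In a real workload the rows usually differ, but under REPEATABLE READ the gap/next-key locks taken by long transactions can still serialize inserts into nearby key ranges of the unique index.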

 

When I read this in the design document:
{quote}Possible Future Use for Caching:

We need a way in the metastore to limit the number of objects and transactions to a size that can be managed by a single server so that we can effectively cache data, transactions, and locks in the metastore. However, for obvious reasons, we do not want to limit the metastore to running on a single server. Catalogs may offer a natural place to define caching that scopes size of objects and transactions reasonably while not limiting the overall size of the metastore.
{quote}
I thought the goal is that a single MetaStore instance would only handle a limited number of catalogs, so it can cache and serve those catalogs effectively. My assumption was that different MetaStore instances would serve different sets of catalogs, and that a MetaStore client - like a DFS client - would first find out which MetaStore(s) (analogous to DataNodes) handle the given catalog and then query the data from there. I do not think we need a solution as complicated as a NameNode for this; a single ZooKeeper node could probably serve as the configuration store, and it could easily be updated whenever a new catalog is added.
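
A purely illustrative sketch of the catalog-to-MetaStore routing I have in mind follows. The znode layout, the "host:port" data format, and the helper class are assumptions of mine, not existing Hive code: the client reads a small znode per catalog that names the serving MetaStore, then connects there.
{code:java}
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.ZooKeeper;

// Illustrative only: resolves which MetaStore serves a catalog by reading
// a znode such as /metastore/catalogs/<catalog> whose data is "host:port".
// The znode layout and this helper class are assumptions, not existing Hive code.
public class CatalogLocator {

  private final ZooKeeper zk;

  public CatalogLocator(String zkConnectString) throws Exception {
    // 30s session timeout, no watcher needed for this simple read-only sketch
    this.zk = new ZooKeeper(zkConnectString, 30000, event -> { });
  }

  /** Returns "host:port" of the MetaStore instance that owns the catalog. */
  public String metastoreFor(String catalogName) throws Exception {
    byte[] data = zk.getData("/metastore/catalogs/" + catalogName, false, null);
    return new String(data, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws Exception {
    CatalogLocator locator = new CatalogLocator("zk1:2181,zk2:2181,zk3:2181");
    // The client would then open its Thrift connection to this address.
    System.out.println("catalog 'sales' is served by " + locator.metastoreFor("sales"));
  }
}
{code}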

 

As for the Thrift issue:
{quote}I think I will likely still change HiveMetaStoreClient to add methods with explicit catalog name, but that is much easier than adding thrift methods.  And in HiveMetaStoreClient I can explicitly deprecate the old methods, giving users a warning not to continue using them.
{quote}
This sounds like a good temporary solution for the Thrift API problem.
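
For reference, a simplified sketch of the deprecation pattern you describe could look like the following. The class, method names, and signatures are placeholders I made up, not the actual HiveMetaStoreClient API:
{code:java}
// Simplified sketch of the deprecation pattern described above; the class,
// method names, and signatures are placeholders, not the real
// HiveMetaStoreClient API.
public class MetaStoreClientSketch {

  private static final String DEFAULT_CATALOG = "hive";

  /** New method: the catalog name is explicit. */
  public TableStub getTable(String catName, String dbName, String tableName) {
    // would call the (unchanged) thrift method with the catalog encoded in the request
    return new TableStub(catName, dbName, tableName);
  }

  /** Old method: kept for backwards compatibility but marked deprecated. */
  @Deprecated
  public TableStub getTable(String dbName, String tableName) {
    // delegates to the new method using the default catalog
    return getTable(DEFAULT_CATALOG, dbName, tableName);
  }

  /** Placeholder for the metastore Table object. */
  public static class TableStub {
    final String catName, dbName, tableName;
    TableStub(String catName, String dbName, String tableName) {
      this.catName = catName;
      this.dbName = dbName;
      this.tableName = tableName;
    }
  }
}
{code}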

 

Thanks,

Peter 



> Add catalogs to metastore
> -------------------------
>
>                 Key: HIVE-18685
>                 URL: https://issues.apache.org/jira/browse/HIVE-18685
>             Project: Hive
>          Issue Type: New Feature
>          Components: Metastore
>    Affects Versions: 3.0.0
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>            Priority: Major
>         Attachments: HMS Catalog Design Doc.pdf
>
>
> SQL supports two levels of namespaces, called in the spec catalogs and schemas (with schema being equivalent to Hive's database).  I propose to add the upper level of catalog.  The attached design doc covers the use cases, requirements, and brief discussion of how it will be implemented in a backwards compatible way.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)