You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "nkeywal (Updated) (JIRA)" <ji...@apache.org> on 2012/02/14 22:23:59 UTC

[jira] [Updated] (HBASE-5399) Cut the link between the client and the zookeeper ensemble

     [ https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5399:
---------------------------

    Attachment: 5399_inprogress.patch
    
> Cut the link between the client and the zookeeper ensemble
> ----------------------------------------------------------
>
>                 Key: HBASE-5399
>                 URL: https://issues.apache.org/jira/browse/HBASE-5399
>             Project: HBase
>          Issue Type: Improvement
>          Components: client
>    Affects Versions: 0.94.0
>         Environment: all
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>         Attachments: 5399_inprogress.patch
>
>
> The link is often considered as an issue, for various reasons. One of them being that there is a limit on the number of connection that ZK can manage. Stack was suggesting as well to remove the link to master from HConnection.
> There are choices to be made considering the existing API (that we don't want to break).
> The first patches I will submit on hadoop-qa should not be committed: they are here to show the progress on the direction taken.
> ZooKeeper is used for:
> - public getter, to let the client do whatever he wants, and close ZooKeeper when closing the connection => we have to deprecate this but keep it.
> - read get master address to create a master => now done with a temporary zookeeper connection
> - read root location => now done with a temporary zookeeper connection, but questionable. Used in public function "locateRegion". To be reworked.
> - read cluster id => now done once with a temporary zookeeper connection.
> - check if base done is available => now done once with a zookeeper connection given as a parameter
> - isTableDisabled/isTableAvailable => public functions, now done with a temporary zookeeper connection.
>      - Called internally from HBaseAdmin and HTable
> - getCurrentNrHRS(): public function to get the number of region servers and create a pool of thread => now done with a temporary zookeeper connection
> -
> Master is used for:
> - getMaster public getter, as for ZooKeeper => we have to deprecate this but keep it.
> - isMasterRunning(): public function, used internally by HMerge & HBaseAdmin
> - getHTableDescriptor*: public functions offering access to the master.  => we could make them using a temporary master connection as well.
> Main points are:
> - hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a strongly coupled architecture ;-). This can be changed, but requires a lot of modifications in these classes (likely adding a class in the middle of the hierarchy, something like that). Anyway, non connected client will always be really slower, because it's a tcp connection, and establishing a tcp connection is slow.
> - having a link between ZK and all the client seems to make sense for some Use Cases. However, it won't scale if a TCP connection is required for every client
> - if we move the table descriptor part away from the client, we need to find a new place for it.
> - we will have the same issue if HBaseAdmin (for both ZK & Master), may be we can put a timeout on the connection. That would make the whole system less deterministic however.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira