Posted to dev@hbase.apache.org by "Andrew Purtell (JIRA)" <ji...@apache.org> on 2009/07/31 01:09:14 UTC

[jira] Created: (HBASE-1728) Column family scoping

Column family scoping
---------------------

                 Key: HBASE-1728
                 URL: https://issues.apache.org/jira/browse/HBASE-1728
             Project: Hadoop HBase
          Issue Type: Sub-task
            Reporter: Andrew Purtell
             Fix For: 0.21.0


Support column family scoping via a new HCD attribute. Add convenience methods. Add Thrift, REST, and Stargate support. Provide initial set of scoping constants and javadoc setting expectations for a simple default binary scoping policy: replicate, or do not. Make the underlying type Integer so more complex edit routing policies are possible. 
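As a rough illustration of the attribute described above, the sketch below stores the scope as an integer-valued attribute on a hypothetical family descriptor. The class name, the attribute key, and the constant names are assumptions made for this example; they are not taken from the attached patches.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a column family descriptor; not the real HCD class. */
class FamilyDescriptorSketch {
  // Assumed attribute key and scope constants, for illustration only.
  static final String REPLICATION_SCOPE = "REPLICATION_SCOPE";
  static final int SCOPE_LOCAL = 0;   // do not replicate
  static final int SCOPE_GLOBAL = 1;  // replicate

  // Attributes are kept as strings, so the int scope is stored in text form.
  private final Map<String, String> values = new HashMap<>();

  /** Returns the stored scope, or SCOPE_LOCAL when the attribute was never set. */
  int getScope() {
    String raw = values.get(REPLICATION_SCOPE);
    return raw == null ? SCOPE_LOCAL : Integer.parseInt(raw);
  }

  void setScope(int scope) {
    values.put(REPLICATION_SCOPE, Integer.toString(scope));
  }

  /** Binary policy: anything above SCOPE_LOCAL is treated as "replicate". */
  boolean isReplicated() {
    return getScope() > SCOPE_LOCAL;
  }
}
```

Because the value is an int rather than a boolean, a later policy could assign meaning to values above 1 without changing the stored type.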

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-1728) Column family scoping and cluster identification

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-1728:
--------------------------------------

    Description: Support column family scoping via a new HCD attribute. Provide initial set of scoping constants and javadoc setting expectations for a simple default binary scoping policy: replicate, or do not. Make the underlying type Integer so more complex edit routing policies are possible. Also identify every HLogKey with the original cluster's ID.  (was: Support column family scoping via a new HCD attribute. Add convenience methods. Add Thrift, REST, and Stargate support. Provide initial set of scoping constants and javadoc setting expectations for a simple default binary scoping policy: replicate, or do not. Make the underlying type Integer so more complex edit routing policies are possible. )
        Summary: Column family scoping and cluster identification  (was: Column family scoping)

I'm changing the title and description of this issue to reflect the new scope. We won't change Stargate and Thrift since replication is first being added as a contrib. I'm also bringing in the concept of cluster identification, since that change touches the exact same parts of the code.

> Column family scoping and cluster identification
> ------------------------------------------------
>
>                 Key: HBASE-1728
>                 URL: https://issues.apache.org/jira/browse/HBASE-1728
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.21.0
>
>         Attachments: HCD-family-scoping.patch, HLogKey-scoping.patch
>
>
> Support column family scoping via a new HCD attribute. Provide initial set of scoping constants and javadoc setting expectations for a simple default binary scoping policy: replicate, or do not. Make the underlying type Integer so more complex edit routing policies are possible. Also identify every HLogKey with the original cluster's ID.



[jira] Updated: (HBASE-1728) Column family scoping

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-1728:
----------------------------------

    Attachment: HBASE-1728.patch

> Column family scoping
> ---------------------
>
>                 Key: HBASE-1728
>                 URL: https://issues.apache.org/jira/browse/HBASE-1728
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>             Fix For: 0.21.0
>
>         Attachments: HBASE-1728.patch
>
>
> Support column family scoping via a new HCD attribute. Add convenience methods. Add Thrift, REST, and Stargate support. Provide initial set of scoping constants and javadoc setting expectations for a simple default binary scoping policy: replicate, or do not. Make the underlying type Integer so more complex edit routing policies are possible. 



[jira] Commented: (HBASE-1728) Column family scoping and cluster identification

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834477#action_12834477 ] 

Jean-Daniel Cryans commented on HBASE-1728:
-------------------------------------------

bq. Does this mean that only one cluster can be associated with one zk instance? Or is the notion that if multiple clusters are sharing the one zk ensemble, then they will be homed (gaol'd) at different areas up in zk (I suppose that makes sense - huh - they'd have to be)

I like the way you are able to answer yourself. Yes, they have different home dirs, so it's OK :P

bq. The change to HLogKey means we can't read old logs. That's probably fine, right? Migration requires that there be no hlogs in filesystem?

Right, major version change. Although I did this:
{code}
    try {
      this.clusterId = in.readByte();
      this.scope = in.readInt();
    } catch(EOFException e) {
      // Means it's an old key, just continue
    }
{code}
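A self-contained sketch of the same fallback idea follows. It assumes that each key is decoded from its own bounded buffer, so that running out of bytes really does mean "old format". The class `LogKeySketch`, its `seqNum` field, and the default values are hypothetical and stand in for the real HLogKey.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical log key used only to demonstrate the EOF fallback. */
class LogKeySketch {
  long seqNum;
  byte clusterId = 0; // assumed default for keys written before the change
  int scope = 0;      // assumed default: local

  void readFields(DataInputStream in) throws IOException {
    this.seqNum = in.readLong();
    try {
      this.clusterId = in.readByte();
      this.scope = in.readInt();
    } catch (EOFException e) {
      // Old-format key: the trailing fields are absent, so keep the defaults.
    }
  }

  /** Serializes a key in the old layout (no clusterId, no scope). */
  static byte[] encodeOld(long seqNum) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeLong(seqNum);
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Serializes a key in the new layout (clusterId and scope appended). */
  static byte[] encodeNew(long seqNum, byte clusterId, int scope) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeLong(seqNum);
      out.writeByte(clusterId);
      out.writeInt(scope);
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static LogKeySketch decode(byte[] raw) {
    try {
      LogKeySketch key = new LogKeySketch();
      key.readFields(new DataInputStream(new ByteArrayInputStream(raw)));
      return key;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

If keys were instead read back to back from one continuous stream, the extra reads would consume bytes of the next record rather than raise `EOFException`, so this trick would not be safe there.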

Will commit with your comments-related fixes. Thanks!

> Column family scoping and cluster identification
> ------------------------------------------------
>
>                 Key: HBASE-1728
>                 URL: https://issues.apache.org/jira/browse/HBASE-1728
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.21.0
>
>         Attachments: HBASE-1728.patch, HCD-family-scoping.patch, HLogKey-scoping.patch
>
>
> Support column family scoping via a new HCD attribute. Provide initial set of scoping constants and javadoc setting expectations for a simple default binary scoping policy: replicate, or do not. Make the underlying type Integer so more complex edit routing policies are possible. Also identify every HLogKey with the original cluster's ID.



[jira] Commented: (HBASE-1728) Column family scoping

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830474#action_12830474 ] 

Jean-Daniel Cryans commented on HBASE-1728:
-------------------------------------------

Got it, I didn't see it like that.

> Column family scoping
> ---------------------
>
>                 Key: HBASE-1728
>                 URL: https://issues.apache.org/jira/browse/HBASE-1728
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.21.0
>
>         Attachments: HCD-family-scoping.patch, HLogKey-scoping.patch
>
>
> Support column family scoping via a new HCD attribute. Add convenience methods. Add Thrift, REST, and Stargate support. Provide initial set of scoping constants and javadoc setting expectations for a simple default binary scoping policy: replicate, or do not. Make the underlying type Integer so more complex edit routing policies are possible. 



[jira] Commented: (HBASE-1728) Column family scoping and cluster identification

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833985#action_12833985 ] 

ryan rawson commented on HBASE-1728:
------------------------------------

As an aside, what other values for scope could there be than 'local' and 'global' ?

Also note the danger in having a code-oriented 'default' scope that is being written into every single record. Not that the default would ever change, but we should observe the lessons of https://issues.apache.org/jira/browse/HBASE-2213 in recording defaults in every HCD.

What about table level attributes?  For those tables with a lot of families it might make life easier.

> Column family scoping and cluster identification
> ------------------------------------------------
>
>                 Key: HBASE-1728
>                 URL: https://issues.apache.org/jira/browse/HBASE-1728
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.21.0
>
>         Attachments: HBASE-1728.patch, HCD-family-scoping.patch, HLogKey-scoping.patch
>
>
> Support column family scoping via a new HCD attribute. Provide initial set of scoping constants and javadoc setting expectations for a simple default binary scoping policy: replicate, or do not. Make the underlying type Integer so more complex edit routing policies are possible. Also identify every HLogKey with the original cluster's ID.



[jira] Updated: (HBASE-1728) Column family scoping

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-1728:
----------------------------------

    Attachment: HLogKey-scoping.patch
                HCD-family-scoping.patch

> Column family scoping
> ---------------------
>
>                 Key: HBASE-1728
>                 URL: https://issues.apache.org/jira/browse/HBASE-1728
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>             Fix For: 0.21.0
>
>         Attachments: HCD-family-scoping.patch, HLogKey-scoping.patch
>
>
> Support column family scoping via a new HCD attribute. Add convenience methods. Add Thrift, REST, and Stargate support. Provide initial set of scoping constants and javadoc setting expectations for a simple default binary scoping policy: replicate, or do not. Make the underlying type Integer so more complex edit routing policies are possible. 



[jira] Assigned: (HBASE-1728) Column family scoping

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans reassigned HBASE-1728:
-----------------------------------------

    Assignee: Jean-Daniel Cryans

This jira is the next on my list. Here are my thoughts:

Ryan's point is valid (to be able to change HCD without disable) but resolving that is outside of the scope of this jira (he agrees on that). This should not be a blocker.

I'm not sure I agree that we should be able to set destinations on the family scope. What kind of mess are you creating if all families from all tables are going different ways? I don't see any reason why we should not, at least at first, have only local or global scope.

The KV should not carry the scoping information since it's only needed in HLog where we already have access to the HTD.

As Andrew was saying in HBASE-2129, we need to be able to trace where an edit is coming from, and a Byte would be enough to hold that value. It should not go in KV since that would mean storing it in HFiles. I think the best place for it is HLogKey. Here is how chained clusters should then handle it, when we have:

master1 => slave1 & master2 => slave2 

The second node should use a new special field in Put and Delete to set the original cluster Byte which will be passed down to HLog in order to create new HLogKey with that same value. So slave2 will still receive the location of the original cluster which may be master1 or master2. If we have a cycle:

... => slave3 & master1 => slave1 & master2 => slave2 & master3 => slave3 & master1 =>...

Then each master needs to consider if the slave cluster it's pushing to is the same as the one in the Byte of every edit it's about to replicate.
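A minimal sketch of that check, under the assumption that each edit carries the one-byte ID of the cluster where it was first written. `CycleFilterSketch`, `Edit`, and `editsToShip` are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical filter that stops an edit from being shipped back to its origin. */
class CycleFilterSketch {
  /** An edit tagged with the ID of the cluster that originally accepted it. */
  static final class Edit {
    final byte originClusterId;
    final String payload;

    Edit(byte originClusterId, String payload) {
      this.originClusterId = originClusterId;
      this.payload = payload;
    }
  }

  /** Returns the edits that may be pushed to the given peer, preserving order. */
  static List<Edit> editsToShip(List<Edit> pending, byte peerClusterId) {
    List<Edit> shippable = new ArrayList<>();
    for (Edit edit : pending) {
      // Skip edits that originated on the peer itself; sending them would close the loop.
      if (edit.originClusterId != peerClusterId) {
        shippable.add(edit);
      }
    }
    return shippable;
  }
}
```

For intermediate clusters to make this decision, the origin byte has to be carried forward unchanged when an edit is re-logged, which is the role of the special field in Put and Delete mentioned above.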

> Column family scoping
> ---------------------
>
>                 Key: HBASE-1728
>                 URL: https://issues.apache.org/jira/browse/HBASE-1728
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.21.0
>
>         Attachments: HCD-family-scoping.patch, HLogKey-scoping.patch
>
>
> Support column family scoping via a new HCD attribute. Add convenience methods. Add Thrift, REST, and Stargate support. Provide initial set of scoping constants and javadoc setting expectations for a simple default binary scoping policy: replicate, or do not. Make the underlying type Integer so more complex edit routing policies are possible. 



[jira] Commented: (HBASE-1728) Column family scoping and cluster identification

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833994#action_12833994 ] 

Andrew Purtell commented on HBASE-1728:
---------------------------------------

@Ryan: Have you not seen the comments on several replication related issues where I have mentioned using scope > 0 as global, with scope also used to sort a priority queue? That's what I have in mind.


> Column family scoping and cluster identification
> ------------------------------------------------
>
>                 Key: HBASE-1728
>                 URL: https://issues.apache.org/jira/browse/HBASE-1728
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.21.0
>
>         Attachments: HBASE-1728.patch, HCD-family-scoping.patch, HLogKey-scoping.patch
>
>
> Support column family scoping via a new HCD attribute. Provide initial set of scoping constants and javadoc setting expectations for a simple default binary scoping policy: replicate, or do not. Make the underlying type Integer so more complex edit routing policies are possible. Also identify every HLogKey with the original cluster's ID.



[jira] Updated: (HBASE-1728) Column family scoping

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-1728:
----------------------------------

    Status: Patch Available  (was: Open)

> Column family scoping
> ---------------------
>
>                 Key: HBASE-1728
>                 URL: https://issues.apache.org/jira/browse/HBASE-1728
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>             Fix For: 0.21.0
>
>         Attachments: HBASE-1728.patch
>
>
> Support column family scoping via a new HCD attribute. Add convenience methods. Add Thrift, REST, and Stargate support. Provide initial set of scoping constants and javadoc setting expectations for a simple default binary scoping policy: replicate, or do not. Make the underlying type Integer so more complex edit routing policies are possible. 



[jira] Resolved: (HBASE-1728) Column family scoping and cluster identification

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans resolved HBASE-1728.
---------------------------------------

      Resolution: Fixed
    Hadoop Flags: [Incompatible change]

Committed to trunk.

> Column family scoping and cluster identification
> ------------------------------------------------
>
>                 Key: HBASE-1728
>                 URL: https://issues.apache.org/jira/browse/HBASE-1728
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.21.0
>
>         Attachments: HBASE-1728.patch, HCD-family-scoping.patch, HLogKey-scoping.patch
>
>
> Support column family scoping via a new HCD attribute. Provide initial set of scoping constants and javadoc setting expectations for a simple default binary scoping policy: replicate, or do not. Make the underlying type Integer so more complex edit routing policies are possible. Also identify every HLogKey with the original cluster's ID.



[jira] Commented: (HBASE-1728) Column family scoping

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830430#action_12830430 ] 

Andrew Purtell commented on HBASE-1728:
---------------------------------------

{quote}
As for what we would build into core, I am thinking 1) simple binary scheme -- local or global; and, possibly 2) extend the binary scheme such that, for example, a scope of 0 means local, and a scope > 0 means global, with the desired priority of the replication set by the natural ordering of the int.
{quote}

Following up with the latter, a generic framework could e.g. read a class name from a family attribute, instantiate that object to make replication/routing decisions (via dynamic load from classpath using hdfs classloader, or using coprocessors at some future time), hand each kv to the object via an interface method, and use an int result as replication scope and priority as described above. This is like mixing filters with replication and a priority queue. Some, but not too much, additional work in return for affording users a lot of function to build upon.

Note the framework can be flexible enough for someone to go even further and encode destination as well as priority in the int and substitute their own replication engine capable of complex routing even if we think that is out of scope of core. We just need to make the right bits of the replication logic subclassable.
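One way such a framework could look is sketched below: a pluggable policy returns an int per edit, a result of 0 keeps the edit local, and positive results are queued by priority. The interface `ScopePolicy`, the class `ScopedQueueSketch`, and the choice that a larger int ships first are all assumptions made for illustration; the comments above do not fix the direction of the ordering.

```java
import java.util.PriorityQueue;

/** Hypothetical policy hook: maps an edit's coordinates to a scope/priority int. */
interface ScopePolicy {
  int scopeOf(String family, String qualifier);
}

/** Hypothetical replication queue ordered by the int returned from the policy. */
class ScopedQueueSketch {
  private static final class Pending {
    final int scope;
    final String edit;

    Pending(int scope, String edit) {
      this.scope = scope;
      this.edit = edit;
    }
  }

  private final ScopePolicy policy;

  // Assumption: a higher scope means higher priority, so it is polled first.
  private final PriorityQueue<Pending> queue =
      new PriorityQueue<Pending>((a, b) -> Integer.compare(b.scope, a.scope));

  ScopedQueueSketch(ScopePolicy policy) {
    this.policy = policy;
  }

  /** Queues the edit if the policy marks it global; returns false for local edits. */
  boolean offer(String family, String qualifier, String edit) {
    int scope = policy.scopeOf(family, qualifier);
    if (scope <= 0) {
      return false; // scope 0: local only, never replicated
    }
    queue.add(new Pending(scope, edit));
    return true;
  }

  /** Returns the next edit to replicate, or null when nothing is pending. */
  String poll() {
    Pending next = queue.poll();
    return next == null ? null : next.edit;
  }
}
```

Loading the policy class by name from a family attribute, as suggested above, would only change how the `ScopePolicy` instance is obtained; the queueing logic stays the same.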


> Column family scoping
> ---------------------
>
>                 Key: HBASE-1728
>                 URL: https://issues.apache.org/jira/browse/HBASE-1728
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.21.0
>
>         Attachments: HCD-family-scoping.patch, HLogKey-scoping.patch
>
>
> Support column family scoping via a new HCD attribute. Add convenience methods. Add Thrift, REST, and Stargate support. Provide initial set of scoping constants and javadoc setting expectations for a simple default binary scoping policy: replicate, or do not. Make the underlying type Integer so more complex edit routing policies are possible. 



[jira] Updated: (HBASE-1728) Column family scoping and cluster identification

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-1728:
--------------------------------------

    Attachment: HBASE-1728.patch

Patch that adds both the scope on HCD/HLogKey and the clusterId on HLogKey. It's very minimalist on the core side because replication is contrib and I didn't change anything in other contribs. The handling of the cluster ID is scoped in HBASE-2195 and will have its own set of tests there.

> Column family scoping and cluster identification
> ------------------------------------------------
>
>                 Key: HBASE-1728
>                 URL: https://issues.apache.org/jira/browse/HBASE-1728
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.21.0
>
>         Attachments: HBASE-1728.patch, HCD-family-scoping.patch, HLogKey-scoping.patch
>
>
> Support column family scoping via a new HCD attribute. Provide initial set of scoping constants and javadoc setting expectations for a simple default binary scoping policy: replicate, or do not. Make the underlying type Integer so more complex edit routing policies are possible. Also identify every HLogKey with the original cluster's ID.



[jira] Updated: (HBASE-1728) Column family scoping

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-1728:
----------------------------------

    Attachment:     (was: HBASE-1728.patch)

> Column family scoping
> ---------------------
>
>                 Key: HBASE-1728
>                 URL: https://issues.apache.org/jira/browse/HBASE-1728
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>             Fix For: 0.21.0
>
>
> Support column family scoping via a new HCD attribute. Add convenience methods. Add Thrift, REST, and Stargate support. Provide initial set of scoping constants and javadoc setting expectations for a simple default binary scoping policy: replicate, or do not. Make the underlying type Integer so more complex edit routing policies are possible. 



[jira] Commented: (HBASE-1728) Column family scoping and cluster identification

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834413#action_12834413 ] 

stack commented on HBASE-1728:
------------------------------

bq. Ryan's point is valid (to be able to change HCD without disable) but resolving that is outside of the scope of this jira (he agrees on that). This should not be a blocker.

I agree.  Master rewrite should get this.  All of HCD and table definition will be up in zk.

bq. The KV should not carry the scoping information since it's only needed in HLog where we already have access to the HTD.

Agreed.

bq. Also note the danger in having a code-oriented 'default' scope that is being written into every single record. Not that the default would ever change, but we should observe the lessons of https://issues.apache.org/jira/browse/HBASE-2213 in recording defaults in every HCD.

This should be fixed in master rewrite where we record only deviations from default.

Regards the patch:

{code}
+    String clusterIdName =
+            conf.get("zookeeper.znode.clusterId", "clusterId");
{code}

Does this mean that only one cluster can be associated with one zk instance?  Or is the notion that if multiple clusters are sharing the one zk ensemble, then they will be homed (gaol'd) at different areas up in zk (I suppose that makes sense -- huh -- they'd have to be)

Aside, 'isMaster' is a bad name for a data member.  Should be 'master' at least by javabeans convention.

Why does the method name deviate from the data member name below:

{code}
+  public byte getRepId() {
+    return this.clusterId;
{code}

Align them?  Make method name getClusterId?

+  public static final String SCOPE = "SCOPE"; ... is too generic.  Make it REPLICATION_SCOPE or REP_SCOPE or something.

The change to HLogKey means we can't read old logs.  That's probably fine, right?  Migration requires that there be no hlogs in filesystem?

+1 on commit after fixing the above.


> Column family scoping and cluster identification
> ------------------------------------------------
>
>                 Key: HBASE-1728
>                 URL: https://issues.apache.org/jira/browse/HBASE-1728
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.21.0
>
>         Attachments: HBASE-1728.patch, HCD-family-scoping.patch, HLogKey-scoping.patch
>
>
> Support column family scoping via a new HCD attribute. Provide initial set of scoping constants and javadoc setting expectations for a simple default binary scoping policy: replicate, or do not. Make the underlying type Integer so more complex edit routing policies are possible. Also identify every HLogKey with the original cluster's ID.



[jira] Commented: (HBASE-1728) Column family scoping

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737355#action_12737355 ] 

ryan rawson commented on HBASE-1728:
------------------------------------

Thanks for making a start. Here are 2 thoughts:

- if it goes in HCD, does that mean we have to take a table outage to enable/disable replication?  That might not be acceptable to some people (me included)
- we want to capture multiple replication destinations, with individual control over each one. Replication will eventually form the backbone of our DR and data analysis strategy, so we will be expecting to have multiple replication streams.

> Column family scoping
> ---------------------
>
>                 Key: HBASE-1728
>                 URL: https://issues.apache.org/jira/browse/HBASE-1728
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>             Fix For: 0.21.0
>
>         Attachments: HBASE-1728.patch
>
>
> Support column family scoping via a new HCD attribute. Add convenience methods. Add Thrift, REST, and Stargate support. Provide initial set of scoping constants and javadoc setting expectations for a simple default binary scoping policy: replicate, or do not. Make the underlying type Integer so more complex edit routing policies are possible. 



[jira] Commented: (HBASE-1728) Column family scoping

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737616#action_12737616 ] 

Andrew Purtell commented on HBASE-1728:
---------------------------------------

@Jim: A byte would probably suffice. That said, the value is just an HCD attribute and also a value in some policy table. We wouldn't be tagging kvs with 32 bits as part of the replication stream. The policy mechanism is within the replicator. Either it forwards a kv to a peer or does not. So the width of this value has no impact on performance.

> Column family scoping
> ---------------------
>
>                 Key: HBASE-1728
>                 URL: https://issues.apache.org/jira/browse/HBASE-1728
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>             Fix For: 0.21.0
>
>         Attachments: HBASE-1728.patch
>
>
> Support column family scoping via a new HCD attribute. Add convenience methods. Add Thrift, REST, and Stargate support. Provide initial set of scoping constants and javadoc setting expectations for a simple default binary scoping policy: replicate, or do not. Make the underlying type Integer so more complex edit routing policies are possible. 



[jira] Commented: (HBASE-1728) Column family scoping and cluster identification

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834384#action_12834384 ] 

Jean-Daniel Cryans commented on HBASE-1728:
-------------------------------------------

{quote}
Also note the danger in having a code-oriented 'default' scope that is being written into every single record. Not that the default would ever change, but we should observe the lessons of https://issues.apache.org/jira/browse/HBASE-2213 in recording defaults in every HCD.
{quote}

I will be happy to change it when we have a better solution for all configurations.

{quote}
What about table level attributes? For those tables with a lot of families it might make life easier.
{quote}

Will it be easier or more complicated? Is setting the scope on 5 families that hard? My feeling is that we can add it later if people really ask for it; in the meantime the code is less complicated.

> Column family scoping and cluster identification
> ------------------------------------------------
>
>                 Key: HBASE-1728
>                 URL: https://issues.apache.org/jira/browse/HBASE-1728
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.21.0
>
>         Attachments: HBASE-1728.patch, HCD-family-scoping.patch, HLogKey-scoping.patch
>
>
> Support column family scoping via a new HCD attribute. Provide initial set of scoping constants and javadoc setting expectations for a simple default binary scoping policy: replicate, or do not. Make the underlying type Integer so more complex edit routing policies are possible. Also identify every HLogKey with the original cluster's ID.



[jira] Commented: (HBASE-1728) Column family scoping

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737627#action_12737627 ] 

Andrew Purtell commented on HBASE-1728:
---------------------------------------

@Jim: Regarding policy routing performance I think the choice here was between Integer and String. I suggest the former so policy routing only needs to do integer comparison or bit operations, not operations on arbitrary strings.
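To make the integer-versus-string point concrete, here is a sketch of a routing check done purely with bit operations. The layout (low 8 bits for priority, higher bits as a per-peer destination mask) is a hypothetical encoding invented for this example; nothing in this issue defines such a layout.

```java
/** Hypothetical packing of priority and destinations into one int scope value. */
class ScopeBitsSketch {
  // Assumed layout: bits 0-7 hold the priority, bits 8 and up hold one flag per peer.
  static final int PRIORITY_BITS = 8;
  static final int PRIORITY_MASK = (1 << PRIORITY_BITS) - 1;

  static int encode(int priority, int destinationMask) {
    return (destinationMask << PRIORITY_BITS) | (priority & PRIORITY_MASK);
  }

  static int priority(int scope) {
    return scope & PRIORITY_MASK;
  }

  /** True when the flag for the given peer index is set in the scope value. */
  static boolean routesTo(int scope, int peerIndex) {
    return ((scope >>> PRIORITY_BITS) & (1 << peerIndex)) != 0;
  }

  /** A scope of 0 carries no priority and no destination: the edit stays local. */
  static boolean isLocal(int scope) {
    return scope == 0;
  }
}
```

Each decision here is a mask and a compare, whereas a string-valued scope would need parsing or string comparison for every edit considered.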

> Column family scoping
> ---------------------
>
>                 Key: HBASE-1728
>                 URL: https://issues.apache.org/jira/browse/HBASE-1728
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>             Fix For: 0.21.0
>
>         Attachments: HBASE-1728.patch
>
>
> Support column family scoping via a new HCD attribute. Add convenience methods. Add Thrift, REST, and Stargate support. Provide initial set of scoping constants and javadoc setting expectations for a simple default binary scoping policy: replicate, or do not. Make the underlying type Integer so more complex edit routing policies are possible. 



[jira] Updated: (HBASE-1728) Column family scoping

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-1728:
----------------------------------

    Resolution: Invalid
        Status: Resolved  (was: Patch Available)

This isn't the way to do it. Ultimately there are several places where scoping information must be added to KeyValues to avoid breaking abstractions. Might as well just scope the KVs from the beginning.




[jira] Reopened: (HBASE-1728) Column family scoping

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell reopened HBASE-1728:
-----------------------------------


Reopening because it is still useful to have the region server set the scope on KVs to some default that can be configured on a per-column-family basis.




[jira] Commented: (HBASE-1728) Column family scoping

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737392#action_12737392 ] 

Jim Kellerman commented on HBASE-1728:
--------------------------------------

I admit I haven't been following this closely (my other job keeps getting in the way :( ).

However, from what I understand, scoping is currently either { local | global }; is that correct?

Do we envision other types of scoping?

If not, wouldn't a byte suffice instead of an int?

I understand that you are envisioning routing policies, but in that case is an int enough to express them? I suppose the int could select the routing policy, but realistically, how many policies will there be? 2**32 - 1? It is hard to imagine that there would be more than 127, so a byte would suffice, and you could use the sign bit to indicate local or global.
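
As a rough sketch of the byte encoding suggested above: the low 7 bits select the policy and the sign bit carries the local/global flag. The class and method names are hypothetical, and the choice that a set sign bit means "global" is an assumption; the comment above does not say which way around it should be.

```java
// Hypothetical sketch of a one-byte scope; not an HBase API.
final class ByteScope {
    // Packs a policy id (0..127) and a local/global flag into one byte.
    // Assumption: sign bit set means global, clear means local.
    static byte encode(boolean global, int policyId) {
        if (policyId < 0 || policyId > 127) {
            throw new IllegalArgumentException("policyId must be in 0..127");
        }
        return (byte) (global ? (policyId | 0x80) : policyId);
    }

    // A byte with its sign bit set is negative in Java.
    static boolean isGlobal(byte scope) {
        return scope < 0;
    }

    // Masks off the sign bit to recover the policy id.
    static int policyId(byte scope) {
        return scope & 0x7F;
    }

    public static void main(String[] args) {
        byte scope = encode(true, 5);
        System.out.println(isGlobal(scope) + " " + policyId(scope));
    }
}
```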

In the Yahoo user database (UDB), we mostly wanted every user's data on every server farm in the US, but we often restricted which international server farms could replicate (or even access) data for users whose home locale was not serviced by that farm. To handle that, each farm had a name (3-4 characters), and for replication outside the US, the destinations for a user's data were simply stored as a list of farm names. That meant we only had to send updates to a foreign farm if the user's data was present there. Admittedly, this policy was specific to the UDB, but I wanted to share a perspective on replication from my deep, dark, distant past.

FWIW.




[jira] Commented: (HBASE-1728) Column family scoping

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830425#action_12830425 ] 

Andrew Purtell commented on HBASE-1728:
---------------------------------------

{quote}
 I don't see any reason why we should not at least first have only local or global scope.
{quote}
 
This is my thought as well, at least at first.

{quote}
 I'm not sure I agree that we should be able to set destinations on the family scope. What kind of mess are you creating if all families from all tables are going different ways? 
{quote}

Well, the idea behind using an int for scoping, scoping at the family level, and building KV routing as a pluggable framework is to be generic enough to separate mechanism from policy.

As for what we would build into core, I am thinking of 1) a simple binary scheme, local or global; and possibly 2) an extension of the binary scheme such that, for example, a scope of 0 means local and a scope > 0 means global, with the desired priority of the replication set by the natural ordering of the int.
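
The second scheme could be sketched roughly as follows. The names are hypothetical, and two points are assumptions rather than anything decided here: that a larger int means higher priority, and that negative values are treated as local.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of scheme 2; not an HBase API.
final class PriorityScope {
    // 0 means local; anything greater than 0 means global.
    // Assumption: negative values are also treated as local.
    static boolean isGlobal(int scope) {
        return scope > 0;
    }

    // Keeps only global scopes and orders them by the natural ordering
    // of the int. Assumption: a larger value is replicated first.
    static List<Integer> replicationOrder(List<Integer> scopes) {
        List<Integer> out = new ArrayList<>();
        for (int s : scopes) {
            if (isGlobal(s)) {
                out.add(s);
            }
        }
        out.sort(Comparator.reverseOrder());
        return out;
    }

    public static void main(String[] args) {
        List<Integer> scopes = new ArrayList<>();
        scopes.add(0);
        scopes.add(3);
        scopes.add(1);
        System.out.println(replicationOrder(scopes));
    }
}
```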



> Column family scoping
> ---------------------
>
>                 Key: HBASE-1728
>                 URL: https://issues.apache.org/jira/browse/HBASE-1728
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.21.0
>
>         Attachments: HCD-family-scoping.patch, HLogKey-scoping.patch
>
>
> Support column family scoping via a new HCD attribute. Add convenience methods. Add Thrift, REST, and Stargate support. Provide initial set of scoping constants and javadoc setting expectations for a simple default binary scoping policy: replicate, or do not. Make the underlying type Integer so more complex edit routing policies are possible. 



[jira] Commented: (HBASE-1728) Column family scoping

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737356#action_12737356 ] 

Andrew Purtell commented on HBASE-1728:
---------------------------------------

@Ryan:

1) Yes, currently, but we should not need to take a table offline to update HCD or HTD attributes, so that can be handled orthogonally. One option is to put HTDs and HCDs up into ZK, with the on-disk catalog tables kept as a mirror used only for cold-init scenarios, as discussed on IRC.

2) This change set only associates a 32-bit integer with column families. We should support pluggable replication policies. Each policy can encode state into that 32-bit value however it likes. This issue anticipates a simple default policy of yes/no.
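
A pluggable policy along those lines might look like the sketch below. The interface and both implementations are hypothetical illustrations and are not part of the attached patch; the second one assumes a policy that reads the 32-bit value as a bitmask of up to 32 destination clusters.

```java
// Hypothetical plug-in point; not an HBase API.
interface ReplicationPolicy {
    // Decides from the family's 32-bit scope whether an edit is shipped.
    boolean shouldReplicate(int scope);
}

// The simple default anticipated by this issue: yes or no.
final class BinaryPolicy implements ReplicationPolicy {
    @Override
    public boolean shouldReplicate(int scope) {
        return scope != 0;
    }
}

// An assumed alternative: each bit of the scope names one destination cluster.
final class BitmaskPolicy implements ReplicationPolicy {
    private final int clusterBit;

    BitmaskPolicy(int clusterIndex) {
        if (clusterIndex < 0 || clusterIndex > 31) {
            throw new IllegalArgumentException("clusterIndex must be in 0..31");
        }
        this.clusterBit = 1 << clusterIndex;
    }

    @Override
    public boolean shouldReplicate(int scope) {
        return (scope & clusterBit) != 0;
    }
}

final class PolicyDemo {
    public static void main(String[] args) {
        ReplicationPolicy policy = new BitmaskPolicy(2);
        System.out.println(policy.shouldReplicate(0b0100));
    }
}
```

Both policies read the same stored int; only the interpretation differs, which is the separation of mechanism from policy that the 32-bit attribute is meant to allow.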

