Posted to dev@hbase.apache.org by Jesse Yates <je...@gmail.com> on 2011/10/17 19:45:21 UTC

adding constraints

Hey everyone,

TL;DR: Add classic DB constraints as a system-level coprocessor to help
simplify using HBase and ease adoption.

Coprocessors are a really powerful mechanism and are incredibly useful for a
variety of things. However, I feel like the mechanism for using them can be
very daunting and, for certain features, could do with some simplification.

What I would like to propose is a simple interface that people can use to
implement a 'constraint' (matching the classic database definition). This
would help HBase more easily check that box for people evaluating it and
help minimize code duplication across organizations, both of which should
make adoption easier.

Essentially, people would implement a 'Constraint' interface for checking
keys before they are put into a table. Puts that are valid get written to
the table; for invalid puts, the constraint throws an exception that gets
propagated back to the client explaining why the put was invalid.
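
A rough sketch of what such an interface might look like, in plain Java with
no HBase dependency. Every name here (Constraint, ConstraintException,
SimplePut, NonEmptyValueConstraint) is made up for illustration and is not
taken from any patch; SimplePut merely stands in for HBase's Put.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Stand-in for HBase's Put: a row key plus column -> value pairs (hypothetical). */
class SimplePut {
    final String row;
    final Map<String, byte[]> columns = new LinkedHashMap<>();

    SimplePut(String row) { this.row = row; }

    SimplePut add(String column, byte[] value) {
        columns.put(column, value);
        return this;
    }
}

/** Thrown when a put violates a constraint; the message explains why (hypothetical). */
class ConstraintException extends Exception {
    ConstraintException(String message) { super(message); }
}

/** The minimal interface a user would implement (hypothetical). */
interface Constraint {
    void check(SimplePut put) throws ConstraintException;
}

/** Example constraint: reject any put that carries a null or empty value. */
class NonEmptyValueConstraint implements Constraint {
    @Override
    public void check(SimplePut put) throws ConstraintException {
        for (Map.Entry<String, byte[]> e : put.columns.entrySet()) {
            if (e.getValue() == null || e.getValue().length == 0) {
                throw new ConstraintException(
                    "row '" + put.row + "': column '" + e.getKey() + "' has an empty value");
            }
        }
    }
}
```

A valid put passes through check() silently; an invalid one surfaces as a
ConstraintException whose message names the offending row and column, which
is the text the client would see.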

Constraints would be set on a per-table basis and the user would be expected
to ensure the jars containing the constraint are present on the machines
serving that table.

Yes, people could roll their own mechanism for doing this via coprocessors
each time, but this would make it easier to do so, so you only have to
implement a very minimal interface and not worry about the specifics.

If people are interested, I would like to open a Jira on the feature. I've
got a basic implementation, but would like to expand it to be a more
integrated, top-level element of the code. I just don't want to waste my
time doing a full-blown impl and then not have at least general consensus on
it being a good feature.

One of the complaints I commonly hear about HBase is that, to outsiders, it
is difficult to figure out and use (though once you do, it's solid). This
would be a step to make it easier to use and adopt.

Thanks,
Jesse Yates

Re: adding constraints

Posted by Andrew Purtell <ap...@apache.org>.
Sorry, I meant to refer to HBASE-4605.

I was here, kind of, a long time ago with HBASE-2395. I closed that as a duplicate of HBASE-4605 because what Jesse wrote is clearer.

   - Andy
 

>________________________________
>From: Andrew Purtell <ap...@apache.org>
>To: "dev@hbase.apache.org" <de...@hbase.apache.org>
>Sent: Tuesday, October 18, 2011 3:31 PM
>Subject: Re: adding constraints
>
>> > Yeah, we can do security, but you have to include
>> > the jars make sure it works, etc. As opposed to _certain_ systems where
>> > security is built in. Similar arguments can be made for things like
>> > constraints - its just _easier_ to have it built in, and let people use them
>> > (or not) as they choose.
>
>> [...] But the current approach for security was
>> arrived at as a result of extensive discussions with the entire
>> community about the right approach, where many concerns were raised
>> about paying any overhead for security when it was not being used.  As
>> a result, all security components were built in a loadable fashion,
>> with the trade-off of some extra configuration complexity
>
>This discussion more than casually reminds me of past discussions regarding moving from a statically linked kernel to one that supports dynamically loaded modules seen on both the Linux and FreeBSD mailing lists.
>
>Again we have a tightly coupled code base making a transition to dynamic runtime composition.
>
>IMO, anyone concerned that HBase doesn't have security or constraints built in can ship a default configuration that has either or both loaded as system coprocessors. Those that don't want the "bloat" can simply not load them. This balances the demands we will see over a contiuum here, from those that want the most functionality "out of the box", to those that want maximum performance or minimal runtime complexity or both.
>
>If there is sufficient concern about user-friendliness, those so concerned could build the plumbing to automatically load coprocessors a la modprobe. Perhaps by reading hbase-site.xml and matching config vars to CP jars (via reflection and some kind of decorator convention?).
>
>I also see HBASE-4554 as about improving how CPs get configuration, if needed, and how the user can change it.
>
>It looks like everyone is in favor of, or at least does not object to, some sort of constraint checking and enforcement implemented as a coprocessor, independent of the core code.
>
>Personally, I have the same attitude about this as I did security -- it's great to have, and even better if it can be dynamically loaded only as needed so those that do not want it suffer no overheads or performance degradation.
>
>   - Andy
>
>
>
>
>
>>________________________________
>>From: Gary Helmling <gh...@gmail.com>
>>To: dev@hbase.apache.org
>>Sent: Tuesday, October 18, 2011 12:43 AM
>>Subject: Re: adding constraints
>>
>>>
>>> There is an example of how to do Constraints as a jar with CPs already
>>> attached to the ticket, and its pretty simple. However, the ticket goes into
>>> the plusses and minuses for a top-level or just basic CP based
>>> implementation.
>>>
>>> For me, the best reason for top level is top make HBase easy to use and have
>>> certain built-in features.
>>
>>Hmm, I wasn't really reading the two implementation options for
>>constraints as a choice between a "built-in" feature and CP based.
>>I'm reading it as a choice between:
>>1) a bundled CP implementation (which you still have to _enable_) that
>>does constraint checking loading user classes that implement a simple
>>interface (Constraint or Predicate<Put> or whatever)
>>2) an abstract CP example class that you have to extend with your own
>>implementation logic, which, if you want to do it right, you'll still
>>wind up with something resembling #1 anyway
>>
>>FYI, I see option #1 as fairly analogous to the bundled aggregation
>>client that Lars mentioned.
>>
>>If you want this as real top-level functionality built directly in to,
>>say, the HRegion code paths for puts, the question is why should we
>>add the complexity directly when we have CPs?
>>
>>> Yeah, we can do security, but you have to include
>>> the jars make sure it works, etc. As opposed to _certain_ systems where
>>> security is built in. Similar arguments can be made for things like
>>> constraints - its just _easier_ to have it built in, and let people use them
>>> (or not) as they choose.
>>>
>>
>>We have a security implementation up for review that provides
>>meaningful security.  Yes, it has to be enabled to be used and the
>>process of configuring it could be much simpler.  Security is always a
>>matter of trade-offs.  You can argue about about whether or not we've
>>made the right ones.  But the current approach for security was
>>arrived at as a result of extensive discussions with the entire
>>community about the right approach, where many concerns were raised
>>about paying any overhead for security when it was not being used.  As
>>a result, all security components were built in a loadable fashion,
>>with the trade-off of some extra configuration complexity.
>>
>>Yes, Accumulo has "security" always enabled.  But this is still not an
>>apples-to-apples comparison.  HBase security relies on Kerberos to
>>provide a trusted third part for strong authentication while never
>>sending the password over the wire.  Accumulo sends username and
>>password in plain text on the rpc connections.  As a result HBase
>>relies on external systems for managing credentials, while Accumulo
>>embeds its own user database, with the usernames and hashed passwords
>>stored as globally readable znodes in zookeeper.  You could say that
>>reliance on an external system makes the HBase setup more complex, but
>>that's a narrow view.  While managing an internal user database does
>>keep things self contained, it also forces you to create usernames and
>>passwords for an application in multiple places (your application does
>>run under its own account, right?), adding it's own complexity.
>>Accumulo allows access control labels to be placed on each key value
>>individually, while HBase uses a simpler model for assignments limited
>>to table, column family, or column qualifier scope.
>>
>>Each system makes it's own trade-offs based on its implementation
>>goals.  What's right for you is going to depend on your needs.  But
>>the HBase approach did not just disregard simplicity willy-nilly.
>>
>>> The ticket also talks about abstracting out some of the CP things to make it
>>> easier to add other top level features, which would be a win too. Yeah, they
>>> would be backed by CPs, but that doesn't mean it doesn't make sense for
>>> people to use the stuff really (as in dead simple) easily.
>>>
>>
>>Again, I don't really see the other changes discussed (HBASE-4554?) as
>>top-level vs. CP-based.  I think that change is just about providing
>>the shell with the ability to easily set arbitrary attributes on
>>HTableDescriptor.  Those already exist, they're just not properly
>>exposed in the shell.  Maybe you're envisioning something beyond this
>>for the constraints case?  That may be good too, but we should
>>probably move the discussion over to the JIRA.
>>
>>It may not sound like it, but I'm all in favor of making things as
>>simple as possible.  It's just that, when simplifying, you're usually
>>moving complexity from one place to another.  So let's work out where
>>we can get the biggest benefit.
>>
>>--gh
>>
>>
>>
>
>

Re: adding constraints

Posted by Andrew Purtell <ap...@apache.org>.
> > Yeah, we can do security, but you have to include
> > the jars make sure it works, etc. As opposed to _certain_ systems where
> > security is built in. Similar arguments can be made for things like
> > constraints - its just _easier_ to have it built in, and let people use them
> > (or not) as they choose.

> [...] But the current approach for security was
> arrived at as a result of extensive discussions with the entire
> community about the right approach, where many concerns were raised
> about paying any overhead for security when it was not being used.  As
> a result, all security components were built in a loadable fashion,
> with the trade-off of some extra configuration complexity

This discussion more than casually reminds me of past discussions regarding moving from a statically linked kernel to one that supports dynamically loaded modules seen on both the Linux and FreeBSD mailing lists.

Again we have a tightly coupled code base making a transition to dynamic runtime composition.

IMO, anyone concerned that HBase doesn't have security or constraints built in can ship a default configuration that has either or both loaded as system coprocessors. Those that don't want the "bloat" can simply not load them. This balances the demands we will see over a continuum here, from those that want the most functionality "out of the box", to those that want maximum performance or minimal runtime complexity or both.

If there is sufficient concern about user-friendliness, those so concerned could build the plumbing to automatically load coprocessors a la modprobe. Perhaps by reading hbase-site.xml and matching config vars to CP jars (via reflection and some kind of decorator convention?).
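
A minimal sketch of that kind of plumbing, under stated assumptions: it uses java.util.Properties as a stand-in for the HBase configuration and an explicit registration table rather than the reflection/decorator convention floated above. The key hbase.coprocessor.region.classes is the existing setting for system region coprocessors; the switch names and coprocessor class names passed to register() are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

/** Maps feature switches to coprocessor classes, modprobe style (illustrative only). */
class CoprocessorAutoLoader {
    // Existing HBase key that lists system-wide region coprocessors.
    static final String REGION_CP_KEY = "hbase.coprocessor.region.classes";

    // Hypothetical boolean switch -> hypothetical coprocessor class name.
    private final Map<String, String> switches = new LinkedHashMap<>();

    CoprocessorAutoLoader register(String switchKey, String coprocessorClass) {
        switches.put(switchKey, coprocessorClass);
        return this;
    }

    /** Appends the coprocessor for every switch set to "true"; others stay unloaded. */
    void apply(Properties conf) {
        List<String> classes = new ArrayList<>();
        // Keep whatever the operator already listed by hand.
        for (String name : conf.getProperty(REGION_CP_KEY, "").split(",")) {
            if (!name.trim().isEmpty()) {
                classes.add(name.trim());
            }
        }
        for (Map.Entry<String, String> e : switches.entrySet()) {
            boolean enabled = Boolean.parseBoolean(conf.getProperty(e.getKey(), "false"));
            if (enabled && !classes.contains(e.getValue())) {
                classes.add(e.getValue());
            }
        }
        if (!classes.isEmpty()) {
            conf.setProperty(REGION_CP_KEY, String.join(",", classes));
        }
    }
}
```

With something like this, a user who wants constraints sets one boolean, and a user who sets nothing pays no load-time or runtime cost.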

I also see HBASE-4554 as about improving how CPs get configuration, if needed, and how the user can change it.

It looks like everyone is in favor of, or at least does not object to, some sort of constraint checking and enforcement implemented as a coprocessor, independent of the core code.

Personally, I have the same attitude about this as I did security -- it's great to have, and even better if it can be dynamically loaded only as needed so those that do not want it suffer no overheads or performance degradation.

   - Andy





>________________________________
>From: Gary Helmling <gh...@gmail.com>
>To: dev@hbase.apache.org
>Sent: Tuesday, October 18, 2011 12:43 AM
>Subject: Re: adding constraints
>
>>
>> There is an example of how to do Constraints as a jar with CPs already
>> attached to the ticket, and its pretty simple. However, the ticket goes into
>> the plusses and minuses for a top-level or just basic CP based
>> implementation.
>>
>> For me, the best reason for top level is top make HBase easy to use and have
>> certain built-in features.
>
>Hmm, I wasn't really reading the two implementation options for
>constraints as a choice between a "built-in" feature and CP based.
>I'm reading it as a choice between:
>1) a bundled CP implementation (which you still have to _enable_) that
>does constraint checking loading user classes that implement a simple
>interface (Constraint or Predicate<Put> or whatever)
>2) an abstract CP example class that you have to extend with your own
>implementation logic, which, if you want to do it right, you'll still
>wind up with something resembling #1 anyway
>
>FYI, I see option #1 as fairly analogous to the bundled aggregation
>client that Lars mentioned.
>
>If you want this as real top-level functionality built directly in to,
>say, the HRegion code paths for puts, the question is why should we
>add the complexity directly when we have CPs?
>
>> Yeah, we can do security, but you have to include
>> the jars make sure it works, etc. As opposed to _certain_ systems where
>> security is built in. Similar arguments can be made for things like
>> constraints - its just _easier_ to have it built in, and let people use them
>> (or not) as they choose.
>>
>
>We have a security implementation up for review that provides
>meaningful security.  Yes, it has to be enabled to be used and the
>process of configuring it could be much simpler.  Security is always a
>matter of trade-offs.  You can argue about about whether or not we've
>made the right ones.  But the current approach for security was
>arrived at as a result of extensive discussions with the entire
>community about the right approach, where many concerns were raised
>about paying any overhead for security when it was not being used.  As
>a result, all security components were built in a loadable fashion,
>with the trade-off of some extra configuration complexity.
>
>Yes, Accumulo has "security" always enabled.  But this is still not an
>apples-to-apples comparison.  HBase security relies on Kerberos to
>provide a trusted third part for strong authentication while never
>sending the password over the wire.  Accumulo sends username and
>password in plain text on the rpc connections.  As a result HBase
>relies on external systems for managing credentials, while Accumulo
>embeds its own user database, with the usernames and hashed passwords
>stored as globally readable znodes in zookeeper.  You could say that
>reliance on an external system makes the HBase setup more complex, but
>that's a narrow view.  While managing an internal user database does
>keep things self contained, it also forces you to create usernames and
>passwords for an application in multiple places (your application does
>run under its own account, right?), adding it's own complexity.
>Accumulo allows access control labels to be placed on each key value
>individually, while HBase uses a simpler model for assignments limited
>to table, column family, or column qualifier scope.
>
>Each system makes it's own trade-offs based on its implementation
>goals.  What's right for you is going to depend on your needs.  But
>the HBase approach did not just disregard simplicity willy-nilly.
>
>> The ticket also talks about abstracting out some of the CP things to make it
>> easier to add other top level features, which would be a win too. Yeah, they
>> would be backed by CPs, but that doesn't mean it doesn't make sense for
>> people to use the stuff really (as in dead simple) easily.
>>
>
>Again, I don't really see the other changes discussed (HBASE-4554?) as
>top-level vs. CP-based.  I think that change is just about providing
>the shell with the ability to easily set arbitrary attributes on
>HTableDescriptor.  Those already exist, they're just not properly
>exposed in the shell.  Maybe you're envisioning something beyond this
>for the constraints case?  That may be good too, but we should
>probably move the discussion over to the JIRA.
>
>It may not sound like it, but I'm all in favor of making things as
>simple as possible.  It's just that, when simplifying, you're usually
>moving complexity from one place to another.  So let's work out where
>we can get the biggest benefit.
>
>--gh
>
>
>

Re: adding constraints

Posted by Jesse Yates <je...@gmail.com>.
Thanks!

I think 4605 may need to be pulled under a blanket ticket for doing the
improvements like setting general properties/dynamic loading of modules.

-Jesse

On Wed, Oct 19, 2011 at 11:18 AM, Gary Helmling <gh...@gmail.com> wrote:

> >
> > Do you think you can put the core what we are agreeing on into 4605?  I
> want
> > to make sure we don't lose any of your comments
> >
>
> Sure I'll try to summarize in a comment on 4605.
>
> I think we'll need to open a new JIRA for the shell aspects of this as
> well, since it looks like 4554 is only handling directly setting a
> coprocessor and we really need something more general.
>

Re: adding constraints

Posted by Gary Helmling <gh...@gmail.com>.
>
> Do you think you can put the core what we are agreeing on into 4605?  I want
> to make sure we don't lose any of your comments
>

Sure, I'll try to summarize in a comment on 4605.

I think we'll need to open a new JIRA for the shell aspects of this as
well, since it looks like 4554 is only handling directly setting a
coprocessor and we really need something more general.

Re: adding constraints

Posted by Jesse Yates <je...@gmail.com>.
On Tue, Oct 18, 2011 at 7:10 PM, Gary Helmling <gh...@gmail.com> wrote:

> >>
> >> Hmm, I wasn't really reading the two implementation options for
> >> constraints as a choice between a "built-in" feature and CP based.
> >>
> >
> > Either way it would be CP based, but the 'built-in' would just have some
> > 'nice' ways of adding things. In short, its a question of adding a method
> to
> > the HTD for addConstraint() to add a bunch of classes to be run by the
> > 'constraint CP'.
> >
>
> I think we're on the same page here (just the details to work out).
> But I think for most people on this list, saying "top level" or "built
> in" feature would imply something not CP based, so we should be
> careful about terminology.
>

Agreed.


>
> >
> > I feel like having the addConstraint() for a table is actually _less_
> > complexity. Not necessarily from the overall system perspective certainly
> > (you have to do a little abstraction and a couple more methods), but its
> not
> > that much more as it all centered around the HTD.
> >
>
> For a single case, yes, this is simpler.  But it shifts complexity
> from the exposed configuration into the HTD code.  What happens when
> we have 20 such cases?  HTD starts to become a bit of a mess with
> special casing for each.  I totally understand the motivation -- we
> did something similar with table "owners" in the patch for HBASE-3025.
>  But I'm starting to think we need to handle it differently there and
> here to keep things scalable.
>

Yeah, this can start to be a bit of a mess.


>
> I think we need to invert this, so that CPs can take ownership for
> adding their own configs to HTD, instead of making HTD take ownership
> for all.  Something like:
>
> HTableDescriptor htd = new HTableDescriptor(...);
> Constraints.add(htd, MyConstraintImpl.class);
> admin.createTable(htd);
>

+1 I like the idea. It also feels very 'hadoopy' (e.g. input/output formats).


>
> I think this is the best way to keep the code extensibility scalable.
>
> We'd have to work out how exactly this integrates with the HBase
> shell.  But given that jruby gives us a dynamic language to work with,
> we should be able to figure something out.  I think making the shell
> more extensible is also an important part of this.  For HBASE-3025 we
> needed to add some shell commands, and there's not really a "loadable"
> way of doing so at the moment.
>

It will be interesting to see how that feature shakes out.

Do you think you can put the core of what we are agreeing on into 4605?  I want
to make sure we don't lose any of your comments.

Thanks,
Jesse

Re: adding constraints

Posted by Gary Helmling <gh...@gmail.com>.
>>
>> Hmm, I wasn't really reading the two implementation options for
>> constraints as a choice between a "built-in" feature and CP based.
>>
>
> Either way it would be CP based, but the 'built-in' would just have some
> 'nice' ways of adding things. In short, its a question of adding a method to
> the HTD for addConstraint() to add a bunch of classes to be run by the
> 'constraint CP'.
>

I think we're on the same page here (just the details to work out).
But I think for most people on this list, saying "top level" or "built
in" feature would imply something not CP based, so we should be
careful about terminology.

>
> I feel like having the addConstraint() for a table is actually _less_
> complexity. Not necessarily from the overall system perspective certainly
> (you have to do a little abstraction and a couple more methods), but its not
> that much more as it all centered around the HTD.
>

For a single case, yes, this is simpler.  But it shifts complexity
from the exposed configuration into the HTD code.  What happens when
we have 20 such cases?  HTD starts to become a bit of a mess with
special casing for each.  I totally understand the motivation -- we
did something similar with table "owners" in the patch for HBASE-3025.
 But I'm starting to think we need to handle it differently there and
here to keep things scalable.

I think we need to invert this, so that CPs can take ownership for
adding their own configs to HTD, instead of making HTD take ownership
for all.  Something like:

HTableDescriptor htd = new HTableDescriptor(...);
Constraints.add(htd, MyConstraintImpl.class);
admin.createTable(htd);

I think this is the best way to keep the code extensibility scalable.
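
To make the inversion concrete, here is one way such a helper could work,
sketched in plain Java. TableDescriptorStub stands in for HTableDescriptor's
generic attribute storage, and the key names are invented; none of this is
from an actual patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Stand-in for HTableDescriptor: just a bag of string attributes (hypothetical). */
class TableDescriptorStub {
    private final Map<String, String> values = new LinkedHashMap<>();

    void setValue(String key, String value) { values.put(key, value); }

    String getValue(String key) { return values.get(key); }
}

/** Marker interface for user constraints (hypothetical). */
interface Constraint { }

/** Helper that owns the constraint-specific keys, so the descriptor needs no special casing. */
final class Constraints {
    private static final String COUNT_KEY = "constraints.count"; // hypothetical key
    private static final String ENTRY_PREFIX = "constraints.";   // hypothetical key prefix

    private Constraints() { }

    /** Records the constraint's class name under the next free index. */
    static void add(TableDescriptorStub htd, Class<? extends Constraint> clazz) {
        int next = count(htd);
        htd.setValue(ENTRY_PREFIX + next, clazz.getName());
        htd.setValue(COUNT_KEY, Integer.toString(next + 1));
    }

    /** Number of constraints registered on this table so far. */
    static int count(TableDescriptorStub htd) {
        String raw = htd.getValue(COUNT_KEY);
        return raw == null ? 0 : Integer.parseInt(raw);
    }

    /** Class name stored at the given index, or null if none. */
    static String classNameAt(TableDescriptorStub htd, int index) {
        return htd.getValue(ENTRY_PREFIX + index);
    }
}

/** Example user constraint referenced in the snippet above. */
class MyConstraintImpl implements Constraint { }
```

The descriptor only ever sees opaque key/value pairs; the 20th feature would
bring its own helper class rather than another method on HTD.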

We'd have to work out how exactly this integrates with the HBase
shell.  But given that jruby gives us a dynamic language to work with,
we should be able to figure something out.  I think making the shell
more extensible is also an important part of this.  For HBASE-3025 we
needed to add some shell commands, and there's not really a "loadable"
way of doing so at the moment.

>
> What I'm concerned about is the configuration complexity - there are a ton
> of them and adding more starts to be crazy. HBase has already made some
> tradeoffs, but if we keep adding more and more configuration values, its
> going to be close to unusable to anyone that doesn't have serious knowledge
> about the system and how to configure it.
>

I completely agree that security has a long way to go here.  Some
configuration has to be there -- we need to know the principals for
the various services, keytab files for logins -- but the rest of the
config for the loadable security bits should really be just a single
setting.  I totally agree with the vision here.  We'll get there.

>
> Ok, clearly the main thread through all of this is I would like to make it
> easier to load/unload features.
>
> Constraints was something (a) I thought hbase could use, (b) would be doable
> pretty easily with CPs, and (c) would put us down the path of making hbase
> easier to run/setup for users. The latter goes for security, constraints,
> and other new/existing features.
>

Agree with all of this, and I appreciate that you're looking at
improving this stuff.  Configuration and operability is a critical
part of the user experience and we have a long way to go in
streamlining it.

--gh

Re: adding constraints

Posted by Jesse Yates <je...@gmail.com>.
Comments inline.

On Tue, Oct 18, 2011 at 12:43 AM, Gary Helmling <gh...@gmail.com> wrote:

> >
> > There is an example of how to do Constraints as a jar with CPs already
> > attached to the ticket, and its pretty simple. However, the ticket goes
> into
> > the plusses and minuses for a top-level or just basic CP based
> > implementation.
> >
> > For me, the best reason for top level is top make HBase easy to use and
> have
> > certain built-in features.
>
> Hmm, I wasn't really reading the two implementation options for
> constraints as a choice between a "built-in" feature and CP based.
>

Either way it would be CP based, but the 'built-in' would just have some
'nice' ways of adding things. In short, it's a question of adding a method to
the HTD for addConstraint() to add a bunch of classes to be run by the
'constraint CP'.

I could theoretically see a situation where people would want to have the
constraint extend from some other class (due to legacy code), meaning
extending an existing CP is a little more of a pain.

So, yeah, it still looks like #1 (below), but it's easier to use. And if you
don't want to enable constraints, don't add the constraint jar to the CP
list - no runtime slowdown, and it's still a bit similar to how security is
done.


> I'm reading it as a choice between:
> 1) a bundled CP implementation (which you still have to _enable_) that
> does constraint checking loading user classes that implement a simple
> interface (Constraint or Predicate<Put> or whatever)
> 2) an abstract CP example class that you have to extend with your own
> implementation logic, which, if you want to do it right, you'll still
> wind up with something resembling #1 anyway
>
> FYI, I see option #1 as fairly analogous to the bundled aggregation
> client that Lars mentioned.
>
> If you want this as real top-level functionality built directly in to,
> say, the HRegion code paths for puts, the question is why should we
> add the complexity directly when we have CPs?
>

I feel like having the addConstraint() for a table is actually _less_
complexity. Not necessarily from the overall system perspective, certainly
(you have to do a little abstraction and add a couple more methods), but it's
not that much more, as it is all centered around the HTD.


>
> > Yeah, we can do security, but you have to include
> > the jars make sure it works, etc. As opposed to _certain_ systems where
> > security is built in. Similar arguments can be made for things like
> > constraints - its just _easier_ to have it built in, and let people use
> them
> > (or not) as they choose.
> >
>
> We have a security implementation up for review that provides
> meaningful security.  Yes, it has to be enabled to be used and the
> process of configuring it could be much simpler.  Security is always a
> matter of trade-offs.  You can argue about about whether or not we've
> made the right ones.  But the current approach for security was
> arrived at as a result of extensive discussions with the entire
> community about the right approach, where many concerns were raised
> about paying any overhead for security when it was not being used.  As
> a result, all security components were built in a loadable fashion,
> with the trade-off of some extra configuration complexity.
>
> Yes, Accumulo has "security" always enabled.  But this is still not an
> apples-to-apples comparison.  HBase security relies on Kerberos to
> provide a trusted third part for strong authentication while never
> sending the password over the wire.  Accumulo sends username and
> password in plain text on the rpc connections.  As a result HBase
> relies on external systems for managing credentials, while Accumulo
> embeds its own user database, with the usernames and hashed passwords
> stored as globally readable znodes in zookeeper.  You could say that
> reliance on an external system makes the HBase setup more complex, but
> that's a narrow view.  While managing an internal user database does
> keep things self contained, it also forces you to create usernames and
> passwords for an application in multiple places (your application does
> run under its own account, right?), adding it's own complexity.
> Accumulo allows access control labels to be placed on each key value
> individually, while HBase uses a simpler model for assignments limited
> to table, column family, or column qualifier scope.
>
> Each system makes it's own trade-offs based on its implementation
> goals.  What's right for you is going to depend on your needs.  But
> the HBase approach did not just disregard simplicity willy-nilly.
>

Sorry for bringing security up flippantly - clearly you guys have thought
about that a lot and I wasn't trying to imply that you hadn't. Yeah,
Accumulo has a different model (and clearly has its flaws) and is running in
an, arguably, very different environment (with different requirements) than
most people running HBase. I think it makes sense to not have security
impact performance by making it loadable. However, loading should be easy.

What I'm concerned about is the configuration complexity - there are a ton
of settings already, and adding more starts to be crazy. HBase has already
made some tradeoffs, but if we keep adding more and more configuration values,
it's going to be close to unusable for anyone who doesn't have serious
knowledge about the system and how to configure it.

I would rather make it dead simple for people looking at the main interface
calls (e.g. "ok, here is where I add a coprocessor", "here is where I enable
security", "here is where I add a constraint", etc.) rather than have them dig
through the conf, when all they have to do is enable the constraint CP or the
security CP. Right now you need to add it to the list of regionserver CPs,
but what if you just had to set a boolean? Let's go super easy. Heck,
security is getting its own module (HBASE-4336), so it's reasonable to think
that we can include some configuration-specific stuff to support that.


> > The ticket also talks about abstracting out some of the CP things to make
> it
> > easier to add other top level features, which would be a win too. Yeah,
> they
> > would be backed by CPs, but that doesn't mean it doesn't make sense for
> > people to use the stuff really (as in dead simple) easily.
> >
>
> Again, I don't really see the other changes discussed (HBASE-4554?) as
> top-level vs. CP-based.  I think that change is just about providing
> the shell with the ability to easily set arbitrary attributes on
> HTableDescriptor.  Those already exist, they're just not properly
> exposed in the shell.  Maybe you're envisioning something beyond this
> for the constraints case?  That may be good too, but we should
> probably move the discussion over to the JIRA.
>
> It may not sound like it, but I'm all in favor of making things as
> simple as possible.  It's just that, when simplifying, you're usually
> moving complexity from one place to another.  So let's work out where
> we can get the biggest benefit.


> --gh
>

Ok, clearly the main thread through all of this is I would like to make it
easier to load/unload features.

Constraints was something (a) I thought hbase could use, (b) would be doable
pretty easily with CPs, and (c) would put us down the path of making hbase
easier to run/setup for users. The latter goes for security, constraints,
and other new/existing features.

-Jesse

Re: adding constraints

Posted by Gary Helmling <gh...@gmail.com>.
>
> There is an example of how to do Constraints as a jar with CPs already
> attached to the ticket, and its pretty simple. However, the ticket goes into
> the plusses and minuses for a top-level or just basic CP based
> implementation.
>
> For me, the best reason for top level is top make HBase easy to use and have
> certain built-in features.

Hmm, I wasn't really reading the two implementation options for
constraints as a choice between a "built-in" feature and CP based.
I'm reading it as a choice between:
1) a bundled CP implementation (which you still have to _enable_) that
does constraint checking by loading user classes that implement a simple
interface (Constraint or Predicate<Put> or whatever)
2) an abstract CP example class that you have to extend with your own
implementation logic, which, if you want to do it right, you'll still
wind up with something resembling #1 anyway

FYI, I see option #1 as fairly analogous to the bundled aggregation
client that Lars mentioned.
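
As a rough illustration of option #1, here is a minimal, self-contained Java
sketch. Everything in it is hypothetical: Constraint, ConstraintException,
SimplePut and ConstraintChecker are stand-ins invented for this example and
are not HBase classes. In a real coprocessor the check would presumably run
in a pre-put hook on the region server against the actual Put.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ConstraintSketch {

    /** Hypothetical stand-in for an HBase Put: a row key plus one value. */
    static final class SimplePut {
        final byte[] row;
        final byte[] value;

        SimplePut(String row, String value) {
            this.row = row.getBytes(StandardCharsets.UTF_8);
            this.value = value.getBytes(StandardCharsets.UTF_8);
        }
    }

    /** Thrown when a put violates a constraint; the message explains why. */
    static final class ConstraintException extends Exception {
        ConstraintException(String message) {
            super(message);
        }
    }

    /** The minimal interface a user would implement (assumed shape). */
    interface Constraint {
        void check(SimplePut put) throws ConstraintException;
    }

    /** Example user constraint: reject puts with an empty value. */
    static final class NonEmptyValue implements Constraint {
        @Override
        public void check(SimplePut put) throws ConstraintException {
            if (put.value.length == 0) {
                throw new ConstraintException("value must not be empty");
            }
        }
    }

    /** Stand-in for the bundled coprocessor: runs every constraint before a write. */
    static final class ConstraintChecker {
        private final List<Constraint> constraints = new ArrayList<>();
        private final List<SimplePut> table = new ArrayList<>(); // pretend storage

        void add(Constraint c) {
            constraints.add(c);
        }

        void put(SimplePut p) throws ConstraintException {
            for (Constraint c : constraints) {
                c.check(p); // the first failure aborts the write
            }
            table.add(p);
        }

        int size() {
            return table.size();
        }
    }

    public static void main(String[] args) {
        ConstraintChecker checker = new ConstraintChecker();
        checker.add(new NonEmptyValue());
        try {
            checker.put(new SimplePut("row1", "hello")); // accepted
            checker.put(new SimplePut("row2", ""));      // rejected
        } catch (ConstraintException e) {
            System.out.println("put rejected: " + e.getMessage());
        }
        System.out.println("rows stored: " + checker.size());
    }
}
```

The point of the sketch is that the user only writes something like
NonEmptyValue; the loop that runs the checks and propagates the exception
lives in the bundled piece.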

If you want this as real top-level functionality built directly into,
say, the HRegion code paths for puts, the question is why should we
add the complexity directly when we have CPs?

> Yeah, we can do security, but you have to include
> the jars, make sure it works, etc. As opposed to _certain_ systems where
> security is built in. Similar arguments can be made for things like
> constraints - it's just _easier_ to have it built in, and let people use them
> (or not) as they choose.
>

We have a security implementation up for review that provides
meaningful security.  Yes, it has to be enabled to be used and the
process of configuring it could be much simpler.  Security is always a
matter of trade-offs.  You can argue about whether or not we've
made the right ones.  But the current approach for security was
arrived at as a result of extensive discussions with the entire
community about the right approach, where many concerns were raised
about paying any overhead for security when it was not being used.  As
a result, all security components were built in a loadable fashion,
with the trade-off of some extra configuration complexity.

Yes, Accumulo has "security" always enabled.  But this is still not an
apples-to-apples comparison.  HBase security relies on Kerberos to
provide a trusted third party for strong authentication while never
sending the password over the wire.  Accumulo sends username and
password in plain text on the rpc connections.  As a result HBase
relies on external systems for managing credentials, while Accumulo
embeds its own user database, with the usernames and hashed passwords
stored as globally readable znodes in zookeeper.  You could say that
reliance on an external system makes the HBase setup more complex, but
that's a narrow view.  While managing an internal user database does
keep things self contained, it also forces you to create usernames and
passwords for an application in multiple places (your application does
run under its own account, right?), adding its own complexity.
Accumulo allows access control labels to be placed on each key value
individually, while HBase uses a simpler model for assignments limited
to table, column family, or column qualifier scope.

Each system makes its own trade-offs based on its implementation
goals.  What's right for you is going to depend on your needs.  But
the HBase approach did not just disregard simplicity willy-nilly.

> The ticket also talks about abstracting out some of the CP things to make it
> easier to add other top level features, which would be a win too. Yeah, they
> would be backed by CPs, but that doesn't mean it doesn't make sense for
> people to use the stuff really (as in dead simple) easily.
>

Again, I don't really see the other changes discussed (HBASE-4554?) as
top-level vs. CP-based.  I think that change is just about providing
the shell with the ability to easily set arbitrary attributes on
HTableDescriptor.  Those already exist, they're just not properly
exposed in the shell.  Maybe you're envisioning something beyond this
for the constraints case?  That may be good too, but we should
probably move the discussion over to the JIRA.
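
To make the per-table attribute idea concrete, here is a hedged Java sketch
of how constraint class names might be stored as string attributes on a table
and loaded reflectively on the server. A plain Map<String, String> stands in
for the HTableDescriptor attributes, and the "constraint." key prefix, the
Check interface and the loadChecks helper are all assumptions made up for
this illustration, not anything defined in HBASE-4554 or HBASE-4605.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TableAttributeSketch {

    /** Hypothetical constraint interface; implementations need a no-arg constructor. */
    public interface Check {
        boolean accepts(String value);
    }

    /** Example implementation that a user jar might provide. */
    public static final class NotBlank implements Check {
        @Override
        public boolean accepts(String value) {
            return value != null && !value.trim().isEmpty();
        }
    }

    /** Assumed naming convention for attributes that hold constraint class names. */
    static final String PREFIX = "constraint.";

    /** Instantiates every class named under a "constraint.*" attribute key. */
    static List<Check> loadChecks(Map<String, String> attributes)
            throws ReflectiveOperationException {
        List<Check> checks = new ArrayList<>();
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            if (!e.getKey().startsWith(PREFIX)) {
                continue; // unrelated table attribute, ignore it
            }
            Class<?> clazz = Class.forName(e.getValue());
            checks.add(clazz.asSubclass(Check.class)
                    .getDeclaredConstructor()
                    .newInstance());
        }
        return checks;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the attributes stored on one table's descriptor.
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("some.other.attribute", "ignored");
        attrs.put(PREFIX + "0", NotBlank.class.getName());

        for (Check check : loadChecks(attrs)) {
            System.out.println(check.getClass().getSimpleName()
                    + " accepts \"abc\": " + check.accepts("abc"));
            System.out.println(check.getClass().getSimpleName()
                    + " accepts \"  \": " + check.accepts("  "));
        }
    }
}
```

Because the class is only named in the attribute, the jar containing it still
has to be on the classpath of every machine serving the table, which matches
the expectation stated in the original proposal.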

It may not sound like it, but I'm all in favor of making things as
simple as possible.  It's just that, when simplifying, you're usually
moving complexity from one place to another.  So let's work out where
we can get the biggest benefit.

--gh

Re: adding constraints

Posted by Jesse Yates <je...@gmail.com>.
Yeah, in many large installations, turning off constraints makes a lot of
sense (do checking before you put the data over the wire, rather than server
side). However, server-side constraints still make sense on multi-tenant
systems, or where you are required to enforce certain parameters (constraints)
on the data no matter what, due to company policy or whatever.

There is an example of how to do Constraints as a jar with CPs already
attached to the ticket, and it's pretty simple. However, the ticket goes into
the plusses and minuses for a top-level or just basic CP based
implementation.

For me, the best reason for top level is to make HBase easy to use and have
certain built-in features. Yeah, we can do security, but you have to include
the jars, make sure it works, etc. As opposed to _certain_ systems where
security is built in. Similar arguments can be made for things like
constraints - it's just _easier_ to have it built in, and let people use them
(or not) as they choose.

The ticket also talks about abstracting out some of the CP things to make it
easier to add other top level features, which would be a win too. Yeah, they
would be backed by CPs, but that doesn't mean it doesn't make sense for
people to use the stuff really (as in dead simple) easily.

-Jesse

On Mon, Oct 17, 2011 at 10:00 PM, lars hofhansl <lh...@yahoo.com> wrote:

> My $0.02...
>
>
> I'd rather include an example of how to do this with a coprocessor
> (similar to what we do with the
> aggregation client), rather than a new HBase feature. If the example is
> easy to extend and to compile to a
> jar we have achieved almost the same.
>
>
> Also - as an anecdote - every semi large relational database I worked with
> professionally had constraints turned off because
> of performance reasons and rather implemented constraints at the
> application layer.
>
>
> -- Lars

Re: adding constraints

Posted by lars hofhansl <lh...@yahoo.com>.
My $0.02...


I'd rather include an example of how to do this with a coprocessor (similar to what we do with the
aggregation client), rather than a new HBase feature. If the example is easy to extend and to compile to a
jar we have achieved almost the same.


Also - as an anecdote - every semi large relational database I worked with professionally had constraints turned off because
of performance reasons and rather implemented constraints at the application layer.


-- Lars

________________________________
From: Jesse Yates <je...@gmail.com>
To: dev@hbase.apache.org
Sent: Monday, October 17, 2011 11:27 AM
Subject: Re: adding constraints

Added HBASE-4605 <https://issues.apache.org/jira/browse/HBASE-4605> (and
approach comments) for this issue.

-Jesse


Re: adding constraints

Posted by Jesse Yates <je...@gmail.com>.
Added HBASE-4605 <https://issues.apache.org/jira/browse/HBASE-4605> (and
approach comments) for this issue.

-Jesse

On Mon, Oct 17, 2011 at 11:10 AM, Ted Yu <yu...@gmail.com> wrote:

> Jesse:
> I agree with your observations.
>
> Constraint, defined for single table, would be useful.
>
> Please file a JIRA and describe your strategy there.
>
> Thanks

Re: adding constraints

Posted by Ted Yu <yu...@gmail.com>.
Jesse:
I agree with your observations.

Constraint, defined for single table, would be useful.

Please file a JIRA and describe your strategy there.

Thanks

On Mon, Oct 17, 2011 at 11:04 AM, Jesse Yates <je...@gmail.com> wrote:

> On Mon, Oct 17, 2011 at 11:00 AM, Ted Yu <yu...@gmail.com> wrote:
>
> > Jesse:
> > This is a nice initiative.
> > Looks like the Constraint you define below is per table. Meaning it is not
> > cross-table referential integrity.
> >
>
> Theoretically we could support doing this. And if people were really cheeky
> with the current implementation, they could access other tables to enforce
> it (though it would kill you on access time). Even so, doing the cross-table
> checks is going to be rough on run time (cross-server locking is always bad
> news bears ;), so I'm thinking this should definitely be a later consideration.
>
>
> > Cheers

Re: adding constraints

Posted by Jesse Yates <je...@gmail.com>.
On Mon, Oct 17, 2011 at 11:00 AM, Ted Yu <yu...@gmail.com> wrote:

> Jesse:
> This is a nice initiative.
> Looks like the Constraint you define below is per table. Meaning it is not
> cross-table referential integrity.
>

Theoretically we could support doing this. And if people were really cheeky
with the current implementation, they could access other tables to enforce
it (though it would kill you on access time). Even so, doing the cross-table
checks is going to be rough on run time (cross-server locking is always bad
news bears ;), so I'm thinking this should definitely be a later consideration.


> Cheers

Re: adding constraints

Posted by Ted Yu <yu...@gmail.com>.
Jesse:
This is a nice initiative.
Looks like the Constraint you define below is per table. Meaning it is not
cross-table referential integrity.

Cheers

On Mon, Oct 17, 2011 at 10:45 AM, Jesse Yates <je...@gmail.com> wrote:

> Hey everyone,
>
> TL;DR Adding classic DB constraints as a system level coprocessor to help
> simplify using HBase and ease adoption.
>
> Coprocessors are a really powerful mechanism and are incredibly useful for a
> variety of things. However, I feel like the mechanism for using them can be
> very daunting and, for certain features, could do with some simplification.
>
> What I would like to propose is a simple interface that people can use to
> implement a 'constraint' (matching the classic database definition). This
> would help ease of adoption by helping HBase more easily check that box,
> help minimize code duplication across organizations, and lead to easier
> adoption.
>
> Essentially, people would implement a 'Constraint' interface for checking
> keys before they are put into a table. Puts that are valid get written to
> the table, but if not people can throw an exception that gets
> propagated back to the client explaining why the put was invalid.
>
> Constraints would be set on a per-table basis and the user would be expected
> to ensure the jars containing the constraint are present on the machines
> serving that table.
>
> Yes, people could roll their own mechanism for doing this via coprocessors
> each time, but this would make it easier to do so, so you only have to
> implement a very minimal interface and not worry about the specifics.
>
> If people are interested, I would like to open a Jira on the feature. I've
> got a basic implementation, but would like to expand it to be a more
> integrated, top-level element of the code. I just don't want to waste my
> time doing a full blown impl and then not have at least general consensus on
> it being a good feature.
>
> One of the complaints I commonly hear about HBase is that, to outsiders, it
> is difficult to figure out and use (though once you do, it's solid). This
> would be a step to make it easier to use and adopt.
>
> Thanks,
> Jesse Yates
>