Posted to dev@hbase.apache.org by Josh Elser <el...@apache.org> on 2016/10/28 16:00:47 UTC

[DISCUSS] FileSystem Quotas in HBase

Hi folks,

I'd like to propose the introduction of FileSystem quotas to HBase.

Here's a design doc[1] available which (hopefully) covers all of the 
salient points of what I think an initial version of such a feature 
would include.

tl;dr We can define quotas on tables and namespaces. Region size is 
computed by RegionServers and sent to the Master. The Master inspects 
the sizes of Regions, rolling up to table and namespace sizes. Defined 
quotas in the quota table are evaluated given the computed sizes, and, 
for those tables/namespaces violating the quota, RegionServers are 
informed to take some action to limit any further filesystem growth by 
that table/namespace.
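
A minimal sketch of the roll-up and evaluation described in the tl;dr above may help. It is not taken from the design doc: the function names, the tuple keys, and the "usage strictly greater than limit" comparison are assumptions made purely for illustration, and none of it is an HBase API.

```python
from collections import defaultdict


def roll_up(region_sizes):
    """Sum per-region sizes into per-table and per-namespace totals.

    region_sizes maps (namespace, table, region) -> size in bytes, standing in
    for the reports that RegionServers would send to the Master.
    """
    table_sizes = defaultdict(int)
    namespace_sizes = defaultdict(int)
    for (namespace, table, _region), size in region_sizes.items():
        table_sizes[(namespace, table)] += size
        namespace_sizes[namespace] += size
    return dict(table_sizes), dict(namespace_sizes)


def find_violations(sizes, quotas):
    """Return the quota subjects whose computed size exceeds their limit.

    Assumption: a quota is violated when usage is strictly greater than the limit.
    """
    return sorted(key for key, limit in quotas.items() if sizes.get(key, 0) > limit)


# Hypothetical region sizes, in bytes.
region_sizes = {
    ("ns1", "t1", "r1"): 40,
    ("ns1", "t1", "r2"): 70,
    ("ns1", "t2", "r1"): 10,
    ("ns2", "t3", "r1"): 5,
}
table_sizes, namespace_sizes = roll_up(region_sizes)

# Hypothetical quota definitions, standing in for rows of the quota table.
table_violations = find_violations(table_sizes, {("ns1", "t1"): 100})
namespace_violations = find_violations(namespace_sizes, {"ns1": 200, "ns2": 1})
```

In this sketch the violating tables/namespaces are simply returned; in the proposal, RegionServers would then be informed so they can limit further filesystem growth.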

I'd encourage you to give the document a read -- I tried to cover as 
much as I could without getting unnecessarily bogged down in 
implementation details.

Feedback is, of course, welcomed. I'd like to start sketching out a 
breakdown of the work (all writing and no programming makes Josh a sad 
boy). I'm happy to field any/all questions. Thanks in advance.

- Josh

[1] http://home.apache.org/~elserj/hbase/FileSystemQuotasforApacheHBase.pdf

Re: [DISCUSS] FileSystem Quotas in HBase

Posted by Josh Elser <el...@apache.org>.
Great points. Thanks fellas!


Re: [DISCUSS] FileSystem Quotas in HBase

Posted by Andrew Purtell <ap...@apache.org>.
Right, these drawbacks are what I was getting at with "impose limits on
how HBase structures storage on the filesystem". It does imply major
changes to filesystem structure and multiple WALs. I didn't think of
snapshots; you are right that this makes filesystem reorganization more
complicated. To your earlier point, I think soft quotas are a fine start as
well.
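
To make the soft-quota trade-off concrete, here is a rough back-of-the-envelope sketch of how far a soft limit could be overshot while a violation is being detected and enforced. The model (a constant write rate multiplied by the total detection-plus-enforcement lag) and every parameter name and value are assumptions for illustration only; they are not numbers from the design doc.

```python
def worst_case_overshoot(write_rate_bytes_per_s, report_interval_s,
                         evaluate_interval_s, enforce_delay_s):
    """Estimate bytes written past the quota before enforcement takes effect.

    Assumed model: a table keeps growing at a constant rate for as long as it
    takes to report region sizes, evaluate the quota, and push the enforcement
    decision back out to the RegionServers.
    """
    lag_s = report_interval_s + evaluate_interval_s + enforce_delay_s
    return write_rate_bytes_per_s * lag_s


# Hypothetical numbers: 50 MiB/s of growth, 60 s between size reports,
# 60 s between quota evaluations, 10 s to propagate the enforcement action.
overshoot = worst_case_overshoot(50 * 1024**2, 60, 60, 10)
overshoot_mib = overshoot / 1024**2
```

Under this model the overshoot scales linearly with both the write rate and the lag, so shortening the reporting and evaluation intervals is the main lever for tightening the bound.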

On Wed, Nov 2, 2016 at 3:28 PM, Enis Söztutar <en...@gmail.com> wrote:

> Thanks Andrew,
>
> I forgot to mention that we have considered using the HDFS quota
> enforcement directly as well, but decided against it for a couple of
> reasons.
>  - Our current layout has files in the data directory, as well as archive
> directory and WALs, etc. Since there is no option for HDFS quotas to span
> multiple directories, we can only use the HDFS quotas for main data files,
> and not snapshots, etc unless we do major surgery in our file layouts. This
> will get more complicated if we want to do flat layout, etc later on.
>  - Since WALs would not be in any namespace unless we do wal-per-namespace,
> that means that once a single NS's HDFS quota is reached, it might affect
> everybody else and potentially cause havoc on the cluster. The problem
> would be that if a single NS is out of space, we cannot perform flushes at
> all. This would cause the WALs to be backed up and kept forever and affect
> all of the other regions from different tables / namespaces causing
> unavailability for unrelated tables. Wal-per-namespace also has to be
> implemented and WALs be moved under a shared NS directory to share the data
> and WAL requiring further layout changes. It also will not be optimal if
> there is a large number of namespaces.
>  - Will only work with HDFS, while HBase can use other file systems.
>
> Enis
>
> On Wed, Nov 2, 2016 at 3:01 PM, Andrew Purtell <ap...@apache.org>
> wrote:
>
> > Another approach to hard limits could be pushing the quota down to the
> > HDFS level, because HDFS would have a very accurate assessment of quota
> > utilization at all times, but this would only work with HDFS and impose
> > limits on how HBase structures storage on the filesystem (e.g. all files
> > for a namespace must be under a common root). Still, implementation would
> > be "easy": over hard quota, all allocations would fail, the bulk of the
> > effort is hardening response to allocation failures.
> >
> > On Wed, Nov 2, 2016 at 1:11 PM, Enis Söztutar <en...@apache.org> wrote:
> >
> > > Thanks Josh for the doc and pursuing this.
> > >
> > > I was involved with some of the design choices so consider me a +1 on
> > > the general approach. One topic which is not covered here is the other
> > > design decision that we could have pursued: stricter control of quota
> > > usage, so that we would always guarantee that the namespace / table
> > > cannot use more than its allocated disk space. This hard-limit approach
> > > would differ from the proposed "soft-limit" approach because the
> > > soft-limit approach can end up overusing the disk space by a small
> > > amount (because it takes time to detect that the quota limit is reached
> > > and to enforce the limit).
> > >
> > > The hard-limit approach may be built with a lease-like mechanism
> > > where the master gives away disk space leases to region servers from
> > > the remaining limit, and the regionservers make sure that they cannot
> > > allocate more space than the lease dictates. By ensuring that the space
> > > is pre-allocated via leases, we can always make sure that strict limits
> > > are applied. However, this approach would be harder to build and
> > > stabilize because it would need new mechanisms for distributing and
> > > managing these leases. Tuning the allocations so that regionservers
> > > never block flushes or compactions for lack of a timely lease would
> > > also prove challenging to get right.
> > >
> > > We generally think that the "soft-limit" approach would be a good
> > > enough approximation and the error bounds on over-allocation would be
> > > minimal and negligible in production. Thus, the proposal is to
> > > implement the soft approach with good documentation about how much
> > > space can be over-allocated in a worst-case scenario.
> > >
> > > Enis
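
The lease-based hard limit described in Enis's message above could be sketched roughly as follows. This is only a toy model under stated assumptions: the class names, the fixed lease chunk size, and the synchronous `grant` call are all hypothetical, and nothing here reflects an actual HBase implementation.

```python
class QuotaLeaseManager:
    """Master-side bookkeeping: hands out leases from the remaining limit."""

    def __init__(self, limit_bytes):
        self.remaining = limit_bytes

    def grant(self, requested_bytes):
        # Never grant more than what is left, so the sum of all leases
        # can never exceed the configured limit.
        granted = min(requested_bytes, self.remaining)
        self.remaining -= granted
        return granted


class RegionServerLease:
    """RegionServer-side view: may only allocate space it holds a lease for."""

    def __init__(self, manager, chunk_bytes):
        self.manager = manager
        self.chunk_bytes = chunk_bytes  # hypothetical lease request size
        self.available = 0

    def try_allocate(self, size_bytes):
        """Return True if the allocation fits within leased space."""
        while self.available < size_bytes:
            shortfall = size_bytes - self.available
            granted = self.manager.grant(max(self.chunk_bytes, shortfall))
            if granted == 0:
                return False  # no lease left: the flush/compaction would block
            self.available += granted
        self.available -= size_bytes
        return True


manager = QuotaLeaseManager(limit_bytes=100)
rs1 = RegionServerLease(manager, chunk_bytes=40)
rs2 = RegionServerLease(manager, chunk_bytes=40)
results = [rs1.try_allocate(30), rs2.try_allocate(50), rs1.try_allocate(30)]
```

In this toy run the third allocation is refused even though only 80 of the 100 bytes were actually used, because the remaining 20 bytes sit in a lease that is too small for the request. That kind of stranded lease is one illustration of why tuning the allocations could be hard.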
> > >
> > > On Wed, Nov 2, 2016 at 12:15 PM, Josh Elser <el...@apache.org> wrote:
> > >
> > > > Thanks for the reviews so far, Ted and Stack. The comments were great
> > > > and much appreciated.
> > > >
> > > > Interpreting consensus from lack of objection, I'm going to move
> > > > ahead in earnest starting to work on what was described in the doc.
> > > > Expect to see some work break-out happening under HBASE-16961 and
> > > > patches starting to land.
> > > >
> > > > I'm also happy to entertain more discussion if anyone hasn't found
> > > > the time to read/comment yet.
> > > >
> > > > Thanks!
> > > >
> > > > - Josh
> > > >
> > > >
> > > > Josh Elser wrote:
> > > >
> > > >> Sure thing, Ted.
> > > >>
> > > > >> https://docs.google.com/document/d/1VtLWDkB2tpwc_zgCNPE1ulZOeecF-YA2FYSK3TSs_bw/edit?usp=sharing
> > > > >>
> > > > >>
> > > > >> Let me open an umbrella issue for now. I can break up the work
> > > > >> later.
> > > > >>
> > > > >> https://issues.apache.org/jira/browse/HBASE-16961
> > > > >>
> > > > >> Ted Yu wrote:
> > > > >>
> > > > >>> Josh:
> > > > >>> Can you put the doc in google doc so that people can comment on it?
> > > > >>>
> > > > >>> Is there a JIRA opened for this work?
> > > > >>> Please open one if there is none.
> > > >>>
> > > >>> Thanks
> > > >>>
> > > >>> On Fri, Oct 28, 2016 at 9:00 AM, Josh Elser<el...@apache.org>
> > wrote:
> > > >>>
> > > >>> Hi folks,
> > > >>>>
> > > >>>> I'd like to propose the introduction of FileSystem quotas to
> HBase.
> > > >>>>
> > > >>>> Here's a design doc[1] available which (hopefully) covers all of
> the
> > > >>>> salient points of what I think an initial version of such a
> feature
> > > >>>> would
> > > >>>> include.
> > > >>>>
> > > >>>> tl;dr We can define quotas on tables and namespaces. Region size
> is
> > > >>>> computed by RegionServers and sent to the Master. The Master
> > inspects
> > > >>>> the
> > > >>>> sizes of Regions, rolling up to table and namespace sizes. Defined
> > > >>>> quotas
> > > >>>> in the quota table are evaluated given the computed sizes, and,
> for
> > > >>>> those
> > > >>>> tables/namespaces violating the quota, RegionServers are informed
> to
> > > >>>> take
> > > >>>> some action to limit any further filesystem growth by that
> > > >>>> table/namespace.
> > > >>>>
> > > >>>> I'd encourage you to give the document a read -- I tried to cover
> as
> > > >>>> much
> > > >>>> as I could without getting unnecessarily bogged down in
> > implementation
> > > >>>> details.
> > > >>>>
> > > >>>> Feedback is, of course, welcomed. I'd like to start sketching out
> a
> > > >>>> breakdown of the work (all writing and no programming makes Josh a
> > sad
> > > >>>> boy). I'm happy to field any/all questions. Thanks in advance.
> > > >>>>
> > > >>>> - Josh
> > > >>>>
> > > >>>> [1] http://home.apache.org/~elserj/hbase/FileSystemQuotasforApac
> > > >>>> heHBase.pdf
> > > >>>>
> > > >>>>
> > > >>>
> > >
> >
> >
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> >
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: [DISCUSS] FileSystem Quotas in HBase

Posted by Ted Yu <yu...@gmail.com>.
Thanks, Josh.

Looking forward to the patches.

On Thu, Nov 3, 2016 at 12:58 PM, Josh Elser <el...@apache.org> wrote:

> Done.
>
> Ted Yu wrote:
>
>> Josh:
>> Please capture the following in design doc.
>>
>> Thanks
>>
>> On Wed, Nov 2, 2016 at 3:28 PM, Enis Söztutar<en...@gmail.com>  wrote:
>>
>>> Thanks Andrew,
>>>
>>> I forgot to mention that we have considered using the HDFS quota
>>> enforcement directly as well, but decided against it for a couple of
>>> reasons.
>>>   - Our current layout has files in the data directory, as well as
>>> archive directory and WALs, etc. Since there is no option for HDFS quotas
>>> to span multiple directories, we can only use the HDFS quotas for main
>>> data files, and not snapshots, etc unless we do major surgery in our file
>>> layouts. This will get more complicated if we want to do flat layout, etc
>>> later on.
>>>   - Since WALs would not be in any namespace unless we do
>>> wal-per-namespace, that means that once a single NS's HDFS quota is
>>> reached, it might affect everybody else and potentially cause havoc on
>>> the cluster. The problem would be that if a single NS is out of space, we
>>> cannot perform flushes at all. This would cause the WALs to be backed up
>>> and kept forever and affect all of the other regions from different
>>> tables / namespaces causing unavailability for unrelated tables.
>>> Wal-per-namespace also has to be implemented and WALs be moved under a
>>> shared NS directory to share the data and WAL requiring further layout
>>> changes. It also will not be optimal if there is a large number of
>>> namespaces.
>>>   - Will only work with HDFS, while HBase can use other file systems.
>>>
>>> Enis

Re: [DISCUSS] FileSystem Quotas in HBase

Posted by Josh Elser <el...@apache.org>.
Done.

Ted Yu wrote:
> Josh:
> Please capture the following in design doc.
>
> Thanks
>
> On Wed, Nov 2, 2016 at 3:28 PM, Enis Söztutar <en...@gmail.com> wrote:
>
>> Thanks Andrew,
>>
>> I forgot to mention that we have considered using the HDFS quota
>> enforcement directly as well, but decided against it for a couple of
>> reasons.
>>   - Our current layout has files in the data directory, as well as archive
>> directory and WALs, etc. Since there is no option for HDFS quotas to span
>> multiple directories, we can only use the HDFS quotas for main data files,
>> and not snapshots, etc unless we do major surgery in our file layouts. This
>> will get more complicated if we want to do flat layout, etc later on.
>>   - Since WALs would not be in any namespace unless we do wal-per-namespace,
>> that means that once a single NS's HDFS quota is reached, it might affect
>> everybody else and potentially cause havoc on the cluster. The problem
>> would be that if a single NS is out of space, we cannot perform flushes at
>> all. This would cause the WALs to be backed up and kept forever and affect
>> all of the other regions from different tables / namespaces causing
>> unavailability for unrelated tables. Wal-per-namespace also has to be
>> implemented and WALs be moved under a shared NS directory to share the data
>> and WAL requiring further layout changes. It also will not be optimal if
>> there is a large number of namespaces.
>>   - Will only work with HDFS, while HBase can use other file systems.
>>
>> Enis

Re: [DISCUSS] FileSystem Quotas in HBase

Posted by Ted Yu <yu...@gmail.com>.
Josh:
Please capture the following in design doc.

Thanks

On Wed, Nov 2, 2016 at 3:28 PM, Enis Söztutar <en...@gmail.com> wrote:

> Thanks Andrew,
>
> I forgot to mention that we have considered using the HDFS quota
> enforcement directly as well, but decided against it for a couple of
> reasons.
>  - Our current layout has files in the data directory, as well as archive
> directory and WALs, etc. Since there is no option for HDFS quotas to span
> multiple directories, we can only use the HDFS quotas for main data files,
> and not snapshots, etc unless we do major surgery in our file layouts. This
> will get more complicated if we want to do flat layout, etc later on.
>  - Since WALs would not be in any namespace unless we do wal-per-namespace,
> that means that once a single NS's HDFS quota is reached, it might affect
> everybody else and potentially cause havoc on the cluster. The problem
> would be that if a single NS is out of space, we cannot perform flushes at
> all. This would cause the WALs to be backed up and kept forever and affect
> all of the other regions from different tables / namespaces causing
> unavailability for unrelated tables. Wal-per-namespace also has to be
> implemented and WALs be moved under a shared NS directory to share the data
> and WAL requiring further layout changes. It also will not be optimal if
> there is a large number of namespaces.
>  - Will only work with HDFS, while HBase can use other file systems.
>
> Enis
>
> On Wed, Nov 2, 2016 at 3:01 PM, Andrew Purtell <ap...@apache.org>
> wrote:
>
> > Another approach to hard limits could be pushing the quota down to the
> HDFS
> > level, because HDFS would have a very accurate assessment of quota
> > utilization at all times, but this would only work with HDFS and impose
> > limits on how HBase structures storage on the filesystem (e.g. all files
> > for a namespace must be under a common root). Still, implementation would
> > be "easy": over hard quota, all allocations would fail, the bulk of the
> > effort is hardening response to allocation failures.
> >
> > On Wed, Nov 2, 2016 at 1:11 PM, Enis Söztutar <en...@apache.org> wrote:
> >
> > > Thanks Josh for the doc and pursuing this.
> > >
> > > I was involved with some of the design choices so consider me a +1 on
> the
> > > general approach. One topic which is not covered here is that the other
> > > design decision that we could have pursued is a more strict control on
> > the
> > > quota usage so that we would always guarantee that the namespace /
> table
> > > cannot use more than allocated disk space. This hard-limit approach
> would
> > > differ from the proposed "soft-limit" approach because the soft limit
> > > approach can end up overusing the disk space by a small amount (because
> > it
> > > takes time to detect the quota limit is reached and enforcing of the
> > > limit).
> > >
> > > The hard-limit approach maybe built by doing a lease kind of mechanism
> > > where the master gives away disk space leases to region servers from
> the
> > > remaining limit, and the regionservers make sure that they cannot
> > allocate
> > > more space than the lease dictates. By ensuring that the space is
> > > pre-allocated via leases, we can always make sure that strict limits
> are
> > > applied. Though, this approach would be harder to build and stabilize
> > > because it will need new mechanisms for distributing and managing this
> > kind
> > > of leases as well as tuning the allocations to make sure that
> > regionservers
> > > never block flushes or compactions due to lack of lease in time would
> > prove
> > > challenging to get it right.
> > >
> > > We generally think that the "soft-limit" approach would be a good
> enough
> > > approximation and the error bounds on over-allocation would be minimal
> > and
> > > negligible in production.  Thus, the proposal is to implement the soft
> > > approach with good documentation about how much space can be
> > over-allocated
> > > in a worst-case scenario.
> > >
> > > Enis
> > >
> > > On Wed, Nov 2, 2016 at 12:15 PM, Josh Elser <el...@apache.org> wrote:
> > >
> > > > Thanks for the reviews so far, Ted and Stack. The comments were great
> > and
> > > > much appreciated.
> > > >
> > > > Interpreting consensus from lack of objection, I'm going to move
> ahead
> > in
> > > > earnest starting to work on what was described in the doc. Expect to
> > see
> > > > some work break-out happening under HBASE-16961 and patches starting
> to
> > > > land.
> > > >
> > > > I'm also happy to entertain more discussion if anyone hasn't found
> the
> > > > time to read/comment yet.
> > > >
> > > > Thanks!
> > > >
> > > > - Josh
> > > >
> > > >
> > > > Josh Elser wrote:
> > > >
> > > >> Sure thing, Ted.
> > > >>
> > > >> https://docs.google.com/document/d/1VtLWDkB2tpwc_zgCNPE1ulZO
> > > >> eecF-YA2FYSK3TSs_bw/edit?usp=sharing
> > > >>
> > > >>
> > > >> Let me open an umbrella issue for now. I can break up the work
> later.
> > > >>
> > > >> https://issues.apache.org/jira/browse/HBASE-16961
> > > >>
> > > >> Ted Yu wrote:
> > > >>
> > > >>> Josh:
> > > >>> Can you put the doc in google doc so that people can comment on it
> ?
> > > >>>
> > > >>> Is there a JIRA opened for this work ?
> > > >>> Please open one if there is none.
> > > >>>
> > > >>> Thanks
> > > >>>
> > > >>> On Fri, Oct 28, 2016 at 9:00 AM, Josh Elser<el...@apache.org>
> > wrote:
> > > >>>
> > > >>> Hi folks,
> > > >>>>
> > > >>>> I'd like to propose the introduction of FileSystem quotas to
> HBase.
> > > >>>>
> > > >>>> Here's a design doc[1] available which (hopefully) covers all of
> the
> > > >>>> salient points of what I think an initial version of such a
> feature
> > > >>>> would
> > > >>>> include.
> > > >>>>
> > > >>>> tl;dr We can define quotas on tables and namespaces. Region size
> is
> > > >>>> computed by RegionServers and sent to the Master. The Master
> > inspects
> > > >>>> the
> > > >>>> sizes of Regions, rolling up to table and namespace sizes. Defined
> > > >>>> quotas
> > > >>>> in the quota table are evaluated given the computed sizes, and,
> for
> > > >>>> those
> > > >>>> tables/namespaces violating the quota, RegionServers are informed
> to
> > > >>>> take
> > > >>>> some action to limit any further filesystem growth by that
> > > >>>> table/namespace.
> > > >>>>
> > > >>>> I'd encourage you to give the document a read -- I tried to cover
> as
> > > >>>> much
> > > >>>> as I could without getting unnecessarily bogged down in
> > implementation
> > > >>>> details.
> > > >>>>
> > > >>>> Feedback is, of course, welcomed. I'd like to start sketching out
> a
> > > >>>> breakdown of the work (all writing and no programming makes Josh a
> > sad
> > > >>>> boy). I'm happy to field any/all questions. Thanks in advance.
> > > >>>>
> > > >>>> - Josh
> > > >>>>
> > > >>>> [1] http://home.apache.org/~elserj/hbase/FileSystemQuotasforApac
> > > >>>> heHBase.pdf
> > > >>>>
> > > >>>>
> > > >>>
> > >
> >
> >
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> >
>

Re: [DISCUSS] FileSystem Quotas in HBase

Posted by Enis Söztutar <en...@gmail.com>.
Thanks Andrew,

I forgot to mention that we have considered using the HDFS quota
enforcement directly as well, but decided against it for a couple of
reasons.
 - Our current layout has files in the data directory as well as the archive
directory, WALs, etc. Since HDFS quotas cannot span multiple directories, we
could only use them for the main data files, and not for snapshots, etc.,
unless we do major surgery on our file layout. This will get more complicated
if we want to do a flat layout, etc. later on.
 - Since WALs would not be in any namespace unless we do wal-per-namespace,
once a single NS's HDFS quota is reached, it might affect everybody else and
potentially cause havoc on the cluster. The problem would be that if a single
NS is out of space, we cannot perform flushes at all. This would cause the
WALs to back up and be kept forever, affecting regions from other tables /
namespaces and causing unavailability for unrelated tables. Wal-per-namespace
would also have to be implemented, and the WALs moved under a shared NS
directory alongside the data, requiring further layout changes. It also would
not be optimal if there is a large number of namespaces.
 - It would only work with HDFS, while HBase can use other file systems.
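
To make the first point concrete, here is a minimal sketch in Java. It
assumes a table's files live under more than one directory; the directory
names and byte counts are made up for illustration and are not HBase's
actual layout. A quota attached to one directory only ever sees that
directory's bytes, so the table can exceed its intended limit without the
quota noticing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: the directory names and sizes are illustrative only.
final class TableFootprint {
    // Bytes visible to a quota set on exactly one directory.
    static long seenBySingleDirectoryQuota(Map<String, Long> bytesByDir, String quotaDir) {
        return bytesByDir.getOrDefault(quotaDir, 0L);
    }

    // Bytes the table really occupies: the sum over every directory holding its files.
    static long actualFootprint(Map<String, Long> bytesByDir) {
        long total = 0L;
        for (long bytes : bytesByDir.values()) {
            total += bytes;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Long> usage = new LinkedHashMap<>();
        usage.put("data/ns1/t1", 80L);          // live store files (assumed path)
        usage.put("archive/data/ns1/t1", 40L);  // files kept for snapshots (assumed path)
        long limit = 100L;

        long seen = seenBySingleDirectoryQuota(usage, "data/ns1/t1");
        long actual = actualFootprint(usage);
        System.out.println("quota sees " + seen + ", over limit: " + (seen > limit));
        System.out.println("table uses " + actual + ", over limit: " + (actual > limit));
    }
}
```

In this example the single-directory view stays under the limit while the
real footprint is over it, which is the gap described above.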

Enis

On Wed, Nov 2, 2016 at 3:01 PM, Andrew Purtell <ap...@apache.org> wrote:

> Another approach to hard limits could be pushing the quota down to the HDFS
> level, because HDFS would have a very accurate assessment of quota
> utilization at all times, but this would only work with HDFS and impose
> limits on how HBase structures storage on the filesystem (e.g. all files
> for a namespace must be under a common root). Still, implementation would
> be "easy": over hard quota, all allocations would fail, the bulk of the
> effort is hardening response to allocation failures.
>
> On Wed, Nov 2, 2016 at 1:11 PM, Enis Söztutar <en...@apache.org> wrote:
>
> > Thanks Josh for the doc and pursuing this.
> >
> > I was involved with some of the design choices so consider me a +1 on the
> > general approach. One topic which is not covered here is that the other
> > design decision that we could have pursued is a more strict control on
> the
> > quota usage so that we would always guarantee that the namespace / table
> > cannot use more than allocated disk space. This hard-limit approach would
> > differ from the proposed "soft-limit" approach because the soft limit
> > approach can end up overusing the disk space by a small amount (because
> it
> > takes time to detect the quota limit is reached and enforcing of the
> > limit).
> >
> > The hard-limit approach maybe built by doing a lease kind of mechanism
> > where the master gives away disk space leases to region servers from the
> > remaining limit, and the regionservers make sure that they cannot
> allocate
> > more space than the lease dictates. By ensuring that the space is
> > pre-allocated via leases, we can always make sure that strict limits are
> > applied. Though, this approach would be harder to build and stabilize
> > because it will need new mechanisms for distributing and managing this
> kind
> > of leases as well as tuning the allocations to make sure that
> regionservers
> > never block flushes or compactions due to lack of lease in time would
> prove
> > challenging to get it right.
> >
> > We generally think that the "soft-limit" approach would be a good enough
> > approximation and the error bounds on over-allocation would be minimal
> and
> > negligible in production.  Thus, the proposal is to implement the soft
> > approach with good documentation about how much space can be
> over-allocated
> > in a worst-case scenario.
> >
> > Enis
> >
> > On Wed, Nov 2, 2016 at 12:15 PM, Josh Elser <el...@apache.org> wrote:
> >
> > > Thanks for the reviews so far, Ted and Stack. The comments were great
> and
> > > much appreciated.
> > >
> > > Interpreting consensus from lack of objection, I'm going to move ahead
> in
> > > earnest starting to work on what was described in the doc. Expect to
> see
> > > some work break-out happening under HBASE-16961 and patches starting to
> > > land.
> > >
> > > I'm also happy to entertain more discussion if anyone hasn't found the
> > > time to read/comment yet.
> > >
> > > Thanks!
> > >
> > > - Josh
> > >
> > >
> > > Josh Elser wrote:
> > >
> > >> Sure thing, Ted.
> > >>
> > >> https://docs.google.com/document/d/1VtLWDkB2tpwc_zgCNPE1ulZO
> > >> eecF-YA2FYSK3TSs_bw/edit?usp=sharing
> > >>
> > >>
> > >> Let me open an umbrella issue for now. I can break up the work later.
> > >>
> > >> https://issues.apache.org/jira/browse/HBASE-16961
> > >>
> > >> Ted Yu wrote:
> > >>
> > >>> Josh:
> > >>> Can you put the doc in google doc so that people can comment on it ?
> > >>>
> > >>> Is there a JIRA opened for this work ?
> > >>> Please open one if there is none.
> > >>>
> > >>> Thanks
> > >>>
> > >>> On Fri, Oct 28, 2016 at 9:00 AM, Josh Elser<el...@apache.org>
> wrote:
> > >>>
> > >>> Hi folks,
> > >>>>
> > >>>> I'd like to propose the introduction of FileSystem quotas to HBase.
> > >>>>
> > >>>> Here's a design doc[1] available which (hopefully) covers all of the
> > >>>> salient points of what I think an initial version of such a feature
> > >>>> would
> > >>>> include.
> > >>>>
> > >>>> tl;dr We can define quotas on tables and namespaces. Region size is
> > >>>> computed by RegionServers and sent to the Master. The Master
> inspects
> > >>>> the
> > >>>> sizes of Regions, rolling up to table and namespace sizes. Defined
> > >>>> quotas
> > >>>> in the quota table are evaluated given the computed sizes, and, for
> > >>>> those
> > >>>> tables/namespaces violating the quota, RegionServers are informed to
> > >>>> take
> > >>>> some action to limit any further filesystem growth by that
> > >>>> table/namespace.
> > >>>>
> > >>>> I'd encourage you to give the document a read -- I tried to cover as
> > >>>> much
> > >>>> as I could without getting unnecessarily bogged down in
> implementation
> > >>>> details.
> > >>>>
> > >>>> Feedback is, of course, welcomed. I'd like to start sketching out a
> > >>>> breakdown of the work (all writing and no programming makes Josh a
> sad
> > >>>> boy). I'm happy to field any/all questions. Thanks in advance.
> > >>>>
> > >>>> - Josh
> > >>>>
> > >>>> [1] http://home.apache.org/~elserj/hbase/FileSystemQuotasforApac
> > >>>> heHBase.pdf
> > >>>>
> > >>>>
> > >>>
> >
>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>

Re: [DISCUSS] FileSystem Quotas in HBase

Posted by Andrew Purtell <ap...@apache.org>.
Another approach to hard limits could be pushing the quota down to the HDFS
level, because HDFS would have a very accurate assessment of quota
utilization at all times, but this would only work with HDFS and impose
limits on how HBase structures storage on the filesystem (e.g. all files
for a namespace must be under a common root). Still, implementation would
be "easy": over hard quota, all allocations would fail, and the bulk of the
effort would be hardening the response to allocation failures.

On Wed, Nov 2, 2016 at 1:11 PM, Enis Söztutar <en...@apache.org> wrote:

> Thanks Josh for the doc and pursuing this.
>
> I was involved with some of the design choices so consider me a +1 on the
> general approach. One topic which is not covered here is that the other
> design decision that we could have pursued is a more strict control on the
> quota usage so that we would always guarantee that the namespace / table
> cannot use more than allocated disk space. This hard-limit approach would
> differ from the proposed "soft-limit" approach because the soft limit
> approach can end up overusing the disk space by a small amount (because it
> takes time to detect the quota limit is reached and enforcing of the
> limit).
>
> The hard-limit approach maybe built by doing a lease kind of mechanism
> where the master gives away disk space leases to region servers from the
> remaining limit, and the regionservers make sure that they cannot allocate
> more space than the lease dictates. By ensuring that the space is
> pre-allocated via leases, we can always make sure that strict limits are
> applied. Though, this approach would be harder to build and stabilize
> because it will need new mechanisms for distributing and managing this kind
> of leases as well as tuning the allocations to make sure that regionservers
> never block flushes or compactions due to lack of lease in time would prove
> challenging to get it right.
>
> We generally think that the "soft-limit" approach would be a good enough
> approximation and the error bounds on over-allocation would be minimal and
> negligible in production.  Thus, the proposal is to implement the soft
> approach with good documentation about how much space can be over-allocated
> in a worst-case scenario.
>
> Enis
>
> On Wed, Nov 2, 2016 at 12:15 PM, Josh Elser <el...@apache.org> wrote:
>
> > Thanks for the reviews so far, Ted and Stack. The comments were great and
> > much appreciated.
> >
> > Interpreting consensus from lack of objection, I'm going to move ahead in
> > earnest starting to work on what was described in the doc. Expect to see
> > some work break-out happening under HBASE-16961 and patches starting to
> > land.
> >
> > I'm also happy to entertain more discussion if anyone hasn't found the
> > time to read/comment yet.
> >
> > Thanks!
> >
> > - Josh
> >
> >
> > Josh Elser wrote:
> >
> >> Sure thing, Ted.
> >>
> >> https://docs.google.com/document/d/1VtLWDkB2tpwc_zgCNPE1ulZO
> >> eecF-YA2FYSK3TSs_bw/edit?usp=sharing
> >>
> >>
> >> Let me open an umbrella issue for now. I can break up the work later.
> >>
> >> https://issues.apache.org/jira/browse/HBASE-16961
> >>
> >> Ted Yu wrote:
> >>
> >>> Josh:
> >>> Can you put the doc in google doc so that people can comment on it ?
> >>>
> >>> Is there a JIRA opened for this work ?
> >>> Please open one if there is none.
> >>>
> >>> Thanks
> >>>
> >>> On Fri, Oct 28, 2016 at 9:00 AM, Josh Elser<el...@apache.org> wrote:
> >>>
> >>> Hi folks,
> >>>>
> >>>> I'd like to propose the introduction of FileSystem quotas to HBase.
> >>>>
> >>>> Here's a design doc[1] available which (hopefully) covers all of the
> >>>> salient points of what I think an initial version of such a feature
> >>>> would
> >>>> include.
> >>>>
> >>>> tl;dr We can define quotas on tables and namespaces. Region size is
> >>>> computed by RegionServers and sent to the Master. The Master inspects
> >>>> the
> >>>> sizes of Regions, rolling up to table and namespace sizes. Defined
> >>>> quotas
> >>>> in the quota table are evaluated given the computed sizes, and, for
> >>>> those
> >>>> tables/namespaces violating the quota, RegionServers are informed to
> >>>> take
> >>>> some action to limit any further filesystem growth by that
> >>>> table/namespace.
> >>>>
> >>>> I'd encourage you to give the document a read -- I tried to cover as
> >>>> much
> >>>> as I could without getting unnecessarily bogged down in implementation
> >>>> details.
> >>>>
> >>>> Feedback is, of course, welcomed. I'd like to start sketching out a
> >>>> breakdown of the work (all writing and no programming makes Josh a sad
> >>>> boy). I'm happy to field any/all questions. Thanks in advance.
> >>>>
> >>>> - Josh
> >>>>
> >>>> [1] http://home.apache.org/~elserj/hbase/FileSystemQuotasforApac
> >>>> heHBase.pdf
> >>>>
> >>>>
> >>>
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: [DISCUSS] FileSystem Quotas in HBase

Posted by Enis Söztutar <en...@apache.org>.
Thanks Josh for the doc and pursuing this.

I was involved with some of the design choices, so consider me a +1 on the
general approach. One topic not covered here is the alternative design we
could have pursued: stricter control of quota usage, so that we could always
guarantee that a namespace / table cannot use more than its allocated disk
space. This hard-limit approach would differ from the proposed "soft-limit"
approach because the soft-limit approach can end up overusing disk space by
a small amount (it takes time to detect that the quota limit has been
reached and to enforce the limit).
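
As a rough illustration of the soft-limit idea, the sketch below rolls
reported region sizes up to table sizes and checks them against a limit
after the fact. It is written in Java under stated assumptions: the class
name, the two input maps, and the byte counts are hypothetical and are not
taken from the design doc or from HBase's code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a periodic soft-limit check; not HBase's implementation.
final class SoftQuotaCheck {
    // Sum the reported size of each region into its owning table.
    static Map<String, Long> rollUp(Map<String, String> regionToTable,
                                    Map<String, Long> regionSizes) {
        Map<String, Long> tableSizes = new HashMap<>();
        for (Map.Entry<String, Long> e : regionSizes.entrySet()) {
            String table = regionToTable.get(e.getKey());
            if (table != null) { // ignore reports for regions we cannot map to a table
                tableSizes.merge(table, e.getValue(), Long::sum);
            }
        }
        return tableSizes;
    }

    // A table is in violation once its rolled-up size exceeds the limit.
    static boolean inViolation(long usedBytes, long limitBytes) {
        return usedBytes > limitBytes;
    }

    public static void main(String[] args) {
        Map<String, String> regionToTable = new HashMap<>();
        regionToTable.put("r1", "t1");
        regionToTable.put("r2", "t1");
        regionToTable.put("r3", "t2");

        Map<String, Long> regionSizes = new HashMap<>();
        regionSizes.put("r1", 40L);
        regionSizes.put("r2", 70L);
        regionSizes.put("r3", 10L);

        Map<String, Long> tableSizes = rollUp(regionToTable, regionSizes);
        long limit = 100L; // assumed quota for every table in this example
        for (Map.Entry<String, Long> e : tableSizes.entrySet()) {
            System.out.println(e.getKey() + " used=" + e.getValue()
                + " violation=" + inViolation(e.getValue(), limit));
        }
    }
}
```

Because the check only runs on sizes that were already reported, a table
can already be past its limit when the violation is first seen; that delay
is the source of the small overuse mentioned above.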

The hard-limit approach may be built with a lease-like mechanism where the
master hands out disk space leases to region servers from the remaining
limit, and the regionservers make sure that they do not allocate more space
than the lease dictates. By ensuring that the space is pre-allocated via
leases, we can always make sure that strict limits are applied. However,
this approach would be harder to build and stabilize because it would need
new mechanisms for distributing and managing these leases. Tuning the
allocations so that regionservers never block flushes or compactions for
lack of a lease in time would also prove challenging to get right.
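
For readers who want to picture the lease idea, a minimal Java sketch
follows. It only models the bookkeeping on the master side: grants come out
of the remaining limit and can never exceed it. The class and method names
are hypothetical, and the hard parts named above (distribution, expiry,
tuning lease sizes so flushes never stall) are deliberately left out.

```java
// Hypothetical sketch of master-side lease bookkeeping; not an HBase API.
final class SpaceLeaseAllocator {
    private long remainingBytes;

    SpaceLeaseAllocator(long quotaBytes) {
        this.remainingBytes = quotaBytes;
    }

    // Grant up to requestedBytes from the remaining limit.
    // Returns the amount actually granted, which may be smaller or zero.
    synchronized long grant(long requestedBytes) {
        long granted = Math.min(Math.max(requestedBytes, 0L), remainingBytes);
        remainingBytes -= granted;
        return granted;
    }

    // A regionserver hands back the part of a lease it did not use.
    synchronized void release(long unusedBytes) {
        remainingBytes += Math.max(unusedBytes, 0L);
    }

    synchronized long remaining() {
        return remainingBytes;
    }

    public static void main(String[] args) {
        SpaceLeaseAllocator allocator = new SpaceLeaseAllocator(100L);
        System.out.println("rs1 granted " + allocator.grant(60L));
        System.out.println("rs2 granted " + allocator.grant(60L)); // only what is left
        System.out.println("rs3 granted " + allocator.grant(10L)); // nothing left
        allocator.release(15L);                                    // rs1 returns unused space
        System.out.println("remaining " + allocator.remaining());
    }
}
```

Since the sum of outstanding grants never exceeds the quota, a regionserver
that stays within its lease cannot push the table over the limit. The cost
is that a regionserver with an exhausted lease has to wait, which is the
flush and compaction blocking risk described above.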

We generally think that the "soft-limit" approach would be a good enough
approximation and the error bounds on over-allocation would be minimal and
negligible in production.  Thus, the proposal is to implement the soft
approach with good documentation about how much space can be over-allocated
in a worst-case scenario.
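
One simple way to reason about that worst case, offered here as an
assumption rather than something taken from the design doc: the overshoot
is bounded by how fast a table can grow multiplied by how long it takes for
a violation to be noticed and acted on. The Java sketch below uses
hypothetical parameter names and numbers for the write rate and for the
reporting, evaluation, and propagation delays.

```java
// Hypothetical back-of-the-envelope model; the parameters are assumptions.
final class OvershootBound {
    // Worst-case extra bytes written between crossing the limit and enforcement:
    // growth rate multiplied by the total detection-plus-enforcement delay.
    static long worstCaseOvershootBytes(long bytesPerSecond,
                                        long reportDelaySecs,
                                        long evaluationDelaySecs,
                                        long propagationDelaySecs) {
        long totalDelaySecs = reportDelaySecs + evaluationDelaySecs + propagationDelaySecs;
        return Math.multiplyExact(bytesPerSecond, totalDelaySecs);
    }

    public static void main(String[] args) {
        long rate = 10L * 1024 * 1024; // assumed 10 MiB/s of growth for the table
        long bound = worstCaseOvershootBytes(rate, 60L, 60L, 30L);
        System.out.println("worst-case overshoot bytes: " + bound);
    }
}
```

With these example numbers the delay adds up to 150 seconds, so the bound is
150 times the growth rate. Shorter reporting or evaluation intervals shrink
the bound at the price of more traffic to the master.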

Enis

On Wed, Nov 2, 2016 at 12:15 PM, Josh Elser <el...@apache.org> wrote:

> Thanks for the reviews so far, Ted and Stack. The comments were great and
> much appreciated.
>
> Interpreting consensus from lack of objection, I'm going to move ahead in
> earnest starting to work on what was described in the doc. Expect to see
> some work break-out happening under HBASE-16961 and patches starting to
> land.
>
> I'm also happy to entertain more discussion if anyone hasn't found the
> time to read/comment yet.
>
> Thanks!
>
> - Josh
>
>
> Josh Elser wrote:
>
>> Sure thing, Ted.
>>
>> https://docs.google.com/document/d/1VtLWDkB2tpwc_zgCNPE1ulZO
>> eecF-YA2FYSK3TSs_bw/edit?usp=sharing
>>
>>
>> Let me open an umbrella issue for now. I can break up the work later.
>>
>> https://issues.apache.org/jira/browse/HBASE-16961
>>
>> Ted Yu wrote:
>>
>>> Josh:
>>> Can you put the doc in google doc so that people can comment on it ?
>>>
>>> Is there a JIRA opened for this work ?
>>> Please open one if there is none.
>>>
>>> Thanks
>>>
>>> On Fri, Oct 28, 2016 at 9:00 AM, Josh Elser<el...@apache.org> wrote:
>>>
>>> Hi folks,
>>>>
>>>> I'd like to propose the introduction of FileSystem quotas to HBase.
>>>>
>>>> Here's a design doc[1] available which (hopefully) covers all of the
>>>> salient points of what I think an initial version of such a feature
>>>> would
>>>> include.
>>>>
>>>> tl;dr We can define quotas on tables and namespaces. Region size is
>>>> computed by RegionServers and sent to the Master. The Master inspects
>>>> the
>>>> sizes of Regions, rolling up to table and namespace sizes. Defined
>>>> quotas
>>>> in the quota table are evaluated given the computed sizes, and, for
>>>> those
>>>> tables/namespaces violating the quota, RegionServers are informed to
>>>> take
>>>> some action to limit any further filesystem growth by that
>>>> table/namespace.
>>>>
>>>> I'd encourage you to give the document a read -- I tried to cover as
>>>> much
>>>> as I could without getting unnecessarily bogged down in implementation
>>>> details.
>>>>
>>>> Feedback is, of course, welcomed. I'd like to start sketching out a
>>>> breakdown of the work (all writing and no programming makes Josh a sad
>>>> boy). I'm happy to field any/all questions. Thanks in advance.
>>>>
>>>> - Josh
>>>>
>>>> [1] http://home.apache.org/~elserj/hbase/FileSystemQuotasforApac
>>>> heHBase.pdf
>>>>
>>>>
>>>

Re: [DISCUSS] FileSystem Quotas in HBase

Posted by Josh Elser <el...@apache.org>.
Thanks for the reviews so far, Ted and Stack. The comments were great 
and much appreciated.

Interpreting consensus from lack of objection, I'm going to move ahead 
in earnest starting to work on what was described in the doc. Expect to 
see some work break-out happening under HBASE-16961 and patches starting 
to land.

I'm also happy to entertain more discussion if anyone hasn't found the 
time to read/comment yet.

Thanks!

- Josh

Josh Elser wrote:
> Sure thing, Ted.
>
> https://docs.google.com/document/d/1VtLWDkB2tpwc_zgCNPE1ulZOeecF-YA2FYSK3TSs_bw/edit?usp=sharing
>
>
> Let me open an umbrella issue for now. I can break up the work later.
>
> https://issues.apache.org/jira/browse/HBASE-16961
>
> Ted Yu wrote:
>> Josh:
>> Can you put the doc in google doc so that people can comment on it ?
>>
>> Is there a JIRA opened for this work ?
>> Please open one if there is none.
>>
>> Thanks
>>
>> On Fri, Oct 28, 2016 at 9:00 AM, Josh Elser<el...@apache.org> wrote:
>>
>>> Hi folks,
>>>
>>> I'd like to propose the introduction of FileSystem quotas to HBase.
>>>
>>> Here's a design doc[1] available which (hopefully) covers all of the
>>> salient points of what I think an initial version of such a feature
>>> would
>>> include.
>>>
>>> tl;dr We can define quotas on tables and namespaces. Region size is
>>> computed by RegionServers and sent to the Master. The Master inspects
>>> the
>>> sizes of Regions, rolling up to table and namespace sizes. Defined
>>> quotas
>>> in the quota table are evaluated given the computed sizes, and, for
>>> those
>>> tables/namespaces violating the quota, RegionServers are informed to
>>> take
>>> some action to limit any further filesystem growth by that
>>> table/namespace.
>>>
>>> I'd encourage you to give the document a read -- I tried to cover as
>>> much
>>> as I could without getting unnecessarily bogged down in implementation
>>> details.
>>>
>>> Feedback is, of course, welcomed. I'd like to start sketching out a
>>> breakdown of the work (all writing and no programming makes Josh a sad
>>> boy). I'm happy to field any/all questions. Thanks in advance.
>>>
>>> - Josh
>>>
>>> [1] http://home.apache.org/~elserj/hbase/FileSystemQuotasforApac
>>> heHBase.pdf
>>>
>>

Re: [DISCUSS] FileSystem Quotas in HBase

Posted by Josh Elser <el...@apache.org>.
Sure thing, Ted.

https://docs.google.com/document/d/1VtLWDkB2tpwc_zgCNPE1ulZOeecF-YA2FYSK3TSs_bw/edit?usp=sharing

Let me open an umbrella issue for now. I can break up the work later.

https://issues.apache.org/jira/browse/HBASE-16961

Ted Yu wrote:
> Josh:
> Can you put the doc in google doc so that people can comment on it ?
>
> Is there a JIRA opened for this work ?
> Please open one if there is none.
>
> Thanks
>
> On Fri, Oct 28, 2016 at 9:00 AM, Josh Elser<el...@apache.org>  wrote:
>
>> Hi folks,
>>
>> I'd like to propose the introduction of FileSystem quotas to HBase.
>>
>> Here's a design doc[1] available which (hopefully) covers all of the
>> salient points of what I think an initial version of such a feature would
>> include.
>>
>> tl;dr We can define quotas on tables and namespaces. Region size is
>> computed by RegionServers and sent to the Master. The Master inspects the
>> sizes of Regions, rolling up to table and namespace sizes. Defined quotas
>> in the quota table are evaluated given the computed sizes, and, for those
>> tables/namespaces violating the quota, RegionServers are informed to take
>> some action to limit any further filesystem growth by that table/namespace.
>>
>> I'd encourage you to give the document a read -- I tried to cover as much
>> as I could without getting unnecessarily bogged down in implementation
>> details.
>>
>> Feedback is, of course, welcomed. I'd like to start sketching out a
>> breakdown of the work (all writing and no programming makes Josh a sad
>> boy). I'm happy to field any/all questions. Thanks in advance.
>>
>> - Josh
>>
>> [1] http://home.apache.org/~elserj/hbase/FileSystemQuotasforApac
>> heHBase.pdf
>>
>

Re: [DISCUSS] FileSystem Quotas in HBase

Posted by Ted Yu <yu...@gmail.com>.
Josh:
Can you put the doc in a Google doc so that people can comment on it?

Is there a JIRA open for this work?
Please open one if there is none.

Thanks

On Fri, Oct 28, 2016 at 9:00 AM, Josh Elser <el...@apache.org> wrote:

> Hi folks,
>
> I'd like to propose the introduction of FileSystem quotas to HBase.
>
> Here's a design doc[1] available which (hopefully) covers all of the
> salient points of what I think an initial version of such a feature would
> include.
>
> tl;dr We can define quotas on tables and namespaces. Region size is
> computed by RegionServers and sent to the Master. The Master inspects the
> sizes of Regions, rolling up to table and namespace sizes. Defined quotas
> in the quota table are evaluated given the computed sizes, and, for those
> tables/namespaces violating the quota, RegionServers are informed to take
> some action to limit any further filesystem growth by that table/namespace.
>
> I'd encourage you to give the document a read -- I tried to cover as much
> as I could without getting unnecessarily bogged down in implementation
> details.
>
> Feedback is, of course, welcomed. I'd like to start sketching out a
> breakdown of the work (all writing and no programming makes Josh a sad
> boy). I'm happy to field any/all questions. Thanks in advance.
>
> - Josh
>
> [1] http://home.apache.org/~elserj/hbase/FileSystemQuotasforApac
> heHBase.pdf
>