Posted to dev@hbase.apache.org by Jonathan Hsieh <jo...@cloudera.com> on 2013/12/04 03:49:30 UTC

Re: [Shadow Regions / Read Replicas ] External replication disqualified?

The read replicas doc mentions something a little more intrusive in the "3
options" section but doesn't seem to disqualify it.

Relatedly, just as another strawman: for the "mostly read only" and "bulk
load only" use cases, why not use normal replication against two clusters
in the same HDFS / datacenter and add a "bulk load replication" feature?
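
Concretely, that's just today's cross-cluster replication pointed at a
sibling cluster in the same datacenter. A rough sketch against the current
admin API (the peer id, quorum address, table, and family names below are
all placeholders, and the bulk load piece is the part we'd have to build):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;
  import org.apache.hadoop.hbase.client.replication.ReplicationAdmin;

  public class TwoClusterStrawman {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();

      // Register the sibling cluster (same datacenter / same HDFS) as a
      // replication peer.
      ReplicationAdmin repl = new ReplicationAdmin(conf);
      repl.addPeer("1", "secondary-zk:2181:/hbase");
      repl.close();

      // Mark the family for replication (scope 1 = replicate edits).
      HBaseAdmin admin = new HBaseAdmin(conf);
      HColumnDescriptor fam = new HColumnDescriptor("f");
      fam.setScope(1);
      admin.disableTable("usertable");
      admin.modifyColumn("usertable", fam);
      admin.enableTable("usertable");
      admin.close();

      // Missing piece: only WAL edits ship today, so replicating bulk
      // loaded HFiles ("bulk load replication") would be the new feature.
    }
  }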

We'd get latency in the seconds (closer to my expected definition of
eventual consistency)

Jon

On Tue, Dec 3, 2013 at 6:47 PM, Jonathan Hsieh <jo...@cloudera.com> wrote:

>
>
> On Tue, Dec 3, 2013 at 2:04 PM, Enis Söztutar <en...@gmail.com> wrote:
>
>> > On Tue, Dec 3, 2013 at 11:51 AM, Jonathan Hsieh <jo...@cloudera.com>
>> > wrote:
>> >
>> > On Tue, Dec 3, 2013 at 11:07 AM, Enis Söztutar <en...@apache.org> wrote:
>> >
>> > > Thanks Jon for bringing this to dev@.
>> > >
>> > >
>> > > On Mon, Dec 2, 2013 at 10:01 PM, Jonathan Hsieh <jo...@cloudera.com>
>> > wrote:
>> > >
>> > > > Fundamentally, I'd prefer focusing on making HBase "HBasier" instead
>> > > > of tackling a feature that other systems architecturally can do
>> > > > better (inconsistent reads). I consider consistent reads/writes to
>> > > > be one of HBase's defining features. That said, I think read
>> > > > replicas make sense and are a nice feature to have.
>> > > >
>> > >
>> > > Our design proposal has a specific use case goal, and hopefully we can
>> > > demonstrate the benefits of having this in HBase so that even more
>> > > pieces can be built on top of it. Plus I imagine this will be a widely
>> > > used feature for read-only tables or bulk loaded tables. We are not
>> > > proposing to rework strong consistency semantics or make major
>> > > architectural changes. I think having tables defined with a
>> > > replication count, together with the proposed client API changes (the
>> > > Consistency definition), plugs into the HBase model rather well.
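>> > > As a sketch, the read path would look something like this (exact
>> > > class and method names TBD):
>> > >
>> > >   Get get = new Get(row);
>> > >   get.setConsistency(Consistency.TIMELINE); // default is STRONG
>> > >   Result result = table.get(get);
>> > >   if (result.isStale()) {
>> > >     // served by a secondary replica; may lag the primary
>> > >   }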
>> > >
>> > >
>> > I do think that without any recency/updating mechanism, we are limiting
>> > the usefulness of this feature to essentially *only* read-only or bulk
>> > load only tables. Recency, if there were any edits/updates, would be
>> > severely lagging (by default potentially an hour), especially in cases
>> > where there are only a few edits to a primarily bulk loaded table. This
>> > limitation is not mentioned in the tradeoffs or requirements (or a
>> > non-requirements section); it definitely should be listed there.
>> >
>>
>> Obviously the amount of lag you would observe depends on whether you are
>> using "Region snapshots", "WAL-Tailing" or "Async wal replication". I
>> think there are still use cases where you can live with >1 hour old stale
>> reads, so "Region snapshots" is not *just* for read-only tables. I'll add
>> these to the tradeoffs section.
>>
>
> Thanks for adding it there -- I really think it is a big headline caveat
> on my expectation of "eventual consistency".  Other systems out there give
> you eventual consistency on the millisecond level for most cases, while in
> this initial implementation "eventual" would mean tens of minutes, or at
> best handfuls of minutes, behind (with the snapshot flush mechanism)!
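> (Back of the envelope: with that default periodic flush interval of an
> hour, a rarely-written table's secondary would serve data that is on
> average ~30 minutes stale, and up to ~1 hour stale in the worst case.)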
>
> There are a handful of other things in the phase one part of the
> implementation section that limit the usefulness of the feature to a
> certain kind of constrained hbase user.  I'll start another thread for
> those.
>
>
>>
>> We are proposing to implement "Region snapshots" first and "Async wal
>> replication" second. As argued, I think wal-tailing only makes sense with
>> WALpr, so that work is left until after we have WAL per region.
>>
>>
> This is our main disagreement -- I'm not convinced that wal tailing only
> makes sense for the wal-per-region hlog implementation.  Instead of
> bouncing around hypotheticals, it sounds like I'll be doing more
> experiments to prove it to myself and to convince you. :)
>
>
>>
>> >
>> > With the current design, it might be best to have a flag on the table
>> > which marks it read-only or bulk-load only, so that it only gets used
>> > when the table is in that mode? (And maybe an "escape hatch" for power
>> > users.)
>> >
>>
>> I think we have a read-only flag already. We might not have a bulk-load
>> only flag though. It makes sense to add one if we want to allow bulk
>> loads but prevent normal writes.
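>> (The existing flag is HTableDescriptor's read-only bit, i.e.
>> setReadOnly(true), if I remember correctly; the bulk-load only flag
>> would be new.)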
>>
> Great.
>
>>
>> >
>> > [snip]
>> > >
>> > > > - I think the two goals are both worthy on their own, each with its
>> > > > own optimal points. We should make sure the design can support both
>> > > > goals.
>> > > >
>> > >
>> > > I think our proposal is consistent with your doc, and we have
>> > > considered secondary region promotion in the future section. It would
>> > > be good if you can review and comment on whether you see any points
>> > > missing.
>> > >
>> > >
>> > I definitely will. At the moment, I think the hybrid for the wals/hlogs
>> > I suggested in the other thread seems to be an optimal solution
>> > considering locality. Though feasible, it is obviously more complex
>> > than just one approach alone.
>> >
>> >
>> > > > - I want to make sure the proposed design has a path for optimal
>> > > > fast-consistent read-recovery.
>> > > >
>> > >
>> > > We think that it does, but it is a secondary goal for the initial
>> > > work. I don't see any reason why secondary promotion cannot be built
>> > > on top of this, once the branch is in a better state.
>> > >
>> >
>> > Based on the detail in the design doc and this statement, it sounds
>> > like you have a prototype branch already? Is this the case?
>> >
>>
>> Indeed. I think that is mentioned in the jira description. We have some
>> parts of the changes for region, region server, HRI, and master. Client
>> changes are on the way. I think we can post that in a github branch for
>> now to share the code early and solicit early reviews.
>>
> I think that would be great. Back when we did snapshots, we had active
> development against a prototype and spent a bit of time breaking it down
> into manageable, more polished pieces that had slightly lenient reviews.
> This exercise really helped us with our interfaces. We committed code to
> the dev branch, which limited merge pain and diffs for modifications made
> by different contributors. In the end, when we had something we were happy
> with on the dev branch, we merged with trunk and fixed bugs/diffs that had
> cropped up in the meantime. I'd suggest a similar process for this.
>
>
> --
> // Jonathan Hsieh (shay)
> // Software Engineer, Cloudera
> // jon@cloudera.com
>
>



-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com

Re: [Shadow Regions / Read Replicas ] External replication disqualified?

Posted by Jonathan Hsieh <jo...@cloudera.com>.
A little delayed but more questions.


On Tue, Dec 3, 2013 at 10:41 PM, Devaraj Das <dd...@hortonworks.com> wrote:

> On Tue, Dec 3, 2013 at 6:49 PM, Jonathan Hsieh <jo...@cloudera.com> wrote:
>
> > The read replicas doc mentions something a little more intrusive in the
> > "3 options" section but doesn't seem to disqualify it.
> >
> >
> I don't quite see what you are referring to, actually... Can you please
> copy-paste a relevant line from the design doc?
>
>
Here.

>>>> [copy paste]
Async WAL replication
This design will build on the region snapshots and current replication /
log replay features. This does not necessitate a wal-per-region approach
as with wal tailing. In this design, the primary will have a replication
source for tailing its own log and a replication sink for each secondary
replica that is alive. The edits will be replicated to the secondaries
from the async replication thread. In that regard, this design resembles
in-cluster replication, while also sharing the same data files for the
region instead of duplicating the data. Similar to wal tailing,
secondaries will have associated memstores, and will be replaying flushes
and compactions and bulk loads.
<<<<<

This option is described, but no negative tradeoffs are mentioned.
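
For discussion, here is my rough mental model of that flow as a sketch.
These are hypothetical shapes for illustration, not actual HBase classes:

  import java.util.List;
  import java.util.concurrent.BlockingQueue;
  import java.util.concurrent.LinkedBlockingQueue;

  // Hypothetical shapes for discussion only.
  class Edit {
    final long seqId; final byte[] row; final byte[] value;
    Edit(long seqId, byte[] row, byte[] value) {
      this.seqId = seqId; this.row = row; this.value = value;
    }
  }

  interface SecondaryReplicaSink {
    void replay(List<Edit> edits); // apply to the secondary's memstore
  }

  class AsyncWalReplicationSource implements Runnable {
    private final BlockingQueue<List<Edit>> tailed =
        new LinkedBlockingQueue<List<Edit>>();
    private final List<SecondaryReplicaSink> sinks;

    AsyncWalReplicationSource(List<SecondaryReplicaSink> sinks) {
      this.sinks = sinks;
    }

    // Fed by the primary tailing its *own* log (no wal-per-region needed).
    void onWalAppend(List<Edit> batch) { tailed.add(batch); }

    public void run() {
      try {
        while (!Thread.currentThread().isInterrupted()) {
          List<Edit> batch = tailed.take();
          for (SecondaryReplicaSink sink : sinks) {
            // Store files are shared; only memstore-resident edits ship, so
            // staleness is bounded by queue depth, not the flush interval.
            sink.replay(batch);
          }
        }
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
      }
    }
  }

If that model is right, the negative tradeoffs I'd expect to see listed are
the extra memstore memory on the secondaries and the added in-cluster
replication traffic.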



> > Relatedly, just as another strawman: for the "mostly read only" and
> > "bulk load only" use cases, why not use normal replication against two
> > clusters in the same HDFS / datacenter and add a "bulk load replication"
> > feature?
> >
> >
> We considered this, and the issue is that the resource usage on HDFS
> would be doubled (for the store files) in the two-replica case.
>
While the async wal replication writeup does describe this tradeoff, the
two-cluster approach does not seem like an absolutely terrible idea. It can
be done by folks today and has nice perf isolation properties.
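
To put rough numbers on the storage cost: with the default HDFS replication
factor of 3, a 10 TB table already occupies 30 TB raw; a full second
cluster copy takes that to 60 TB, while read replicas sharing the same
store files stay at 30 TB plus the secondaries' memstore overhead. So the
doubling is real, but it buys strong perf isolation.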


>
> > [snip]



-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com

Re: [Shadow Regions / Read Replicas ] External replication disqualified?

Posted by Devaraj Das <dd...@hortonworks.com>.
On Tue, Dec 3, 2013 at 6:49 PM, Jonathan Hsieh <jo...@cloudera.com> wrote:

> The read replicas doc mentions something a little more intrusive in the
> "3 options" section but doesn't seem to disqualify it.
>
>
I don't quite see what you are referring to, actually... Can you please
copy-paste a relevant line from the design doc?


> Relatedly, just as another strawman: for the "mostly read only" and "bulk
> load only" use cases, why not use normal replication against two clusters
> in the same HDFS / datacenter and add a "bulk load replication" feature?
>
>
We considered this, and the issue is that the resource usage on HDFS would
be doubled (for the store files) in the two-replica case.


> We'd get latency in the seconds (closer to my expected definition of
> eventual consistency)
>
> Jon
>
> On Tue, Dec 3, 2013 at 6:47 PM, Jonathan Hsieh <jo...@cloudera.com> wrote:
> > [snip]
