Posted to dev@hbase.apache.org by Mike Drob <ma...@cloudera.com> on 2018/06/01 16:36:41 UTC

HBase Short-Circuit Read Questions

Hi folks, I was going through our docs looking at SCR setup and had some
confusion. Asking here before filing JIRA issues. After writing this, I'm
realizing the length got a bit out of hand. I don't want to split this into
several threads because I think the information is all related, but may
have to do that if a single one becomes difficult to follow.


The main docs link: http://hbase.apache.org/book.html#shortcircuit.reads

1)

Docs claim: dfs.client.read.shortcircuit.skip.checksum = true so we don’t
double checksum (HBase does its own checksumming to save on i/os. See
hbase.regionserver.checksum.verify for more on this.)

Code claims:

https://github.com/apache/hbase/blob/master/hbase-common/src/main/java/org/apache/hadoop/hbase/util/CommonFSUtils.java#L784-L788

That if this property is set, then we log a warning?
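
For clarity, the check I'm looking at is shaped roughly like this (my
paraphrase, not the literal source; the warning text here is invented):

    import org.apache.hadoop.conf.Configuration;

    public class SkipChecksumCheckSketch {
      // Paraphrase of the CommonFSUtils check linked above: if the operator
      // set the dfs key themselves, log a warning. Message text is made up.
      static void checkShortCircuitReadChecksum(Configuration conf) {
        String key = "dfs.client.read.shortcircuit.skip.checksum";
        if (conf.getBoolean(key, false)) {
          System.err.println("WARN: " + key + " is set to true; HBase "
              + "manages this setting itself when hbase checksums are on");
        }
      }
    }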

Unrelated, this is duplicated between CommonFSUtils and FSUtils, will need
a jira to clean that up later.

Also, there's a comment in
https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L689-L690
that claims we automatically disable it, which we do in the HFileSystem
constructor by setting the same dfs property in our conf to true.

So I'm confused if we should be setting the property like the docs claim,
not setting it like FSUtils warns, or ignoring it and letting RS auto-set
it.

Also unrelated, there is a check in HFileSystem from HBASE-5885 for what I
think is HADOOP-9307, but we should be able to simplify some of that logic
now.

2)

Docs claim: dfs.client.read.shortcircuit.buffer.size = 131072 Important to
avoid OOME — hbase has a default it uses if unset, see
hbase.dfs.client.read.shortcircuit.buffer.size; its default is 131072.

This is very confusing, we should set the property to some value, because
if it's unset then we will use... the same value? This reads like needless
operator burden.

Looking at the code, the default we really use is 126976, not the 131072
(64 * 1024 * 2) the docs cite; that's close, but off by enough to give me
pause.
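
The fallback behavior reads to me roughly like this (a paraphrase of my
understanding, not the actual code):

    import org.apache.hadoop.conf.Configuration;

    public class ScrBufferDefaultSketch {
      // My reading of the fallback: if the operator left the dfs key unset,
      // HBase fills it in from its own key, defaulting to the 126976 above.
      static void applyHBaseDefault(Configuration conf) {
        String dfsKey = "dfs.client.read.shortcircuit.buffer.size";
        if (conf.get(dfsKey) == null) {
          int hbaseDefault = conf.getInt("hbase." + dfsKey, 126976);
          conf.setInt(dfsKey, hbaseDefault);
        }
      }
    }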

The default HDFS value is 1024 * 1024, which suggests that they're
expecting a value in the MB range and we're giving one in the KB range?
See:
https://github.com/apache/hadoop/blob/master/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/HdfsClientConfigKeys.java#L146-L147

Just now, I'm realizing that the initial comment in the docs might mean to
tune it way down to avoid OOME; my initial reading was that we need to
increase the ceiling from whatever default setting comes in via HDFS. Would
be good to clarify this, and also figure out what units the value is in.

3)

Docs suggest: Ensure data locality. In hbase-site.xml, set
hbase.hstore.min.locality.to.skip.major.compact = 0.7 (Meaning that 0.7 <=
n <= 1)

I can't find anything else about this property in the docs. Digging through
the code, I find an oblique reference to HBASE-11195, but there's no release
note there or docs from there, and reading the issue doesn't help me
understand how this operates either. It looks like there was follow-on work
done, but it would be useful to know how we arrived at 0.7 (seems arbitrary)
and how an operator could figure out if that setting is good for them or
needs to slide higher/lower.


Thanks,
Mike

Re: HBase Short-Circuit Read Questions

Posted by Mike Drob <ma...@cloudera.com>.
Filed https://issues.apache.org/jira/browse/HBASE-20674 to track some of
the changes I can make based on your answers, Stack.

On Fri, Jun 1, 2018 at 3:22 PM, Stack <st...@duboce.net> wrote:

> On Fri, Jun 1, 2018 at 11:50 AM, Mike Drob <ma...@cloudera.com> wrote:
>
> > On Fri, Jun 1, 2018 at 12:01 PM, Stack <st...@duboce.net> wrote:
> >
> > > On Fri, Jun 1, 2018 at 9:36 AM, Mike Drob <ma...@cloudera.com> wrote:
> >
>
>
> >  I'm working on untangling this mess, but I just got lost in the weeds of
> > the argument on HBASE-6868.
> >
> > I have to assume that this concern over double checksumming, or missing
> > checksums on remote files, or whatever else is going on in that issue
> > only applies to truly ancient versions of Hadoop at this point?
>
>
> I don't think so. Skimming that issue, hbase versions are discussed, not
> Hadoop versions. What you seem to be trying to sort out is hbase
> configs/doc around what we ask of HDFS (and SCR) as regards checksumming
> and when.
>
> HBASE-6868 was about our checksumming differing depending on whether WAL
> or HFile; we were inconsistent.
>
> It is always possible to double-checksum. Default shouldn't be doing this
> though (at least such was the case last time I looked).
>
>
>
> > Do we think it's
> > safe to say that if SCR are enabled, we always want to enable HBase
> > checksums and skip HDFS checksums? That's what the docs appear to
> > recommend, but the code approaches it from the converse perspective:
> >
>
>
> Probably best to set up a rig and verify. You'll then have confidence
> making doc and code changes.
>
> I have not looked at this stuff in years other than a recent attempt at
> underlining the importance of enabling SCR; I tried to codify my
> understanding from back then in the doc (but only seem to have caused
> confusion).
>
> Thanks Michael,
> S
>
>
> > If HBase checksumming is enabled, we set dfs.c.r.sc.skip.checksum to true
> > and fs.setVerifyChecksum(false) in HFileSystem. User doesn't even have the
> > option to override that. HBase checksumming is on by default, so we don't
> > need to mention any of this in the docs, or we can mention turning on
> > hbase xsum and turning off dfs xsum and then clarify that none of this is
> > actionable.
> >
> >
>
>
>
>
>
> >
> > > > 2)
> > > >
> > > > Docs claim: dfs.client.read.shortcircuit.buffer.size = 131072
> > > > Important to avoid OOME — hbase has a default it uses if unset, see
> > > > hbase.dfs.client.read.shortcircuit.buffer.size; its default is 131072.
> > > >
> > > > This is very confusing, we should set the property to some value,
> > > > because if it's unset then we will use... the same value? This reads
> > > > like needless operator burden.
> > > >
> > > > Looking at the code, the default we really use is 126976, not the
> > > > 131072 (64 * 1024 * 2) the docs cite; that's close, but off by enough
> > > > to give me pause.
> > > >
> > > > The default HDFS value is 1024 * 1024, which suggests that they're
> > > > expecting a value in the MB range and we're giving one in the KB
> > > > range? See:
> > > > https://github.com/apache/hadoop/blob/master/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/HdfsClientConfigKeys.java#L146-L147
> > > >
> > > > Just now, I'm realizing that the initial comment in the docs might
> > > > mean to tune it way down to avoid OOME; my initial reading was that we
> > > > need to increase the ceiling from whatever default setting comes in
> > > > via HDFS. Would be good to clarify this, and also figure out what
> > > > units the value is in.
> > > >
> > > >
> > > Agree.
> > >
> > > IIRC, intent was to set it way-down from usual default because hbase
> > > runs w/ many more open files than your typical HDFS client does.
> > >
> > Ok, we can update the docs to clarify that this is a value in bytes, that
> > the default HDFS value is 1MB, that our default value is roughly 128KB,
> > and that the total memory used will be the buffer size * the number of
> > file handles. What's a reasonable first-order approximation for the
> > number of files per RS that will be affected by SCR? Hosted Regions *
> > Columns? Doesn't need a code change, I think, but the recommendation of
> > 131072 should be removed.
> >
> > >
> > >
> > >
> > > > 3)
> > > >
> > > > Docs suggest: Ensure data locality. In hbase-site.xml, set
> > > > hbase.hstore.min.locality.to.skip.major.compact = 0.7 (Meaning that
> > > > 0.7 <= n <= 1)
> > > >
> > > > I can't find anything else about this property in the docs. Digging
> > > > through the code, I find an oblique reference to HBASE-11195, but
> > > > there's no release note there or docs from there, and reading the
> > > > issue doesn't help me understand how this operates either. It looks
> > > > like there was follow-on work done, but it would be useful to know
> > > > how we arrived at 0.7 (seems arbitrary) and how an operator could
> > > > figure out if that setting is good for them or needs to slide
> > > > higher/lower.
> > > >
> > > >
> > > I don't know anything of the above.
> > >
> > > Will save this for later then.
> >
> >
> > > Thanks,
> > > S
> > >
> > >
> > >
> > > >
> > > > Thanks,
> > > > Mike
> > > >
> > >
> >
>

Re: HBase Short-Circuit Read Questions

Posted by Stack <st...@duboce.net>.
On Fri, Jun 1, 2018 at 11:50 AM, Mike Drob <ma...@cloudera.com> wrote:

> On Fri, Jun 1, 2018 at 12:01 PM, Stack <st...@duboce.net> wrote:
>
> > On Fri, Jun 1, 2018 at 9:36 AM, Mike Drob <ma...@cloudera.com> wrote:
>


>  I'm working on untangling this mess, but I just got lost in the weeds of
> the argument on HBASE-6868.
>
> I have to assume that this concern over double checksumming, or missing
> checksums on remote files, or whatever else is going on in that issue only
> applies to truly ancient versions of Hadoop at this point?


I don't think so. Skimming that issue, hbase versions are discussed, not
Hadoop versions. What you seem to be trying to sort out is hbase
configs/doc around what we ask of HDFS (and SCR) as regards checksumming
and when.

HBASE-6868 was about our checksumming differing depending on whether WAL or
HFile; we were inconsistent.

It is always possible to double-checksum. Default shouldn't be doing this
though (at least such was the case last time I looked).



> Do we think it's
> safe to say that if SCR are enabled, we always want to enable HBase
> checksums and skip HDFS checksums? That's what the docs appear to
> recommend, but the code approaches it from the converse perspective:
>


Probably best to set up a rig and verify. You'll then have confidence
making doc and code changes.

I have not looked at this stuff in years other than a recent attempt at
underlining the importance of enabling SCR; I tried to codify my
understanding from back then in the doc (but only seem to have caused
confusion).

Thanks Michael,
S


> If HBase checksumming is enabled, we set dfs.c.r.sc.skip.checksum to true
> and fs.setVerifyChecksum(false) in HFileSystem. User doesn't even have the
> option to override that. HBase checksumming is on by default, so we don't
> need to mention any of this in the docs, or we can mention turning on hbase
> xsum and turning off dfs xsum and then clarify that none of this is
> actionable.
>
>





>
> > > 2)
> > >
> > > Docs claim: dfs.client.read.shortcircuit.buffer.size = 131072
> > > Important to avoid OOME — hbase has a default it uses if unset, see
> > > hbase.dfs.client.read.shortcircuit.buffer.size; its default is 131072.
> > >
> > > This is very confusing, we should set the property to some value,
> > > because if it's unset then we will use... the same value? This reads
> > > like needless operator burden.
> > >
> > > Looking at the code, the default we really use is 126976, not the
> > > 131072 (64 * 1024 * 2) the docs cite; that's close, but off by enough
> > > to give me pause.
> > >
> > > The default HDFS value is 1024 * 1024, which suggests that they're
> > > expecting a value in the MB range and we're giving one in the KB range?
> > > See:
> > > https://github.com/apache/hadoop/blob/master/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/HdfsClientConfigKeys.java#L146-L147
> > >
> > > Just now, I'm realizing that the initial comment in the docs might mean
> > > to tune it way down to avoid OOME; my initial reading was that we need
> > > to increase the ceiling from whatever default setting comes in via
> > > HDFS. Would be good to clarify this, and also figure out what units the
> > > value is in.
> > >
> > >
> > Agree.
> >
> > IIRC, intent was to set it way-down from usual default because hbase runs
> > w/ many more open files than your typical HDFS client does.
> >
> Ok, we can update the docs to clarify that this is a value in bytes, that
> the default HDFS value is 1MB, that our default value is roughly 128KB,
> and that the total memory used will be the buffer size * the number of
> file handles. What's a reasonable first-order approximation for the number
> of files per RS that will be affected by SCR? Hosted Regions * Columns?
> Doesn't need a code change, I think, but the recommendation of 131072
> should be removed.
>
> >
> >
> >
> > > 3)
> > >
> > > Docs suggest: Ensure data locality. In hbase-site.xml, set
> > > hbase.hstore.min.locality.to.skip.major.compact = 0.7 (Meaning that
> > > 0.7 <= n <= 1)
> > >
> > > I can't find anything else about this property in the docs. Digging
> > > through the code, I find an oblique reference to HBASE-11195, but
> > > there's no release note there or docs from there, and reading the issue
> > > doesn't help me understand how this operates either. It looks like
> > > there was follow-on work done, but it would be useful to know how we
> > > arrived at 0.7 (seems arbitrary) and how an operator could figure out
> > > if that setting is good for them or needs to slide higher/lower.
> > >
> > >
> > I don't know anything of the above.
> >
> > Will save this for later then.
>
>
> > Thanks,
> > S
> >
> >
> >
> > >
> > > Thanks,
> > > Mike
> > >
> >
>

Re: HBase Short-Circuit Read Questions

Posted by Mike Drob <ma...@cloudera.com>.
On Fri, Jun 1, 2018 at 12:01 PM, Stack <st...@duboce.net> wrote:

> On Fri, Jun 1, 2018 at 9:36 AM, Mike Drob <ma...@cloudera.com> wrote:
>
> > Hi folks, I was going through our docs looking at SCR setup and had some
> > confusion. Asking here before filing JIRA issues. After writing this, I'm
> > realizing the length got a bit out of hand. I don't want to split this
> > into several threads because I think the information is all related, but
> > may have to do that if a single one becomes difficult to follow.
> >
> >
> Thanks for taking a look. See inline below.
>
>
>
> >
> > The main docs link: http://hbase.apache.org/book.html#shortcircuit.reads
> >
> > 1)
> >
> > Docs claim: dfs.client.read.shortcircuit.skip.checksum = true so we don’t
> > double checksum (HBase does its own checksumming to save on i/os. See
> > hbase.regionserver.checksum.verify for more on this.)
> >
> > Code claims:
> >
> > https://github.com/apache/hbase/blob/master/hbase-common/src/main/java/org/apache/hadoop/hbase/util/CommonFSUtils.java#L784-L788
> >
> > That if this property is set, then we log a warning?
> >
> > Unrelated, this is duplicated between CommonFSUtils and FSUtils, will
> > need a jira to clean that up later.
> >
> > Also, there's a comment in
> > https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L689-L690
> > that claims we automatically disable it, which we do in the HFileSystem
> > constructor by setting the same dfs property in our conf to true.
> >
> > So I'm confused if we should be setting the property like the docs claim,
> > not setting it like FSUtils warns, or ignoring it and letting RS auto-set
> > it.
> >
> >
>
> This is a classic font of confusion with layers of attempts over time at
> simplification, auto-config'ing, code migration, and poor doc (I believe
> I'm the author of a good bit of this mess here). Would be cool if it got a
> revamp informed by tinkering with configs and an edit by a better writer
> than I.
>
>
I'm working on untangling this mess, but I just got lost in the weeds of
the argument on HBASE-6868.

I have to assume that this concern over double checksumming, or missing
checksums on remote files, or whatever else is going on in that issue only
applies to truly ancient versions of Hadoop at this point? Do we think it's
safe to say that if SCR are enabled, we always want to enable HBase
checksums and skip HDFS checksums? That's what the docs appear to
recommend, but the code approaches it from the converse perspective:

If HBase checksumming is enabled, we set dfs.c.r.sc.skip.checksum to true
and fs.setVerifyChecksum(false) in HFileSystem. User doesn't even have the
option to override that. HBase checksumming is on by default, so we don't
need to mention any of this in the docs, or we can mention turning on hbase
xsum and turning off dfs xsum and then clarify that none of this is
actionable.
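
In code terms, what I mean is roughly the following; a simplified sketch of
the HFileSystem behavior described above, not the literal constructor:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class HFileSystemChecksumSketch {
      // Sketch: with hbase-level checksums on (the default), HBase forces
      // the dfs setting itself, so an operator-supplied value never matters.
      static void applyChecksumPolicy(Configuration conf, FileSystem fs) {
        boolean hbaseChecksum =
            conf.getBoolean("hbase.regionserver.checksum.verify", true);
        if (hbaseChecksum) {
          conf.setBoolean("dfs.client.read.shortcircuit.skip.checksum", true);
          fs.setVerifyChecksum(false); // also skip HDFS-level verification
        }
      }
    }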


> > 2)
> >
> > Docs claim: dfs.client.read.shortcircuit.buffer.size = 131072 Important
> > to avoid OOME — hbase has a default it uses if unset, see
> > hbase.dfs.client.read.shortcircuit.buffer.size; its default is 131072.
> >
> > This is very confusing, we should set the property to some value, because
> > if it's unset then we will use... the same value? This reads like
> > needless operator burden.
> >
> > Looking at the code, the default we really use is 126976, not the 131072
> > (64 * 1024 * 2) the docs cite; that's close, but off by enough to give me
> > pause.
> >
> > The default HDFS value is 1024 * 1024, which suggests that they're
> > expecting a value in the MB range and we're giving one in the KB range?
> > See:
> > https://github.com/apache/hadoop/blob/master/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/HdfsClientConfigKeys.java#L146-L147
> >
> > Just now, I'm realizing that the initial comment in the docs might mean
> > to tune it way down to avoid OOME; my initial reading was that we need to
> > increase the ceiling from whatever default setting comes in via HDFS.
> > Would be good to clarify this, and also figure out what units the value
> > is in.
> >
> >
> Agree.
>
> IIRC, intent was to set it way-down from usual default because hbase runs
> w/ many more open files than your typical HDFS client does.
>
Ok, we can update the docs to clarify that this is a value in bytes, that
the default HDFS value is 1MB, that our default value is roughly 128KB, and
that the total memory used will be the buffer size * the number of file
handles. What's a reasonable first-order approximation for the number of
files per RS that will be affected by SCR? Hosted Regions * Columns?
Doesn't need a code change, I think, but the recommendation of 131072
should be removed.
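
To make the OOME math concrete, a back-of-the-envelope sketch (the file
count is invented for illustration, and I'm assuming one buffer per open
file handle):

    public class ScrBufferMath {
      public static void main(String[] args) {
        long openFiles = 1000;           // hypothetical RS with 1000 open HFiles
        long hdfsDefault = 1024 * 1024;  // HDFS default: 1MB per buffer
        long hbaseDefault = 128 * 1024;  // our default: roughly 128KB per buffer
        System.out.println(openFiles * hdfsDefault);  // 1048576000 bytes, ~1GB
        System.out.println(openFiles * hbaseDefault); // 131072000 bytes, ~125MB
      }
    }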

>
>
>
> > 3)
> >
> > Docs suggest: Ensure data locality. In hbase-site.xml, set
> > hbase.hstore.min.locality.to.skip.major.compact = 0.7 (Meaning that
> > 0.7 <= n <= 1)
> >
> > I can't find anything else about this property in the docs. Digging
> > through the code, I find an oblique reference to HBASE-11195, but there's
> > no release note there or docs from there, and reading the issue doesn't
> > help me understand how this operates either. It looks like there was
> > follow-on work done, but it would be useful to know how we arrived at 0.7
> > (seems arbitrary) and how an operator could figure out if that setting is
> > good for them or needs to slide higher/lower.
> >
> >
> I don't know anything of the above.
>
> Will save this for later then.


> Thanks,
> S
>
>
>
> >
> > Thanks,
> > Mike
> >
>

Re: HBase Short-Circuit Read Questions

Posted by Stack <st...@duboce.net>.
On Fri, Jun 1, 2018 at 9:36 AM, Mike Drob <ma...@cloudera.com> wrote:

> Hi folks, I was going through our docs looking at SCR setup and had some
> confusion. Asking here before filing JIRA issues. After writing this, I'm
> realizing the length got a bit out of hand. I don't want to split this into
> several threads because I think the information is all related, but may
> have to do that if a single one becomes difficult to follow.
>
>
Thanks for taking a look. See inline below.



>
> The main docs link: http://hbase.apache.org/book.html#shortcircuit.reads
>
> 1)
>
> Docs claim: dfs.client.read.shortcircuit.skip.checksum = true so we don’t
> double checksum (HBase does its own checksumming to save on i/os. See
> hbase.regionserver.checksum.verify for more on this.)
>
> Code claims:
>
> https://github.com/apache/hbase/blob/master/hbase-common/src/main/java/org/apache/hadoop/hbase/util/CommonFSUtils.java#L784-L788
>
> That if this property is set, then we log a warning?
>
> Unrelated, this is duplicated between CommonFSUtils and FSUtils, will need
> a jira to clean that up later.
>
> Also, there's a comment in
> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L689-L690
> that claims we automatically disable it, which we do in the HFileSystem
> constructor by setting the same dfs property in our conf to true.
>
> So I'm confused if we should be setting the property like the docs claim,
> not setting it like FSUtils warns, or ignoring it and letting RS auto-set
> it.
>
>

This is a classic font of confusion with layers of attempts over time at
simplification, auto-config'ing, code migration, and poor doc (I believe
I'm the author of a good bit of this mess here). Would be cool if it got a
revamp informed by tinkering with configs and an edit by a better writer
than I.



> 2)
>
> Docs claim: dfs.client.read.shortcircuit.buffer.size = 131072 Important to
> avoid OOME — hbase has a default it uses if unset, see
> hbase.dfs.client.read.shortcircuit.buffer.size; its default is 131072.
>
> This is very confusing, we should set the property to some value, because
> if it's unset then we will use... the same value? This reads like needless
> operator burden.
>
> Looking at the code, the default we really use is 126976, not the 131072
> (64 * 1024 * 2) the docs cite; that's close, but off by enough to give me
> pause.
>
> The default HDFS value is 1024 * 1024, which suggests that they're
> expecting a value in the MB range and we're giving one in the KB range?
> See:
> https://github.com/apache/hadoop/blob/master/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/HdfsClientConfigKeys.java#L146-L147
>
> Just now, I'm realizing that the initial comment in the docs might mean to
> tune it way down to avoid OOME; my initial reading was that we need to
> increase the ceiling from whatever default setting comes in via HDFS. Would
> be good to clarify this, and also figure out what units the value is in.
>
>
Agree.

IIRC, intent was to set it way-down from usual default because hbase runs
w/ many more open files than your typical HDFS client does.




> 3)
>
> Docs suggest: Ensure data locality. In hbase-site.xml, set
> hbase.hstore.min.locality.to.skip.major.compact = 0.7 (Meaning that 0.7 <=
> n <= 1)
>
> I can't find anything else about this property in the docs. Digging through
> the code, I find an oblique reference to HBASE-11195, but there's no
> release note there or docs from there, and reading the issue doesn't help
> me understand how this operates either. It looks like there was follow-on
> work done, but it would be useful to know how we arrived at 0.7 (seems
> arbitrary) and how an operator could figure out if that setting is good for
> them or needs to slide higher/lower.
>
>
I don't know anything of the above.

Thanks,
S



>
> Thanks,
> Mike
>