Posted to user@hbase.apache.org by David Koch <og...@googlemail.com> on 2013/01/27 23:29:01 UTC

Short-circuit reads

Hello,

I read about "short circuit reads" in the HBase documentation's performance
section[1] and was wondering what people's experiences were using this in a
production setting.

Also,

1. Since only one dedicated user can take advantage of the feature do you
launch all jobs as this user?
2. Can dfs.client.read.shortcircuit be set to false for jobs which are not
launched by the short-circuit user in order to avoid exceptions? In other
words - can this setting be overridden by the client configuration's
hbase-site.xml?
3. In the same context, it is suggested to enable HBase internal
checksums[2]. Is this a feature which can be enabled in HBase 0.92.1 which
is part of the Cloudera 4.1.x release?

Thank you,

/David

[1] http://hbase.apache.org/book/perf.hdfs.html#ftn.d2145e7370
[2] https://issues.apache.org/jira/browse/HBASE-5074

Re: Short-circuit reads

Posted by Ted Yu <yu...@gmail.com>.
J-D's presentation can give you some idea about the speedup:
http://files.meetup.com/1350427/hug_ebay_jdcryans.pdf

Thanks

On Sun, Jan 27, 2013 at 2:38 PM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> Does this significantly increase HBase performance? And what
> are the "risks" associated with short-circuit activation (if any)? No
> risks of corrupting data?
>
> Can this be activated after tables are already populated?
>
> JM

Re: Short-circuit reads

Posted by David Koch <og...@googlemail.com>.
Alright, thanks for the replies - I'll do some testing.

/David

On Sun, Jan 27, 2013 at 11:55 PM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> Hi J-D,
>
> In your presentation, where are you getting the
> hdfsBlocksLocalityIndex value? I'm not able to find it in my UI...
>
> Thanks,
>
> JM

Re: Short-circuit reads

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Thanks J-D.

I found it with JConsole in hadoop/HBase/RegionServerStatistics/Attributes.

JM

2013/1/27, Jean-Daniel Cryans <jd...@apache.org>:
> It's in the region server metrics and also published through JMX.
>
> J-D

Re: Short-circuit reads

Posted by Jean-Daniel Cryans <jd...@apache.org>.
It's in the region server metrics and also published through JMX.

J-D

On Sun, Jan 27, 2013 at 2:55 PM, Jean-Marc Spaggiari
<je...@spaggiari.org> wrote:
> Hi J-D,
>
> In your presentation, where are you getting the
> hdfsBlocksLocalityIndex value? I'm not able to find it in my UI...
>
> Thanks,
>
> JM

Re: Short-circuit reads

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi J-D,

In your presentation, where are you getting the
hdfsBlocksLocalityIndex value? I'm not able to find it in my UI...

Thanks,

JM

2013/1/27, Jean-Daniel Cryans <jd...@apache.org>:
> On Sun, Jan 27, 2013 at 2:38 PM, Jean-Marc Spaggiari
> <je...@spaggiari.org> wrote:
>> Does this significantly increase HBase performance?
>
> Ted beat me to linking to my own presentation :P
>
>> And what
>> are the "risks" associated with short-circuit activation (if any)? No
>> risks of corrupting data?
>
> No, nothing, apart.
>
>>
>> Can this be activated after tables are already populated?
>
> Not related at all, this is a region server configuration.
>
>>
>> JM

Re: Short-circuit reads

Posted by Jean-Daniel Cryans <jd...@apache.org>.
On Sun, Jan 27, 2013 at 2:38 PM, Jean-Marc Spaggiari
<je...@spaggiari.org> wrote:
> Does this significantly increase HBase performance?

Ted beat me to linking to my own presentation :P

> And what
> are the "risks" associated with short-circuit activation (if any)? No
> risks of corrupting data?

No, nothing, apart.

>
> Can this be activated after tables are already populated?

Not related at all, this is a region server configuration.
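
A minimal sketch of what that region server configuration might look like
with the single-user short-circuit implementation discussed in this thread.
The property dfs.client.read.shortcircuit is the one named above;
dfs.block.local-path-access.user and the user name "hbase" are assumptions
that do not come from this thread, so check them against the HDFS version
you actually run.

```xml
<!-- hbase-site.xml on each region server -->
<configuration>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
</configuration>
```

```xml
<!-- hdfs-site.xml on each DataNode (assumed property name and user) -->
<configuration>
  <property>
    <name>dfs.block.local-path-access.user</name>
    <value>hbase</value>
  </property>
</configuration>
```

Since this only changes how the region server reads block files, the
affected daemons would normally need a restart, but existing tables are
left as they are.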

>
> JM

Re: Short-circuit reads

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Does this significantly increase HBase performance? And what
are the "risks" associated with short-circuit activation (if any)? No
risks of corrupting data?

Can this be activated after tables are already populated?

JM

2013/1/27, Ted <yu...@gmail.com>:
> For hbase internal checksum, it is not in hbase 0.92.x release.
>
> Please use 0.94.2 or newer release.
>
> Thanks

Re: Short-circuit reads

Posted by Ted <yu...@gmail.com>.
For hbase internal checksum, it is not in hbase 0.92.x release. 

Please use 0.94.2 or newer release. 

Thanks

On Jan 27, 2013, at 2:29 PM, David Koch <og...@googlemail.com> wrote:

> Hello,
> 
> I read about "short circuit reads" in the HBase documentation's performance
> section[1] and was wondering what people's experiences were using this in a
> production setting.
> 
> Also,
> 
> 1. Since only one dedicated user can take advantage of the feature do you
> launch all jobs as this user?
> 2. Can dfs.client.read.shortcircuit be set to false for jobs which are not
> launched by the short-circuit user in order to avoid exceptions? In other
> words - can this setting be overridden by the client configuration's
> hbase-site.xml?
> 3. In the same context, it is suggested to enable HBase internal
> checksums[2]. Is this a feature which can be enabled in HBase 0.92.1 which
> is part of the Cloudera 4.1.x release?
> 
> Thank you,
> 
> /David
> 
> [1] http://hbase.apache.org/book/perf.hdfs.html#ftn.d2145e7370
> [2] https://issues.apache.org/jira/browse/HBASE-5074

Re: Short-circuit reads

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Inline.

J-D

On Sun, Jan 27, 2013 at 2:29 PM, David Koch <og...@googlemail.com> wrote:
> Hello,
>
> I read about "short circuit reads" in the HBase documentation's performance
> section[1] and was wondering what people's experiences were using this in a
> production setting.
>
> Also,
>
> 1. Since only one dedicated user can take advantage of the feature do you
> launch all jobs as this user?

That's the big limitation right now. Running everything as the same
user can make managing jobs difficult, and that user would also need to be
the same as HBase's.

FWIW, HDFS-347 should fix those limitations but it's not committed yet
(getting close tho).

> 2. Can dfs.client.read.shortcircuit be set to false for jobs which are not
> launched by the short-circuit user in order to avoid exceptions? In other
> words - can this setting be overridden by the client configuration's
> hbase-site.xml?

Yes, but those exceptions are really harmless.
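
A minimal sketch of such an override, assuming the job picks up its own
hbase-site.xml from its classpath ahead of the cluster-wide one (how the
file gets distributed to the job is left out here):

```xml
<!-- hbase-site.xml shipped with jobs that do not run as the short-circuit user -->
<configuration>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>false</value>
  </property>
</configuration>
```

With this in place the client should go through the regular DataNode read
path instead of attempting a local read first.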

> 3. In the same context, it is suggested to enable HBase internal
> checksums[2]. Is this a feature which can be enabled in HBase 0.92.1 which
> is part of the Cloudera 4.1.x release?

Yes on the first question, no on the second one (what Ted said)
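
A minimal sketch of enabling HBase internal checksums on a release that
includes HBASE-5074 (0.94.2 or newer, per Ted's reply). The property name
hbase.regionserver.checksum.verify is not mentioned in this thread and is
given here as an assumption to verify against your release's
hbase-default.xml:

```xml
<!-- hbase-site.xml on each region server (assumed property name) -->
<configuration>
  <property>
    <name>hbase.regionserver.checksum.verify</name>
    <value>true</value>
  </property>
</configuration>
```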

>
> Thank you,
>
> /David
>
> [1] http://hbase.apache.org/book/perf.hdfs.html#ftn.d2145e7370
> [2] https://issues.apache.org/jira/browse/HBASE-5074