Posted to dev@hbase.apache.org by Varun Sharma <va...@pinterest.com> on 2013/07/05 23:55:25 UTC

Puzzling behaviour with HBase checksums

Hi,

We are running HBase with hbase.regionserver.checksum.verify set to true,
but we are seeing an equal number of seeks for .meta files on HDFS and for
data blocks. This is rather puzzling and I don't know if it's broken. The
HBase jar is compiled against Hadoop 2.0.3-alpha and this behaviour occurs
with both 0.94.3 and 0.94.7. Short-circuit local reads are enabled and
working well, since only the region server is accessing the disk.

We run an strace limited to lseek calls and get the following:

28162 lseek(668, 0, SEEK_SET)           = 0
28162 lseek(635, 57479463, SEEK_SET)    = 57479463
28162 lseek(2255, 0, SEEK_SET)          = 0
28162 lseek(1938, 29285843, SEEK_SET)   = 29285843

Then we use lsof to find the underlying files and match them against the
corresponding file descriptors...

java    27947 hbase   668u   REG             202,32   1048583 36176608
/data/xvdc/hadoop/dfs/data/current/BP-1854623640-10.158.62.78-1363075060974/current/finalized/subdir54/blk_5081211948968918615_597521.meta

java    27947 hbase   635u   REG             202,32 134217728 36176607
/data/xvdc/hadoop/dfs/data/current/BP-1854623640-10.158.62.78-1363075060974/current/finalized/subdir54/blk_5081211948968918615

java    27947 hbase  2255u   REG             202,16    802375 32768850
/mnt/hadoop/dfs/data/current/BP-1854623640-10.158.62.78-1363075060974/current/finalized/subdir40/blk_2670783290218647110_614641.meta

java    27947 hbase  1938u   REG             202,16 102702747 32768849
/mnt/hadoop/dfs/data/current/BP-1854623640-10.158.62.78-1363075060974/current/finalized/subdir40/blk_2670783290218647110
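
The fd-to-file matching described above can be sketched in a few lines of
Python. This is only an illustration, not the procedure actually used: the
sample text and the fd table are copied by hand from the strace and lsof
output in this message, and the names classify_seeks and FD_TO_FILE are
made up for the example.

```python
import re

# lseek lines copied from the strace output in this message.
STRACE = """\
28162 lseek(668, 0, SEEK_SET)           = 0
28162 lseek(635, 57479463, SEEK_SET)    = 57479463
28162 lseek(2255, 0, SEEK_SET)          = 0
28162 lseek(1938, 29285843, SEEK_SET)   = 29285843
"""

# fd -> file name, transcribed by hand from the lsof output in this message.
FD_TO_FILE = {
    668: "blk_5081211948968918615_597521.meta",
    635: "blk_5081211948968918615",
    2255: "blk_2670783290218647110_614641.meta",
    1938: "blk_2670783290218647110",
}

LSEEK_RE = re.compile(r"lseek\((\d+),\s*(\d+),\s*SEEK_SET\)")


def classify_seeks(strace_text, fd_to_file):
    """Return a (kind, file name, offset) tuple for every lseek line.

    kind is "meta" for a .meta file, "data" for a block file, and
    "unknown" when the fd is not in the lsof table.
    """
    seeks = []
    for line in strace_text.splitlines():
        match = LSEEK_RE.search(line)
        if match is None:
            continue
        fd, offset = int(match.group(1)), int(match.group(2))
        name = fd_to_file.get(fd)
        if name is None:
            kind, name = "unknown", f"fd {fd}"
        elif name.endswith(".meta"):
            kind = "meta"
        else:
            kind = "data"
        seeks.append((kind, name, offset))
    return seeks


seeks = classify_seeks(STRACE, FD_TO_FILE)
for kind, name, offset in seeks:
    print(f"{kind:7} offset={offset:>9} {name}")
```

On the four sample lines this alternates meta, data, meta, data, with every
.meta seek going to offset 0.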

The pattern in strace is pretty clear: first the .meta file is read and then
the block is accessed. I am wondering if there are other places apart from
checksum verification where the .meta file for the HDFS block is accessed,
or if the checksum stuff is simply broken? From more strace output, it seems
we are reading 7-byte values from these .meta files. Is there a way I can
find out whether the checksums were actually written out to the HFiles in
the first place?

Thanks
Varun

Re: Puzzling behaviour with HBase checksums

Posted by Varun Sharma <va...@pinterest.com>.
Oh, I never set that. What does it do? Could that possibly be why this is
causing problems?

Thanks
Varun


On Fri, Jul 5, 2013 at 4:22 PM, Ted Yu <yu...@gmail.com> wrote:

> What value did you set for dfs.client.read.shortcircuit.skip.checksum ?
>
> Cheers

Re: Puzzling behaviour with HBase checksums

Posted by Varun Sharma <va...@pinterest.com>.
Created https://issues.apache.org/jira/browse/HDFS-4960.

Thanks !


On Sat, Jul 6, 2013 at 3:55 PM, Stack <st...@duboce.net> wrote:

> On Fri, Jul 5, 2013 at 5:21 PM, Varun Sharma <va...@pinterest.com> wrote:
>
> > I just set this value in hbase-site.xml but still the 7 byte reads and
> > lseek(s) persist.
> >
> >
> c. dfs.client.read.shortcircuit.skip.checksum is the key to bypass checksum
> check at the client side.  (from the HDFS-2246 release notes).
>
> It is a red herring though, because when you set
> hbase.regionserver.checksum.verify in HBase, we set the above config to
> true.  See HFileSystem.
>
> It is news to me that we read the .meta file even though we ask for no
> checksumming.  It's a bug, I'd say.  Do you have the FB commit that skips
> this seek?  No worries if you do not.  I can find it myself.
>
> Nice digging Varun,
> St.Ack
>

Re: Puzzling behaviour with HBase checksums

Posted by Stack <st...@duboce.net>.
On Fri, Jul 5, 2013 at 5:21 PM, Varun Sharma <va...@pinterest.com> wrote:

> I just set this value in hbase-site.xml but still the 7 byte reads and
> lseek(s) persist.
>
>
c. dfs.client.read.shortcircuit.skip.checksum is the key to bypass checksum
check at the client side.  (from the HDFS-2246 release notes).

It is a red herring though, because when you set
hbase.regionserver.checksum.verify in HBase, we set the above config to
true.  See HFileSystem.

It is news to me that we read the .meta file even though we ask for no
checksumming.  It's a bug, I'd say.  Do you have the FB commit that skips
this seek?  No worries if you do not.  I can find it myself.

Nice digging Varun,
St.Ack

Re: Puzzling behaviour with HBase checksums

Posted by Varun Sharma <va...@pinterest.com>.
Okay - I guess I now know what's going on here. Essentially there is a
7-byte header at the start of each block's .meta file, which is read
initially irrespective of whether this is a checksum or no-checksum read.
Some version checking is done here. From what I can see, the FB branch
(which I guess is more optimized for performance) only reads the header if
checksum verification is on. I wonder if that should be done here too.
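
As a rough illustration of what such a 7-byte header could contain, here is
a minimal Python sketch. The field layout is an assumption, not something
stated in this thread: a 2-byte big-endian version, a 1-byte checksum type
id, and a 4-byte big-endian bytes-per-checksum value. The type-id table and
the name parse_meta_header are likewise illustrative, and the sample bytes
are synthetic rather than read from a real .meta file.

```python
import struct

# Assumed layout of the .meta header (7 bytes in total):
#   2-byte big-endian version, 1-byte checksum type id,
#   4-byte big-endian bytes-per-checksum.
HEADER = struct.Struct(">hBi")

# Assumed mapping of checksum type ids to names.
CHECKSUM_TYPES = {0: "NULL", 1: "CRC32", 2: "CRC32C"}


def parse_meta_header(raw):
    """Decode the first 7 bytes of a .meta file under the assumed layout."""
    if len(raw) < HEADER.size:
        raise ValueError(f"need {HEADER.size} bytes, got {len(raw)}")
    version, type_id, bytes_per_checksum = HEADER.unpack(raw[:HEADER.size])
    return {
        "version": version,
        "checksum_type": CHECKSUM_TYPES.get(type_id, f"unknown({type_id})"),
        "bytes_per_checksum": bytes_per_checksum,
    }


# Synthetic header for demonstration: version 1, CRC32, 512 bytes per checksum.
sample = struct.pack(">hBi", 1, 1, 512)
header = parse_meta_header(sample)
print(HEADER.size, header)
```

To try it against a real file, one would read the first 7 bytes of a .meta
file in binary mode and pass them to parse_meta_header.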

However, it is probably also the case that most of this stuff is already in
the page cache, since it's just the first 7 bytes of a file which otherwise
has 100s of kilobytes of checksums.
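
The file sizes in the lsof output from the original message are consistent
with this picture. The sketch below assumes one 4-byte checksum per 512
bytes of block data plus a 7-byte header; the 512 and 4 are assumed
defaults, not values confirmed anywhere in this thread, and
expected_meta_size is a name invented for the example.

```python
HEADER_BYTES = 7          # header size discussed above
BYTES_PER_CHECKSUM = 512  # assumed chunk size covered by one checksum
CHECKSUM_BYTES = 4        # assumed width of one CRC value


def expected_meta_size(block_bytes):
    """Expected .meta size: header plus one checksum per (partial) chunk."""
    chunks = -(-block_bytes // BYTES_PER_CHECKSUM)  # ceiling division
    return HEADER_BYTES + chunks * CHECKSUM_BYTES


# (block file size, .meta file size) pairs taken from the lsof output.
observed = [(134217728, 1048583), (102702747, 802375)]
for block_bytes, meta_bytes in observed:
    print(block_bytes, expected_meta_size(block_bytes), meta_bytes)
```

Under these assumptions both observed .meta sizes are reproduced exactly,
which supports the reading that everything past the first 7 bytes is
checksum data.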


On Fri, Jul 5, 2013 at 5:21 PM, Varun Sharma <va...@pinterest.com> wrote:

> I just set this value in hbase-site.xml but still the 7 byte reads and
> lseek(s) persist.

Re: Puzzling behaviour with HBase checksums

Posted by Varun Sharma <va...@pinterest.com>.
I just set this value in hbase-site.xml but still the 7 byte reads and
lseek(s) persist.


On Fri, Jul 5, 2013 at 4:22 PM, Ted Yu <yu...@gmail.com> wrote:

> What value did you set for dfs.client.read.shortcircuit.skip.checksum ?
>
> Cheers

Re: Puzzling behaviour with HBase checksums

Posted by Ted Yu <yu...@gmail.com>.
What value did you set for dfs.client.read.shortcircuit.skip.checksum ?

Cheers
