Posted to user@hbase.apache.org by Jean-Marc Spaggiari <je...@spaggiari.org> on 2013/01/31 23:46:59 UTC

HBase Checksum

Hi,

I have activated short-circuit reads and HBase checksums, and I would like
to confirm that they are working.

I activated short-circuit reads first and saw a 40% improvement in
the MR rowcount job, so I guess that part is working.

Now I'm configuring the checksum option, and I'm wondering how I can
validate that it is actually picked up and used. Is there a way to see
that?

Thanks,

JM
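
A first sanity check, before looking at runtime behaviour, is to confirm that the settings are present in the configuration the region server loads. The sketch below is only an illustration: it assumes the standard Hadoop-style <property><name>/<value> XML layout and the property names hbase.regionserver.checksum.verify and dfs.client.read.shortcircuit, which should be checked against the documentation for the versions in use. The helper name read_properties and the sample XML are hypothetical. Finding a property set to true shows that it is configured, not that it is being used at read time.

```python
import xml.etree.ElementTree as ET

# Property names assumed from the HBase/HDFS documentation; verify for your version.
PROPERTIES = (
    "hbase.regionserver.checksum.verify",  # HBase-level checksum verification
    "dfs.client.read.shortcircuit",        # HDFS short-circuit local reads
)


def read_properties(xml_text, names=PROPERTIES):
    """Return {name: value or None} for the requested Hadoop-style properties."""
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        found[name] = value
    # Properties that do not appear in the file are reported as None.
    return {name: found.get(name) for name in names}


if __name__ == "__main__":
    # Hypothetical sample standing in for the contents of an hbase-site.xml file.
    sample = """
    <configuration>
      <property>
        <name>hbase.regionserver.checksum.verify</name>
        <value>true</value>
      </property>
    </configuration>
    """
    for name, value in read_properties(sample).items():
        print(name, "=", value if value is not None else "(not set)")
```

In practice the text would come from the hbase-site.xml on the region server's classpath rather than from an inline string.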

Re: HBase Checksum

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
I ran a major compaction just to be sure. From what I understand,
checksums are not written if the option is not activated, so I
think the files need to be rewritten to have those checksums added.

I will still try to find a way to see that from the logs.

Worst case, I will add some logs directly into the code and re-deploy...


Re: HBase Checksum

Posted by lars hofhansl <la...@apache.org>.
Agreed. One should be able to monitor these things.
Mind filing a jira describing your experience?




Re: HBase Checksum

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Thanks for the clarification Lars.

Is there any UI or specific startup log we can check to validate that
it's activated? If not, would it be nice to have something like that?


Re: HBase Checksum

Posted by lars hofhansl <la...@apache.org>.
Doing HBase-level checksums (as opposed to HDFS-level) will mostly pay off for random gets.
Scans (like rowcounting and similar) will probably see only a negligible improvement.

In HDFS a block and its checksum are stored in different local files on each datanode. So loading a block requires 2 IOs.
With the checksum handled by HBase only one IO is needed per block.
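
To make the "one IO instead of two" point concrete, here is a toy sketch of storing a checksum inline with the data it protects, so that a single read returns both. It is only an illustration of the idea: the layout (payload followed by a 4-byte CRC32) and the function names pack_block and read_block are invented for this example and are not HBase's actual on-disk format.

```python
import struct
import zlib


def pack_block(data: bytes) -> bytes:
    """Append a big-endian CRC32 of the payload to the payload itself (toy layout)."""
    return data + struct.pack(">I", zlib.crc32(data))


def read_block(stored: bytes) -> bytes:
    """Verify and strip the trailing CRC32.

    Payload and checksum live in the same buffer, so one read fetches both;
    with a separate checksum file a second read would be needed.
    """
    if len(stored) < 4:
        raise ValueError("buffer too short to contain a checksum")
    data = stored[:-4]
    (expected,) = struct.unpack(">I", stored[-4:])
    if zlib.crc32(data) != expected:
        raise ValueError("checksum mismatch")
    return data
```

A corrupted buffer fails verification in read_block, which is the point at which a reader would fall back to another copy of the data.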




Re: HBase Checksum

Posted by Robert Dyer <rd...@iastate.edu>.
Yes, that log is a debug-level log, as I saw in the source.  But I too
enabled DEBUG and still never saw that log message.

But I, unlike you, see absolutely no change in performance.

One test I did, however, makes me think it is actually enabled: if I
submit from another user, I start getting security warnings about that user
not having permission for short-circuit reads.  So perhaps it is working, but I
have no clue why that log message fails to show up anywhere.

Regarding enabling checksums, that is an interesting question.  Do I have to
do a major compaction after enabling them so that HBase writes the checksums?  Or will
it detect the setting change and do that automatically?  And if I disable them,
will it remove the checksums?





-- 

Robert Dyer
rdyer@iastate.edu

Re: HBase Checksum

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Robert,

That's perfectly fine, it was my next question ;)


Anoop, I saw a 5% performance increase by activating HBase Checksum.
Can I disable it again to rerun the baseline and see the difference?
Or now that it's there, is it too late?

Also, regarding BlockReaderLocal, I can't find it in my logs, but
after I activated short-circuit reads I saw a 41% performance
increase, so I'm almost sure it's working. I don't know how to check
that either.

What's the best way to see that in the logs? It's not displayed when
HBase is starting, and not when I'm doing major compactions either.

I set the org.apache.hadoop.hdfs.BlockReaderLocal log level to DEBUG and
still can't see anything, neither in the region server log nor in the
datanode log.

Also, as for checking with HDFS-level logs whether the checksum meta file
is being read by the DFS client, I'm not really sure how to achieve
that.

JM


Re: HBase Checksum

Posted by Robert Dyer <rd...@iastate.edu>.
OK, grepping the RS logs, I see nothing with 'local' in any of them.  Thanks
for that hint.

For the test I was using, I know the data is local.  Every map task launched
data-local, and no regions had moved recently.

I think I've hijacked this thread enough; I'll move my issues to another one.
;-)


On Thu, Jan 31, 2013 at 11:51 PM, Anoop Sam John <an...@huawei.com> wrote:

> Hi Robert
>           When HDFS does a local short-circuit read, it uses the
> BlockReaderLocal class for reading.  There should be some logs on the DFS
> client side (RS) about creating a new BlockReaderLocal.  If you
> can see these, then the local read is definitely happening.
>
> Also check the DN log.  If local reads are happening, you will not see read
> request logs for the HFile on the DN side.
> Note the number and names of your HFiles so you know what to look for in the logs.
>
> Are you sure that you had data locality when you tested? Region movements
> across RSs can break full data locality.
>
> -Anoop-
> ________________________________________
> From: Robert Dyer [psybers@gmail.com]
> Sent: Friday, February 01, 2013 11:10 AM
> To: Hbase-User
> Subject: Re: HBase Checksum
>
> Not trying to hijack your thread here...
>
> But can you verify via logs that the shortcircuit is working?  Because I
> enabled shortcircuit but I sure didn't see any performance increase.
>
> I haven't tried enabling hbase checksum yet but I'd like to be able to
> verify that works too.
>
>
> On Thu, Jan 31, 2013 at 9:55 PM, Anoop Sam John <an...@huawei.com>
> wrote:
>
> > You can check in the HDFS-level logs whether the checksum meta file is
> > being read by the DFS client. With HBase-handled checksums, this
> > should not happen.
> > Have you noticed any performance gain since you configured the
> > HBase-handled checksum option?
> >
> > -Anoop-
> > ________________________________________
> > From: Jean-Marc Spaggiari [jean-marc@spaggiari.org]
> > Sent: Friday, February 01, 2013 4:16 AM
> > To: user
> > Subject: HBase Checksum
> >
> > Hi,
> >
> > I have activated shortcircuit and checksum and I would like to get a
> > confirmation that it's working fine.
> >
> > So I have activated short circuit first and saw a 40% improvement of
> > the MR rowcount job. So I guess it's working fine.
> >
> > Now, I'm configuring the checksum option, and I'm wondering how I can
> > validate that it's taken into account and actually used. Is
> > there a way to see that?
> >
> > Thanks,
> >
> > JM
> >
>



-- 

Robert Dyer
rdyer@iastate.edu

RE: HBase Checksum

Posted by Anoop Sam John <an...@huawei.com>.
Hi Robert
When HDFS does a local short-circuit read, it uses the BlockReaderLocal class for reading.  There should be some logs on the DFS client side (the RS) about creating a new BlockReaderLocal.  If you can see them, then the local read is definitely happening.

Also check the DN log.  If local reads are happening, you will not see read-request logs for the HFile on the DN side.
You can use the number and names of your HFiles when checking the logs.

Are you sure that you had data locality when you tested? Region movements across RSs can break full data locality.
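
A minimal sketch of the log check described above, assuming the region server log is a plain text file and that the string "BlockReaderLocal" appears in the relevant lines. The log path is hypothetical, the exact message text can differ between Hadoop versions, and (as noted elsewhere in this thread) the message may only be emitted at DEBUG level, so a count of zero is not proof that short-circuit reads are off.

```python
from pathlib import Path


def count_marker_lines(log_path, marker="BlockReaderLocal"):
    """Return how many lines of the log at log_path contain marker."""
    count = 0
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with Path(log_path).open("r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if marker in line:
                count += 1
    return count


def summarize(log_paths, marker="BlockReaderLocal"):
    """Map each log path to the number of lines that mention marker."""
    return {str(path): count_marker_lines(path, marker) for path in log_paths}
```

Something like summarize(["/path/to/regionserver.log"]) (path is a placeholder) would give one count per file. The same function could be pointed at a DN log with an HFile name as the marker to look for read requests there.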

-Anoop-
________________________________________
From: Robert Dyer [psybers@gmail.com]
Sent: Friday, February 01, 2013 11:10 AM
To: Hbase-User
Subject: Re: HBase Checksum

Not trying to hijack your thread here...

But can you verify via logs that the shortcircuit is working?  Because I
enabled shortcircuit but I sure didn't see any performance increase.

I haven't tried enabling hbase checksum yet but I'd like to be able to
verify that works too.


On Thu, Jan 31, 2013 at 9:55 PM, Anoop Sam John <an...@huawei.com> wrote:

> You can check in the HDFS-level logs whether the checksum meta file is
> being read by the DFS client. With HBase-handled checksums, this should
> not happen.
> Have you noticed any performance gain since you configured the
> HBase-handled checksum option?
>
> -Anoop-
> ________________________________________
> From: Jean-Marc Spaggiari [jean-marc@spaggiari.org]
> Sent: Friday, February 01, 2013 4:16 AM
> To: user
> Subject: HBase Checksum
>
> Hi,
>
> I have activated shortcircuit and checksum and I would like to get a
> confirmation that it's working fine.
>
> So I have activated short circuit first and saw a 40% improvement of
> the MR rowcount job. So I guess it's working fine.
>
> Now, I'm configuring the checksum option, and I'm wondering how I can
> validate that it's taken into account and actually used. Is
> there a way to see that?
>
> Thanks,
>
> JM
>

Re: HBase Checksum

Posted by Robert Dyer <ps...@gmail.com>.
Not trying to hijack your thread here...

But can you verify via logs that the shortcircuit is working?  Because I
enabled shortcircuit but I sure didn't see any performance increase.

I haven't tried enabling hbase checksum yet but I'd like to be able to
verify that works too.


On Thu, Jan 31, 2013 at 9:55 PM, Anoop Sam John <an...@huawei.com> wrote:

> You can check in the HDFS-level logs whether the checksum meta file is
> being read by the DFS client. With HBase-handled checksums, this should
> not happen.
> Have you noticed any performance gain since you configured the
> HBase-handled checksum option?
>
> -Anoop-
> ________________________________________
> From: Jean-Marc Spaggiari [jean-marc@spaggiari.org]
> Sent: Friday, February 01, 2013 4:16 AM
> To: user
> Subject: HBase Checksum
>
> Hi,
>
> I have activated shortcircuit and checksum and I would like to get a
> confirmation that it's working fine.
>
> So I have activated short circuit first and saw a 40% improvement of
> the MR rowcount job. So I guess it's working fine.
>
> Now, I'm configuring the checksum option, and I'm wondering how I can
> validate that it's taken into account and actually used. Is
> there a way to see that?
>
> Thanks,
>
> JM
>

RE: HBase Checksum

Posted by Anoop Sam John <an...@huawei.com>.
You can check in the HDFS-level logs whether the checksum meta file is being read by the DFS client. With HBase-handled checksums, this should not happen.
Have you noticed any performance gain since you configured the HBase-handled checksum option?
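
Before digging through logs, it can help to confirm that the settings are actually present in the configuration the region server loads. The sketch below parses Hadoop-style configuration XML and reports whether two properties are set to true. The property names hbase.regionserver.checksum.verify and dfs.client.read.shortcircuit are assumed to be the ones behind the two features discussed in this thread; check them against the documentation for your version. This only inspects configuration text, so it says nothing about what happens at read time.

```python
import xml.etree.ElementTree as ET


def read_site_properties(xml_text):
    """Parse Hadoop-style <configuration> XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def is_enabled(props, key):
    """True only if key is present and set to 'true' (case-insensitive)."""
    return props.get(key, "").lower() == "true"


# Illustrative sample only; a real hbase-site.xml will hold other properties.
SAMPLE = """<configuration>
  <property>
    <name>hbase.regionserver.checksum.verify</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
</configuration>"""

props = read_site_properties(SAMPLE)
for key in ("hbase.regionserver.checksum.verify", "dfs.client.read.shortcircuit"):
    print(key, "enabled" if is_enabled(props, key) else "not enabled")
```

For a file on disk, ET.parse(path).getroot() could replace the ET.fromstring call.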

-Anoop-
________________________________________
From: Jean-Marc Spaggiari [jean-marc@spaggiari.org]
Sent: Friday, February 01, 2013 4:16 AM
To: user
Subject: HBase Checksum

Hi,

I have activated shortcircuit and checksum and I would like to get a
confirmation that it's working fine.

So I have activated short circuit first and saw a 40% improvement of
the MR rowcount job. So I guess it's working fine.

Now, I'm configuring the checksum option, and I'm wondering how I can
validate that it's taken into account and actually used. Is
there a way to see that?

Thanks,

JM