You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Doug Meil (JIRA)" <ji...@apache.org> on 2011/07/29 13:08:09 UTC

[jira] [Created] (HBASE-4147) StoreFile query usage report

StoreFile query usage report
----------------------------

                 Key: HBASE-4147
                 URL: https://issues.apache.org/jira/browse/HBASE-4147
             Project: HBase
          Issue Type: Improvement
            Reporter: Doug Meil
            Priority: Minor


Detailed information on what HBase is doing in terms of reads is hard to come by.

What would be useful is to have a periodic StoreFile query report.  Specifically, this could run on a configured interval (e.g., every 30 seconds, 60 seconds) and dump the output to the log files.

This would have all StoreFiles accessed during the reporting period (and with the Path we would also know region, CF, and table), # of times the StoreFile was accessed, the size of the StoreFile, and the total time (ms) spent processing that StoreFile.

Even this level of summary would be useful to detect a which tables & CFs are being accessed the most, and including the StoreFile would provide insight into relative "uncompaction" (i.e., lots of StoreFiles).

I think the log-output, as opposed to UI, is an important facet with this.  I'm assuming that users will slice and dice this data on their own so I think we should skip any kind of admin view for now (i.e., new JSPs, new APIs to expose this data).  Just getting this to log-file would be a big improvement.

Will this have a non-zero performance impact?  Yes.  Hopefully small, but yes it will.  However, flying a plane without any instrumentation isn't fun.  :-)  

 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4147) StoreFile query usage report

Posted by "Doug Meil (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082580#comment-13082580 ] 

Doug Meil commented on HBASE-4147:
----------------------------------

For those not familiar with Statspack, see this:  http://www.akadia.com/services/ora_statspack_survival_guide.html

re:  "STATSPACK is a diagnosis tool for instance-wide performance problems; it also supports application tuning activities by providing data which identifies high-load SQL statements. STATSPACK can be used both proactively to monitor the changing load on a system, and also reactively to investigate a performance problem."
overview..

re: "The STATSPACK reports we like are from 1 5-minute intervals during a busy or peak time, when the performance is at its worst."
That's exactly what I'm talking about...  small intervals.  Not too small, but but not too big.

re: "Another common mistake with STATSPACK is to gather snapshots only when there is a problem."
That's why this type of reporting should pretty much be 'always on' - you need to be able to compare to other points in time.

Again, some things don't translate 1:1 from the RDBMS world, but a lot does.  

> StoreFile query usage report
> ----------------------------
>
>                 Key: HBASE-4147
>                 URL: https://issues.apache.org/jira/browse/HBASE-4147
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Doug Meil
>            Priority: Minor
>         Attachments: hbase_4147_storefilereport.pdf, hbase_4147_storefilereport_2011_08_10.pdf
>
>
> Detailed information on what HBase is doing in terms of reads is hard to come by.
> What would be useful is to have a periodic StoreFile query report.  Specifically, this could run on a configured interval (e.g., every 30 seconds, 60 seconds) and dump the output to the log files.
> This would have all StoreFiles accessed during the reporting period (and with the Path we would also know region, CF, and table), # of times the StoreFile was accessed, the size of the StoreFile, and the total time (ms) spent processing that StoreFile.
> Even this level of summary would be useful to detect a which tables & CFs are being accessed the most, and including the StoreFile would provide insight into relative "uncompaction" (i.e., lots of StoreFiles).
> I think the log-output, as opposed to UI, is an important facet with this.  I'm assuming that users will slice and dice this data on their own so I think we should skip any kind of admin view for now (i.e., new JSPs, new APIs to expose this data).  Just getting this to log-file would be a big improvement.
> Will this have a non-zero performance impact?  Yes.  Hopefully small, but yes it will.  However, flying a plane without any instrumentation isn't fun.  :-)  
>  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-4147) StoreFile query usage report

Posted by "stack (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4147:
-------------------------

         Priority: Critical  (was: Minor)
    Fix Version/s: 0.96.0

Upping priority and marking against 0.96.0 so it gets more consideration.
                
> StoreFile query usage report
> ----------------------------
>
>                 Key: HBASE-4147
>                 URL: https://issues.apache.org/jira/browse/HBASE-4147
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Doug Meil
>            Priority: Critical
>             Fix For: 0.96.0
>
>         Attachments: hbase_4147_storefilereport.pdf, hbase_4147_storefilereport_2011_08_10.pdf
>
>
> Detailed information on what HBase is doing in terms of reads is hard to come by.
> What would be useful is to have a periodic StoreFile query report.  Specifically, this could run on a configured interval (e.g., every 30 seconds, 60 seconds) and dump the output to the log files.
> This would have all StoreFiles accessed during the reporting period (and with the Path we would also know region, CF, and table), # of times the StoreFile was accessed, the size of the StoreFile, and the total time (ms) spent processing that StoreFile.
> Even this level of summary would be useful to detect a which tables & CFs are being accessed the most, and including the StoreFile would provide insight into relative "uncompaction" (i.e., lots of StoreFiles).
> I think the log-output, as opposed to UI, is an important facet with this.  I'm assuming that users will slice and dice this data on their own so I think we should skip any kind of admin view for now (i.e., new JSPs, new APIs to expose this data).  Just getting this to log-file would be a big improvement.
> Will this have a non-zero performance impact?  Yes.  Hopefully small, but yes it will.  However, flying a plane without any instrumentation isn't fun.  :-)  
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4147) StoreFile query usage report

Posted by "Otis Gospodnetic (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238035#comment-13238035 ] 

Otis Gospodnetic commented on HBASE-4147:
-----------------------------------------

Yes, +1 for publishing in JMX.  We need that for our SPM for HBase monitoring service, for example.
                
> StoreFile query usage report
> ----------------------------
>
>                 Key: HBASE-4147
>                 URL: https://issues.apache.org/jira/browse/HBASE-4147
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Doug Meil
>            Priority: Critical
>             Fix For: 0.96.0
>
>         Attachments: hbase_4147_storefilereport.pdf, hbase_4147_storefilereport_2011_08_10.pdf
>
>
> Detailed information on what HBase is doing in terms of reads is hard to come by.
> What would be useful is to have a periodic StoreFile query report.  Specifically, this could run on a configured interval (e.g., every 30 seconds, 60 seconds) and dump the output to the log files.
> This would have all StoreFiles accessed during the reporting period (and with the Path we would also know region, CF, and table), # of times the StoreFile was accessed, the size of the StoreFile, and the total time (ms) spent processing that StoreFile.
> Even this level of summary would be useful to detect a which tables & CFs are being accessed the most, and including the StoreFile would provide insight into relative "uncompaction" (i.e., lots of StoreFiles).
> I think the log-output, as opposed to UI, is an important facet with this.  I'm assuming that users will slice and dice this data on their own so I think we should skip any kind of admin view for now (i.e., new JSPs, new APIs to expose this data).  Just getting this to log-file would be a big improvement.
> Will this have a non-zero performance impact?  Yes.  Hopefully small, but yes it will.  However, flying a plane without any instrumentation isn't fun.  :-)  
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4147) StoreFile query usage report

Posted by "Doug Meil (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082656#comment-13082656 ] 

Doug Meil commented on HBASE-4147:
----------------------------------

With user example #1 of my updated writeup, an example of that happened on the dist-list recently.  He still had major compactions scheduled daily, and his cluster "got really slow" every day.  If he had summarized 5 min slices differentiating system and user activity, he could tell which table/reg/cf/storefile was getting system (compaction) activity, etc.

I doubt this answers your particular case, but I think this is a fairly common occurrence on with users.

Another example is #3 example of the "degree of uncompaction" stats (e.g., how much time is being spent reading each StoreFile.)  This kind of information would also help the "HBase as queue" issue that was on the dist-list recently.  They weren't doing major compactions and apparently had a lot of StoreFiles.

#2 is the "hybrid activity" use-case.  e.g., MR job going on one table while there are also random reads on another table.  At least seeing the activity during the reporting slice would let you know that something else is happening on the cluster and what is getting accessed.

> StoreFile query usage report
> ----------------------------
>
>                 Key: HBASE-4147
>                 URL: https://issues.apache.org/jira/browse/HBASE-4147
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Doug Meil
>            Priority: Minor
>         Attachments: hbase_4147_storefilereport.pdf, hbase_4147_storefilereport_2011_08_10.pdf
>
>
> Detailed information on what HBase is doing in terms of reads is hard to come by.
> What would be useful is to have a periodic StoreFile query report.  Specifically, this could run on a configured interval (e.g., every 30 seconds, 60 seconds) and dump the output to the log files.
> This would have all StoreFiles accessed during the reporting period (and with the Path we would also know region, CF, and table), # of times the StoreFile was accessed, the size of the StoreFile, and the total time (ms) spent processing that StoreFile.
> Even this level of summary would be useful to detect a which tables & CFs are being accessed the most, and including the StoreFile would provide insight into relative "uncompaction" (i.e., lots of StoreFiles).
> I think the log-output, as opposed to UI, is an important facet with this.  I'm assuming that users will slice and dice this data on their own so I think we should skip any kind of admin view for now (i.e., new JSPs, new APIs to expose this data).  Just getting this to log-file would be a big improvement.
> Will this have a non-zero performance impact?  Yes.  Hopefully small, but yes it will.  However, flying a plane without any instrumentation isn't fun.  :-)  
>  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-4147) StoreFile query usage report

Posted by "Doug Meil (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Meil updated HBASE-4147:
-----------------------------

    Attachment: hbase_4147_storefilereport.pdf

Adding document for requirements and 1st pass general design.

> StoreFile query usage report
> ----------------------------
>
>                 Key: HBASE-4147
>                 URL: https://issues.apache.org/jira/browse/HBASE-4147
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Doug Meil
>            Priority: Minor
>         Attachments: hbase_4147_storefilereport.pdf
>
>
> Detailed information on what HBase is doing in terms of reads is hard to come by.
> What would be useful is to have a periodic StoreFile query report.  Specifically, this could run on a configured interval (e.g., every 30 seconds, 60 seconds) and dump the output to the log files.
> This would have all StoreFiles accessed during the reporting period (and with the Path we would also know region, CF, and table), # of times the StoreFile was accessed, the size of the StoreFile, and the total time (ms) spent processing that StoreFile.
> Even this level of summary would be useful to detect a which tables & CFs are being accessed the most, and including the StoreFile would provide insight into relative "uncompaction" (i.e., lots of StoreFiles).
> I think the log-output, as opposed to UI, is an important facet with this.  I'm assuming that users will slice and dice this data on their own so I think we should skip any kind of admin view for now (i.e., new JSPs, new APIs to expose this data).  Just getting this to log-file would be a big improvement.
> Will this have a non-zero performance impact?  Yes.  Hopefully small, but yes it will.  However, flying a plane without any instrumentation isn't fun.  :-)  
>  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4147) StoreFile query usage report

Posted by "Nicolas Spiegelberg (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081793#comment-13081793 ] 

Nicolas Spiegelberg commented on HBASE-4147:
--------------------------------------------

@Doug: what is your goal for this JIRA?  Collecting stats on StoreFile usage is really good from a core developer perspective, but it sounds like you want better DBA tools.  For example:

1) Get sampling. You just want a way to log every 1k database commands and have some collector that displays high level information on get vs put rate, with basic filtering capabilities.  
2) Note that we're developing a version of "show processlist" for HBase that might also provide the visibility you want (HBASE-4057).  
3) Another option is exporting per-CF metrics in addition to our existing per-server metrics.  We have this sorta hacked up for 89 and could give you the diffs if you want to finish it off for 92.

> StoreFile query usage report
> ----------------------------
>
>                 Key: HBASE-4147
>                 URL: https://issues.apache.org/jira/browse/HBASE-4147
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Doug Meil
>            Priority: Minor
>         Attachments: hbase_4147_storefilereport.pdf
>
>
> Detailed information on what HBase is doing in terms of reads is hard to come by.
> What would be useful is to have a periodic StoreFile query report.  Specifically, this could run on a configured interval (e.g., every 30 seconds, 60 seconds) and dump the output to the log files.
> This would have all StoreFiles accessed during the reporting period (and with the Path we would also know region, CF, and table), # of times the StoreFile was accessed, the size of the StoreFile, and the total time (ms) spent processing that StoreFile.
> Even this level of summary would be useful to detect a which tables & CFs are being accessed the most, and including the StoreFile would provide insight into relative "uncompaction" (i.e., lots of StoreFiles).
> I think the log-output, as opposed to UI, is an important facet with this.  I'm assuming that users will slice and dice this data on their own so I think we should skip any kind of admin view for now (i.e., new JSPs, new APIs to expose this data).  Just getting this to log-file would be a big improvement.
> Will this have a non-zero performance impact?  Yes.  Hopefully small, but yes it will.  However, flying a plane without any instrumentation isn't fun.  :-)  
>  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4147) StoreFile query usage report

Posted by "Doug Meil (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082438#comment-13082438 ] 

Doug Meil commented on HBASE-4147:
----------------------------------

Hi Nicolas, I updated another version of this "...storefilereport_2011_08_10.pdf" with a "Why" section that has a little more background on scenarios.

In a word:  Statspack.  Say what one might about Oracle, that report in was the intersection of DBA and developer reporting.  So the answer to the question "developer" vs. "DBA" is "both."

re:  HBASE-4057, is "show processlist" a point-in-time command?  It seems to be.  That's certainly useful, but it's a little different than the view that statspack provides, which does the summary for you.

> StoreFile query usage report
> ----------------------------
>
>                 Key: HBASE-4147
>                 URL: https://issues.apache.org/jira/browse/HBASE-4147
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Doug Meil
>            Priority: Minor
>         Attachments: hbase_4147_storefilereport.pdf, hbase_4147_storefilereport_2011_08_10.pdf
>
>
> Detailed information on what HBase is doing in terms of reads is hard to come by.
> What would be useful is to have a periodic StoreFile query report.  Specifically, this could run on a configured interval (e.g., every 30 seconds, 60 seconds) and dump the output to the log files.
> This would have all StoreFiles accessed during the reporting period (and with the Path we would also know region, CF, and table), # of times the StoreFile was accessed, the size of the StoreFile, and the total time (ms) spent processing that StoreFile.
> Even this level of summary would be useful to detect a which tables & CFs are being accessed the most, and including the StoreFile would provide insight into relative "uncompaction" (i.e., lots of StoreFiles).
> I think the log-output, as opposed to UI, is an important facet with this.  I'm assuming that users will slice and dice this data on their own so I think we should skip any kind of admin view for now (i.e., new JSPs, new APIs to expose this data).  Just getting this to log-file would be a big improvement.
> Will this have a non-zero performance impact?  Yes.  Hopefully small, but yes it will.  However, flying a plane without any instrumentation isn't fun.  :-)  
>  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4147) StoreFile query usage report

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104623#comment-13104623 ] 

Otis Gospodnetic commented on HBASE-4147:
-----------------------------------------

+1 for Todd's tracing idea (is this already in a separate JIRA issue that I can't find?)
+1 for what Gary said about using existing mechanisms for publishing metrics, so that those of us who already have tools to gather and aggregate data from there can just keep using those tools instead of developing new things that scrape metrics from additional places.


> StoreFile query usage report
> ----------------------------
>
>                 Key: HBASE-4147
>                 URL: https://issues.apache.org/jira/browse/HBASE-4147
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Doug Meil
>            Priority: Minor
>         Attachments: hbase_4147_storefilereport.pdf, hbase_4147_storefilereport_2011_08_10.pdf
>
>
> Detailed information on what HBase is doing in terms of reads is hard to come by.
> What would be useful is to have a periodic StoreFile query report.  Specifically, this could run on a configured interval (e.g., every 30 seconds, 60 seconds) and dump the output to the log files.
> This would have all StoreFiles accessed during the reporting period (and with the Path we would also know region, CF, and table), # of times the StoreFile was accessed, the size of the StoreFile, and the total time (ms) spent processing that StoreFile.
> Even this level of summary would be useful to detect a which tables & CFs are being accessed the most, and including the StoreFile would provide insight into relative "uncompaction" (i.e., lots of StoreFiles).
> I think the log-output, as opposed to UI, is an important facet with this.  I'm assuming that users will slice and dice this data on their own so I think we should skip any kind of admin view for now (i.e., new JSPs, new APIs to expose this data).  Just getting this to log-file would be a big improvement.
> Will this have a non-zero performance impact?  Yes.  Hopefully small, but yes it will.  However, flying a plane without any instrumentation isn't fun.  :-)  
>  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4147) StoreFile query usage report

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237708#comment-13237708 ] 

Lars Hofhansl commented on HBASE-4147:
--------------------------------------

Should we output the statistics via JMX as well?
                
> StoreFile query usage report
> ----------------------------
>
>                 Key: HBASE-4147
>                 URL: https://issues.apache.org/jira/browse/HBASE-4147
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Doug Meil
>            Priority: Critical
>             Fix For: 0.96.0
>
>         Attachments: hbase_4147_storefilereport.pdf, hbase_4147_storefilereport_2011_08_10.pdf
>
>
> Detailed information on what HBase is doing in terms of reads is hard to come by.
> What would be useful is to have a periodic StoreFile query report.  Specifically, this could run on a configured interval (e.g., every 30 seconds, 60 seconds) and dump the output to the log files.
> This would have all StoreFiles accessed during the reporting period (and with the Path we would also know region, CF, and table), # of times the StoreFile was accessed, the size of the StoreFile, and the total time (ms) spent processing that StoreFile.
> Even this level of summary would be useful to detect a which tables & CFs are being accessed the most, and including the StoreFile would provide insight into relative "uncompaction" (i.e., lots of StoreFiles).
> I think the log-output, as opposed to UI, is an important facet with this.  I'm assuming that users will slice and dice this data on their own so I think we should skip any kind of admin view for now (i.e., new JSPs, new APIs to expose this data).  Just getting this to log-file would be a big improvement.
> Will this have a non-zero performance impact?  Yes.  Hopefully small, but yes it will.  However, flying a plane without any instrumentation isn't fun.  :-)  
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4147) StoreFile query usage report

Posted by "Gary Helmling (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082620#comment-13082620 ] 

Gary Helmling commented on HBASE-4147:
--------------------------------------

I don't want to hijack this issue, but if we're talking about broader tracing and monitoring support, I think another inspiration worth looking at is Dapper:
http://research.google.com/pubs/pub36356.html

Some of the situations we're currently trying to help application teams with are things like:  call #1 took 10msec to process, call #2 took 300000msec to process... Why?  We don't have a whole lot at the moment to help in answering this.  Better load stats help see what's going on in the cluster from one direction.  But it still requires a lot of inferring to see how it ties together from the client end.

@Todd, I like the dtrace-like approach.  I think we could start simply with something like this and spread and evolve it as we go.  Possibly even growing it into distributed tracing.  It seems like that's a broader need so maybe we should move that discussion into another issue.

In general, for additional stats that we export, I would like to gently encourage a hard-look at whether or not there's a way to incorporate them into the metrics framework.  Not everything will fit with this -- it's particularly not so great at dynamically named stats (like stats based on store filenames).  But it does give us an existing framework for collecting, aggregating and reporting those stats, with a variety of tools that integrate nicely.  Just writing stats to log files requires a whole lot more work to actually make the output useful.  Ackowledging sometimes it may be the best/only option though (I'm currently patching up the RPC logging to make it a little more useful).  I really need to look at metricsv2 and see how much more flexibility it gives us.

@Doug, thanks for the STATSPACK link.  I'll read up on that as well.

> StoreFile query usage report
> ----------------------------
>
>                 Key: HBASE-4147
>                 URL: https://issues.apache.org/jira/browse/HBASE-4147
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Doug Meil
>            Priority: Minor
>         Attachments: hbase_4147_storefilereport.pdf, hbase_4147_storefilereport_2011_08_10.pdf
>
>
> Detailed information on what HBase is doing in terms of reads is hard to come by.
> What would be useful is to have a periodic StoreFile query report.  Specifically, this could run on a configured interval (e.g., every 30 seconds, 60 seconds) and dump the output to the log files.
> This would have all StoreFiles accessed during the reporting period (and with the Path we would also know region, CF, and table), # of times the StoreFile was accessed, the size of the StoreFile, and the total time (ms) spent processing that StoreFile.
> Even this level of summary would be useful to detect a which tables & CFs are being accessed the most, and including the StoreFile would provide insight into relative "uncompaction" (i.e., lots of StoreFiles).
> I think the log-output, as opposed to UI, is an important facet with this.  I'm assuming that users will slice and dice this data on their own so I think we should skip any kind of admin view for now (i.e., new JSPs, new APIs to expose this data).  Just getting this to log-file would be a big improvement.
> Will this have a non-zero performance impact?  Yes.  Hopefully small, but yes it will.  However, flying a plane without any instrumentation isn't fun.  :-)  
>  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-4147) StoreFile query usage report

Posted by "Doug Meil (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Meil updated HBASE-4147:
-----------------------------

    Attachment: hbase_4147_storefilereport_2011_08_10.pdf

> StoreFile query usage report
> ----------------------------
>
>                 Key: HBASE-4147
>                 URL: https://issues.apache.org/jira/browse/HBASE-4147
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Doug Meil
>            Priority: Minor
>         Attachments: hbase_4147_storefilereport.pdf, hbase_4147_storefilereport_2011_08_10.pdf
>
>
> Detailed information on what HBase is doing in terms of reads is hard to come by.
> What would be useful is to have a periodic StoreFile query report.  Specifically, this could run on a configured interval (e.g., every 30 seconds, 60 seconds) and dump the output to the log files.
> This would have all StoreFiles accessed during the reporting period (and with the Path we would also know region, CF, and table), # of times the StoreFile was accessed, the size of the StoreFile, and the total time (ms) spent processing that StoreFile.
> Even this level of summary would be useful to detect a which tables & CFs are being accessed the most, and including the StoreFile would provide insight into relative "uncompaction" (i.e., lots of StoreFiles).
> I think the log-output, as opposed to UI, is an important facet with this.  I'm assuming that users will slice and dice this data on their own so I think we should skip any kind of admin view for now (i.e., new JSPs, new APIs to expose this data).  Just getting this to log-file would be a big improvement.
> Will this have a non-zero performance impact?  Yes.  Hopefully small, but yes it will.  However, flying a plane without any instrumentation isn't fun.  :-)  
>  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4147) StoreFile query usage report

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082631#comment-13082631 ] 

stack commented on HBASE-4147:
------------------------------

bq. Some of the situations we're currently trying to help application teams with are things like: call #1 took 10msec to process, call #2 took 300000msec to process... Why?

Me and J-D need to be able to answer that question too.

bq. In general, for additional stats that we export, I would like to gently encourage a hard-look at whether or not there's a way to incorporate them into the metrics framework.

Agree, as much as we can.  We use tsdb here so if a metric, we already have ready means for viewing across time.

> StoreFile query usage report
> ----------------------------
>
>                 Key: HBASE-4147
>                 URL: https://issues.apache.org/jira/browse/HBASE-4147
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Doug Meil
>            Priority: Minor
>         Attachments: hbase_4147_storefilereport.pdf, hbase_4147_storefilereport_2011_08_10.pdf
>
>
> Detailed information on what HBase is doing in terms of reads is hard to come by.
> What would be useful is to have a periodic StoreFile query report.  Specifically, this could run on a configured interval (e.g., every 30 seconds, 60 seconds) and dump the output to the log files.
> This would have all StoreFiles accessed during the reporting period (and with the Path we would also know region, CF, and table), # of times the StoreFile was accessed, the size of the StoreFile, and the total time (ms) spent processing that StoreFile.
> Even this level of summary would be useful to detect a which tables & CFs are being accessed the most, and including the StoreFile would provide insight into relative "uncompaction" (i.e., lots of StoreFiles).
> I think the log-output, as opposed to UI, is an important facet with this.  I'm assuming that users will slice and dice this data on their own so I think we should skip any kind of admin view for now (i.e., new JSPs, new APIs to expose this data).  Just getting this to log-file would be a big improvement.
> Will this have a non-zero performance impact?  Yes.  Hopefully small, but yes it will.  However, flying a plane without any instrumentation isn't fun.  :-)  
>  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4147) StoreFile query usage report

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073606#comment-13073606 ] 

Todd Lipcon commented on HBASE-4147:
------------------------------------

Maybe I'm wandering out of scope, but I've been thinking a bit about statistics and monitoring as well recently, and I think it might integrate well with this work.

The idea is to define various probe points (to use the dtrace terminology) throughout the code. Each probe point would have a name and some predefined set of arguments. For example, in the HFile code you might have:
{code}
HFile() {
  this.readTrace = Tracer.get("hfile.read.complete");
}

read() {
...
if (readTrace != null && readTrace.isEnabled()) {
  readTrace.trace(millisSpent, thisHFilePath, blockIdx, ...);
}
{code}

then different things interested in this tracing data can subscribe to the trace point -- in this case in order to collect aggregate statistics for each 30 second period, though other applications would be useful as well. (eg dynamically attach a listener to sample some percentage of requests)

Advantage of the above design is that it's flexible, and if off-by-default should have no performance impact since it will be basically jitted away

> StoreFile query usage report
> ----------------------------
>
>                 Key: HBASE-4147
>                 URL: https://issues.apache.org/jira/browse/HBASE-4147
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Doug Meil
>            Priority: Minor
>         Attachments: hbase_4147_storefilereport.pdf
>
>
> Detailed information on what HBase is doing in terms of reads is hard to come by.
> What would be useful is to have a periodic StoreFile query report.  Specifically, this could run on a configured interval (e.g., every 30 seconds, 60 seconds) and dump the output to the log files.
> This would have all StoreFiles accessed during the reporting period (and with the Path we would also know region, CF, and table), # of times the StoreFile was accessed, the size of the StoreFile, and the total time (ms) spent processing that StoreFile.
> Even this level of summary would be useful to detect a which tables & CFs are being accessed the most, and including the StoreFile would provide insight into relative "uncompaction" (i.e., lots of StoreFiles).
> I think the log-output, as opposed to UI, is an important facet with this.  I'm assuming that users will slice and dice this data on their own so I think we should skip any kind of admin view for now (i.e., new JSPs, new APIs to expose this data).  Just getting this to log-file would be a big improvement.
> Will this have a non-zero performance impact?  Yes.  Hopefully small, but yes it will.  However, flying a plane without any instrumentation isn't fun.  :-)  
>  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4147) StoreFile query usage report

Posted by "Doug Meil (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072807#comment-13072807 ] 

Doug Meil commented on HBASE-4147:
----------------------------------

I think that instrumenting StoreFileScanner by gathering time spent for all the 'next' and 'seek' calls would do it.  And then on 'close' it would publish the detailed record to some internal service that would gather up all the these detail records and then periodically dump the summary. 

I'm doing some hand-waving here because we don't want to introduce concurrency issues in the publishing process (e.g., publishing to something that is synchronized will effectively single-thread StoreFileScanners which would be a non-starter), but based on my understanding of the code it seems like this would be a fairly targeted change.  

Thoughts?  

> StoreFile query usage report
> ----------------------------
>
>                 Key: HBASE-4147
>                 URL: https://issues.apache.org/jira/browse/HBASE-4147
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Doug Meil
>            Priority: Minor
>
> Detailed information on what HBase is doing in terms of reads is hard to come by.
> What would be useful is to have a periodic StoreFile query report.  Specifically, this could run on a configured interval (e.g., every 30 seconds, 60 seconds) and dump the output to the log files.
> This would have all StoreFiles accessed during the reporting period (and with the Path we would also know region, CF, and table), # of times the StoreFile was accessed, the size of the StoreFile, and the total time (ms) spent processing that StoreFile.
> Even this level of summary would be useful to detect a which tables & CFs are being accessed the most, and including the StoreFile would provide insight into relative "uncompaction" (i.e., lots of StoreFiles).
> I think the log-output, as opposed to UI, is an important facet with this.  I'm assuming that users will slice and dice this data on their own so I think we should skip any kind of admin view for now (i.e., new JSPs, new APIs to expose this data).  Just getting this to log-file would be a big improvement.
> Will this have a non-zero performance impact?  Yes.  Hopefully small, but yes it will.  However, flying a plane without any instrumentation isn't fun.  :-)  
>  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4147) StoreFile query usage report

Posted by "Doug Meil (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073627#comment-13073627 ] 

Doug Meil commented on HBASE-4147:
----------------------------------

Thanks Todd.  I don't think it's that is out of scope.  If something like this could serve as the general purpose framework to gather log details, then that sounds like a good idea.  I'd like to see some examples that would satisfy the use cases in my writeup and any other common performance issues that anybody else has.

One expectation I would have is that this would be on for development/production clusters - the next question is what level of summarization is done out of the box in HBase, and (here we go again) where the output goes.  



> StoreFile query usage report
> ----------------------------
>
>                 Key: HBASE-4147
>                 URL: https://issues.apache.org/jira/browse/HBASE-4147
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Doug Meil
>            Priority: Minor
>         Attachments: hbase_4147_storefilereport.pdf
>
>
> Detailed information on what HBase is doing in terms of reads is hard to come by.
> What would be useful is to have a periodic StoreFile query report.  Specifically, this could run on a configured interval (e.g., every 30 seconds, 60 seconds) and dump the output to the log files.
> This would have all StoreFiles accessed during the reporting period (and with the Path we would also know region, CF, and table), # of times the StoreFile was accessed, the size of the StoreFile, and the total time (ms) spent processing that StoreFile.
> Even this level of summary would be useful to detect a which tables & CFs are being accessed the most, and including the StoreFile would provide insight into relative "uncompaction" (i.e., lots of StoreFiles).
> I think the log-output, as opposed to UI, is an important facet with this.  I'm assuming that users will slice and dice this data on their own so I think we should skip any kind of admin view for now (i.e., new JSPs, new APIs to expose this data).  Just getting this to log-file would be a big improvement.
> Will this have a non-zero performance impact?  Yes.  Hopefully small, but yes it will.  However, flying a plane without any instrumentation isn't fun.  :-)  
>  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4147) StoreFile query usage report

Posted by "Doug Meil (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082507#comment-13082507 ] 

Doug Meil commented on HBASE-4147:
----------------------------------

Just want to point this out again, "StoreFile" reporting implies that the full path is available (table, region, cf, storefile) not just the StoreFile itself.  That is pointed out in the doc, but I want to point that out again.

With respect to per-CF metrics, that is better than the current situation of only RS level, but I still don't think that covers what I'm proposing there isn't a sense of time-cost with the current metrics, nor does it split out system vs. app usage.  

> StoreFile query usage report
> ----------------------------
>
>                 Key: HBASE-4147
>                 URL: https://issues.apache.org/jira/browse/HBASE-4147
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Doug Meil
>            Priority: Minor
>         Attachments: hbase_4147_storefilereport.pdf, hbase_4147_storefilereport_2011_08_10.pdf
>
>
> Detailed information on what HBase is doing in terms of reads is hard to come by.
> What would be useful is to have a periodic StoreFile query report.  Specifically, this could run on a configured interval (e.g., every 30 seconds, 60 seconds) and dump the output to the log files.
> This would have all StoreFiles accessed during the reporting period (and with the Path we would also know region, CF, and table), # of times the StoreFile was accessed, the size of the StoreFile, and the total time (ms) spent processing that StoreFile.
> Even this level of summary would be useful to detect a which tables & CFs are being accessed the most, and including the StoreFile would provide insight into relative "uncompaction" (i.e., lots of StoreFiles).
> I think the log-output, as opposed to UI, is an important facet with this.  I'm assuming that users will slice and dice this data on their own so I think we should skip any kind of admin view for now (i.e., new JSPs, new APIs to expose this data).  Just getting this to log-file would be a big improvement.
> Will this have a non-zero performance impact?  Yes.  Hopefully small, but yes it will.  However, flying a plane without any instrumentation isn't fun.  :-)  
>  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira