You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org> on 2012/05/22 08:31:43 UTC

[jira] [Created] (HBASE-6066) some low hanging read path improvement ideas

Kannan Muthukkaruppan created HBASE-6066:
--------------------------------------------

             Summary: some low hanging read path improvement ideas 
                 Key: HBASE-6066
                 URL: https://issues.apache.org/jira/browse/HBASE-6066
             Project: HBase
          Issue Type: Bug
            Reporter: Kannan Muthukkaruppan


I was runnign some single threaded scan performance tests for a table
with small sized rows that is fully cached. Some observations...

Several wasteful iterations over and/or building of temporary lists.
1) One such is the following:

{code}
   boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
   if (!values.isEmpty()) {
     for (KeyValue kv : values) {              ------> #### wasteful in most cases
       currentScanResultSize += kv.heapSize();
   }
   results.add(new Result(values));
{code}

By default the "maxScannerResultSize" is Long.MAX_VALUE. In those cases,
avoid the unnecessary iteration to compute currentScanResultSize.


2) An example of a wasteful temporary array, is "results" in
RegionScanner.next().

{code}
      results.clear();
      boolean returnResult = nextInternal(limit, metric);

      outResults.addAll(results);
{code}

results then gets copied over to outResults via an addAll().
Not sure why we can directly collect the results in outResults.

3) Another almost similar exmaple of a wasteful array is "results" in
StoreScanner.next(), which eventually also copies its results
into "outResults".


4) Reduce overhead of "size metric" maintained in StoreScanner.next().

{code}
  if (metric != null) {
     HRegion.incrNumericMetric(this.metricNamePrefix + metric,
                               copyKv.getLength());
  }
  results.add(copyKv);
{code}

A single call to next() might fetch a lot of KVs. We can first add up the size of those KVs in a local variable and then in a finally clause increment the metric one shot, rather than updating AtomicLongs for each KV.

5) RegionScanner.next() calls a helper RegionScanner.next() on the same object. Both are synchronized methods. Synchronized methods calling nested synchronized methods on the same object are probably adding some small overhead. The inner next() calls isFilterDone() which is a also a synchronized method. We should factor the code to avoid these nested synchronized methods.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6066) some low hanging read path improvement ideas

Posted by "Jie Huang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436506#comment-13436506 ] 

Jie Huang commented on HBASE-6066:
----------------------------------

Regarding Part #5, We noticed that nested synchronized phenomenon
 as well. But unfortunately, after eliminating that nested *synchronize* method for RegionScanner.next(), it doesn't improve any. Just like the description mentioned, the nested synchronized methods only adds small overhead. 
In my opinion, that overhead is far less than the synchronized method itself. But I have a question here, why we need to synchronize that .next() function here? Is there any particular concern? thanks. 
                
> some low hanging read path improvement ideas 
> ---------------------------------------------
>
>                 Key: HBASE-6066
>                 URL: https://issues.apache.org/jira/browse/HBASE-6066
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Michal Gregorczyk
>            Priority: Critical
>              Labels: noob
>         Attachments: metric-stringbuilder-fix.patch
>
>
> I was running some single threaded scan performance tests for a table with small sized rows that is fully cached. Some observations...
> We seem to be doing several wasteful iterations over and/or building of temporary lists.
> 1) One such is the following code in HRegionServer.next():
> {code}
>    boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
>    if (!values.isEmpty()) {
>      for (KeyValue kv : values) {              ------> #### wasteful in most cases
>        currentScanResultSize += kv.heapSize();
>    }
>    results.add(new Result(values));
> {code}
> By default the "maxScannerResultSize" is Long.MAX_VALUE. In those cases,
> we can avoid the unnecessary iteration to compute currentScanResultSize.
> 2) An example of a wasteful temporary array, is "results" in
> RegionScanner.next().
> {code}
>       results.clear();
>       boolean returnResult = nextInternal(limit, metric);
>       outResults.addAll(results);
> {code}
> results then gets copied over to outResults via an addAll(). Not sure why we can not directly collect the results in outResults.
> 3) Another almost similar exmaple of a wasteful array is "results" in StoreScanner.next(), which eventually also copies its results into "outResults".
> 4) Reduce overhead of "size metric" maintained in StoreScanner.next().
> {code}
>   if (metric != null) {
>      HRegion.incrNumericMetric(this.metricNamePrefix + metric,
>                                copyKv.getLength());
>   }
>   results.add(copyKv);
> {code}
> A single call to next() might fetch a lot of KVs. We can first add up the size of those KVs in a local variable and then in a finally clause increment the metric one shot, rather than updating AtomicLongs for each KV.
> 5) RegionScanner.next() calls a helper RegionScanner.next() on the same object. Both are synchronized methods. Synchronized methods calling nested synchronized methods on the same object are probably adding some small overhead. The inner next() calls isFilterDone() which is a also a synchronized method. We should factor the code to avoid these nested synchronized methods.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-6066) some low hanging read path improvement ideas

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kannan Muthukkaruppan updated HBASE-6066:
-----------------------------------------

    Description: 
I was running some single threaded scan performance tests for a table with small sized rows that is fully cached. Some observations...

We seem to be doing several wasteful iterations over and/or building of temporary lists.

1) One such is the following code in HRegionServer.next():

{code}
   boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
   if (!values.isEmpty()) {
     for (KeyValue kv : values) {              ------> #### wasteful in most cases
       currentScanResultSize += kv.heapSize();
   }
   results.add(new Result(values));
{code}

By default the "maxScannerResultSize" is Long.MAX_VALUE. In those cases,
we can avoid the unnecessary iteration to compute currentScanResultSize.


2) An example of a wasteful temporary array, is "results" in
RegionScanner.next().

{code}
      results.clear();
      boolean returnResult = nextInternal(limit, metric);

      outResults.addAll(results);
{code}

results then gets copied over to outResults via an addAll(). Not sure why we can directly collect the results in outResults.

3) Another almost similar exmaple of a wasteful array is "results" in StoreScanner.next(), which eventually also copies its results into "outResults".


4) Reduce overhead of "size metric" maintained in StoreScanner.next().

{code}
  if (metric != null) {
     HRegion.incrNumericMetric(this.metricNamePrefix + metric,
                               copyKv.getLength());
  }
  results.add(copyKv);
{code}

A single call to next() might fetch a lot of KVs. We can first add up the size of those KVs in a local variable and then in a finally clause increment the metric one shot, rather than updating AtomicLongs for each KV.

5) RegionScanner.next() calls a helper RegionScanner.next() on the same object. Both are synchronized methods. Synchronized methods calling nested synchronized methods on the same object are probably adding some small overhead. The inner next() calls isFilterDone() which is a also a synchronized method. We should factor the code to avoid these nested synchronized methods.


  was:
I was runnign some single threaded scan performance tests for a table
with small sized rows that is fully cached. Some observations...

Several wasteful iterations over and/or building of temporary lists.
1) One such is the following:

{code}
   boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
   if (!values.isEmpty()) {
     for (KeyValue kv : values) {              ------> #### wasteful in most cases
       currentScanResultSize += kv.heapSize();
   }
   results.add(new Result(values));
{code}

By default the "maxScannerResultSize" is Long.MAX_VALUE. In those cases,
avoid the unnecessary iteration to compute currentScanResultSize.


2) An example of a wasteful temporary array, is "results" in
RegionScanner.next().

{code}
      results.clear();
      boolean returnResult = nextInternal(limit, metric);

      outResults.addAll(results);
{code}

results then gets copied over to outResults via an addAll().
Not sure why we can directly collect the results in outResults.

3) Another almost similar exmaple of a wasteful array is "results" in
StoreScanner.next(), which eventually also copies its results
into "outResults".


4) Reduce overhead of "size metric" maintained in StoreScanner.next().

{code}
  if (metric != null) {
     HRegion.incrNumericMetric(this.metricNamePrefix + metric,
                               copyKv.getLength());
  }
  results.add(copyKv);
{code}

A single call to next() might fetch a lot of KVs. We can first add up the size of those KVs in a local variable and then in a finally clause increment the metric one shot, rather than updating AtomicLongs for each KV.

5) RegionScanner.next() calls a helper RegionScanner.next() on the same object. Both are synchronized methods. Synchronized methods calling nested synchronized methods on the same object are probably adding some small overhead. The inner next() calls isFilterDone() which is a also a synchronized method. We should factor the code to avoid these nested synchronized methods.


    
> some low hanging read path improvement ideas 
> ---------------------------------------------
>
>                 Key: HBASE-6066
>                 URL: https://issues.apache.org/jira/browse/HBASE-6066
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>
> I was running some single threaded scan performance tests for a table with small sized rows that is fully cached. Some observations...
> We seem to be doing several wasteful iterations over and/or building of temporary lists.
> 1) One such is the following code in HRegionServer.next():
> {code}
>    boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
>    if (!values.isEmpty()) {
>      for (KeyValue kv : values) {              ------> #### wasteful in most cases
>        currentScanResultSize += kv.heapSize();
>    }
>    results.add(new Result(values));
> {code}
> By default the "maxScannerResultSize" is Long.MAX_VALUE. In those cases,
> we can avoid the unnecessary iteration to compute currentScanResultSize.
> 2) An example of a wasteful temporary array, is "results" in
> RegionScanner.next().
> {code}
>       results.clear();
>       boolean returnResult = nextInternal(limit, metric);
>       outResults.addAll(results);
> {code}
> results then gets copied over to outResults via an addAll(). Not sure why we can directly collect the results in outResults.
> 3) Another almost similar exmaple of a wasteful array is "results" in StoreScanner.next(), which eventually also copies its results into "outResults".
> 4) Reduce overhead of "size metric" maintained in StoreScanner.next().
> {code}
>   if (metric != null) {
>      HRegion.incrNumericMetric(this.metricNamePrefix + metric,
>                                copyKv.getLength());
>   }
>   results.add(copyKv);
> {code}
> A single call to next() might fetch a lot of KVs. We can first add up the size of those KVs in a local variable and then in a finally clause increment the metric one shot, rather than updating AtomicLongs for each KV.
> 5) RegionScanner.next() calls a helper RegionScanner.next() on the same object. Both are synchronized methods. Synchronized methods calling nested synchronized methods on the same object are probably adding some small overhead. The inner next() calls isFilterDone() which is a also a synchronized method. We should factor the code to avoid these nested synchronized methods.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-6066) some low hanging read path improvement ideas

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kannan Muthukkaruppan updated HBASE-6066:
-----------------------------------------

    Description: 
I was running some single threaded scan performance tests for a table with small sized rows that is fully cached. Some observations...

We seem to be doing several wasteful iterations over and/or building of temporary lists.

1) One such is the following code in HRegionServer.next():

{code}
   boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
   if (!values.isEmpty()) {
     for (KeyValue kv : values) {              ------> #### wasteful in most cases
       currentScanResultSize += kv.heapSize();
   }
   results.add(new Result(values));
{code}

By default the "maxScannerResultSize" is Long.MAX_VALUE. In those cases,
we can avoid the unnecessary iteration to compute currentScanResultSize.


2) An example of a wasteful temporary array, is "results" in
RegionScanner.next().

{code}
      results.clear();
      boolean returnResult = nextInternal(limit, metric);

      outResults.addAll(results);
{code}

results then gets copied over to outResults via an addAll(). Not sure why we can not directly collect the results in outResults.

3) Another almost similar exmaple of a wasteful array is "results" in StoreScanner.next(), which eventually also copies its results into "outResults".


4) Reduce overhead of "size metric" maintained in StoreScanner.next().

{code}
  if (metric != null) {
     HRegion.incrNumericMetric(this.metricNamePrefix + metric,
                               copyKv.getLength());
  }
  results.add(copyKv);
{code}

A single call to next() might fetch a lot of KVs. We can first add up the size of those KVs in a local variable and then in a finally clause increment the metric one shot, rather than updating AtomicLongs for each KV.

5) RegionScanner.next() calls a helper RegionScanner.next() on the same object. Both are synchronized methods. Synchronized methods calling nested synchronized methods on the same object are probably adding some small overhead. The inner next() calls isFilterDone() which is a also a synchronized method. We should factor the code to avoid these nested synchronized methods.


  was:
I was running some single threaded scan performance tests for a table with small sized rows that is fully cached. Some observations...

We seem to be doing several wasteful iterations over and/or building of temporary lists.

1) One such is the following code in HRegionServer.next():

{code}
   boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
   if (!values.isEmpty()) {
     for (KeyValue kv : values) {              ------> #### wasteful in most cases
       currentScanResultSize += kv.heapSize();
   }
   results.add(new Result(values));
{code}

By default the "maxScannerResultSize" is Long.MAX_VALUE. In those cases,
we can avoid the unnecessary iteration to compute currentScanResultSize.


2) An example of a wasteful temporary array, is "results" in
RegionScanner.next().

{code}
      results.clear();
      boolean returnResult = nextInternal(limit, metric);

      outResults.addAll(results);
{code}

results then gets copied over to outResults via an addAll(). Not sure why we can directly collect the results in outResults.

3) Another almost similar exmaple of a wasteful array is "results" in StoreScanner.next(), which eventually also copies its results into "outResults".


4) Reduce overhead of "size metric" maintained in StoreScanner.next().

{code}
  if (metric != null) {
     HRegion.incrNumericMetric(this.metricNamePrefix + metric,
                               copyKv.getLength());
  }
  results.add(copyKv);
{code}

A single call to next() might fetch a lot of KVs. We can first add up the size of those KVs in a local variable and then in a finally clause increment the metric one shot, rather than updating AtomicLongs for each KV.

5) RegionScanner.next() calls a helper RegionScanner.next() on the same object. Both are synchronized methods. Synchronized methods calling nested synchronized methods on the same object are probably adding some small overhead. The inner next() calls isFilterDone() which is a also a synchronized method. We should factor the code to avoid these nested synchronized methods.


    
> some low hanging read path improvement ideas 
> ---------------------------------------------
>
>                 Key: HBASE-6066
>                 URL: https://issues.apache.org/jira/browse/HBASE-6066
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>
> I was running some single threaded scan performance tests for a table with small sized rows that is fully cached. Some observations...
> We seem to be doing several wasteful iterations over and/or building of temporary lists.
> 1) One such is the following code in HRegionServer.next():
> {code}
>    boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
>    if (!values.isEmpty()) {
>      for (KeyValue kv : values) {              ------> #### wasteful in most cases
>        currentScanResultSize += kv.heapSize();
>    }
>    results.add(new Result(values));
> {code}
> By default the "maxScannerResultSize" is Long.MAX_VALUE. In those cases,
> we can avoid the unnecessary iteration to compute currentScanResultSize.
> 2) An example of a wasteful temporary array, is "results" in
> RegionScanner.next().
> {code}
>       results.clear();
>       boolean returnResult = nextInternal(limit, metric);
>       outResults.addAll(results);
> {code}
> results then gets copied over to outResults via an addAll(). Not sure why we can not directly collect the results in outResults.
> 3) Another almost similar exmaple of a wasteful array is "results" in StoreScanner.next(), which eventually also copies its results into "outResults".
> 4) Reduce overhead of "size metric" maintained in StoreScanner.next().
> {code}
>   if (metric != null) {
>      HRegion.incrNumericMetric(this.metricNamePrefix + metric,
>                                copyKv.getLength());
>   }
>   results.add(copyKv);
> {code}
> A single call to next() might fetch a lot of KVs. We can first add up the size of those KVs in a local variable and then in a finally clause increment the metric one shot, rather than updating AtomicLongs for each KV.
> 5) RegionScanner.next() calls a helper RegionScanner.next() on the same object. Both are synchronized methods. Synchronized methods calling nested synchronized methods on the same object are probably adding some small overhead. The inner next() calls isFilterDone() which is a also a synchronized method. We should factor the code to avoid these nested synchronized methods.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6066) some low hanging read path improvement ideas

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436545#comment-13436545 ] 

Lars Hofhansl commented on HBASE-6066:
--------------------------------------

Yeah, uncontended entry into a monitor in Java is pretty cheap, I do not think we need to worry about that.
Especially in the synchronized-calls-synchronized case, because all memory barriers will already be established anyway.

                
> some low hanging read path improvement ideas 
> ---------------------------------------------
>
>                 Key: HBASE-6066
>                 URL: https://issues.apache.org/jira/browse/HBASE-6066
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Michal Gregorczyk
>            Priority: Critical
>              Labels: noob
>         Attachments: metric-stringbuilder-fix.patch
>
>
> I was running some single threaded scan performance tests for a table with small sized rows that is fully cached. Some observations...
> We seem to be doing several wasteful iterations over and/or building of temporary lists.
> 1) One such is the following code in HRegionServer.next():
> {code}
>    boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
>    if (!values.isEmpty()) {
>      for (KeyValue kv : values) {              ------> #### wasteful in most cases
>        currentScanResultSize += kv.heapSize();
>    }
>    results.add(new Result(values));
> {code}
> By default the "maxScannerResultSize" is Long.MAX_VALUE. In those cases,
> we can avoid the unnecessary iteration to compute currentScanResultSize.
> 2) An example of a wasteful temporary array, is "results" in
> RegionScanner.next().
> {code}
>       results.clear();
>       boolean returnResult = nextInternal(limit, metric);
>       outResults.addAll(results);
> {code}
> results then gets copied over to outResults via an addAll(). Not sure why we can not directly collect the results in outResults.
> 3) Another almost similar exmaple of a wasteful array is "results" in StoreScanner.next(), which eventually also copies its results into "outResults".
> 4) Reduce overhead of "size metric" maintained in StoreScanner.next().
> {code}
>   if (metric != null) {
>      HRegion.incrNumericMetric(this.metricNamePrefix + metric,
>                                copyKv.getLength());
>   }
>   results.add(copyKv);
> {code}
> A single call to next() might fetch a lot of KVs. We can first add up the size of those KVs in a local variable and then in a finally clause increment the metric one shot, rather than updating AtomicLongs for each KV.
> 5) RegionScanner.next() calls a helper RegionScanner.next() on the same object. Both are synchronized methods. Synchronized methods calling nested synchronized methods on the same object are probably adding some small overhead. The inner next() calls isFilterDone() which is a also a synchronized method. We should factor the code to avoid these nested synchronized methods.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6066) some low hanging read path improvement ideas

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465982#comment-13465982 ] 

Lars Hofhansl commented on HBASE-6066:
--------------------------------------

#4 is in 0.94 and 0.96 already. Also see HBASE-6711, which addresses another needless copy of results between ArrayLists.

Another improvement I can see is to have Result maintain an ArrayList<KeyValue> internally, rather than KeyValue[]. That way a bunch of conversions from ArrayList to array can be removed. (I have a patch for that somewhere, but it failed a bunch of tests, so I did not pursue it).
                
> some low hanging read path improvement ideas 
> ---------------------------------------------
>
>                 Key: HBASE-6066
>                 URL: https://issues.apache.org/jira/browse/HBASE-6066
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Michal Gregorczyk
>            Priority: Critical
>              Labels: noob
>             Fix For: 0.96.0
>
>         Attachments: 0001-jira-HBASE-6066-89-fb-Some-read-performance-improvem.patch, metric-stringbuilder-fix.patch
>
>
> I was running some single threaded scan performance tests for a table with small sized rows that is fully cached. Some observations...
> We seem to be doing several wasteful iterations over and/or building of temporary lists.
> 1) One such is the following code in HRegionServer.next():
> {code}
>    boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
>    if (!values.isEmpty()) {
>      for (KeyValue kv : values) {              ------> #### wasteful in most cases
>        currentScanResultSize += kv.heapSize();
>    }
>    results.add(new Result(values));
> {code}
> By default the "maxScannerResultSize" is Long.MAX_VALUE. In those cases,
> we can avoid the unnecessary iteration to compute currentScanResultSize.
> 2) An example of a wasteful temporary array, is "results" in
> RegionScanner.next().
> {code}
>       results.clear();
>       boolean returnResult = nextInternal(limit, metric);
>       outResults.addAll(results);
> {code}
> results then gets copied over to outResults via an addAll(). Not sure why we can not directly collect the results in outResults.
> 3) Another almost similar exmaple of a wasteful array is "results" in StoreScanner.next(), which eventually also copies its results into "outResults".
> 4) Reduce overhead of "size metric" maintained in StoreScanner.next().
> {code}
>   if (metric != null) {
>      HRegion.incrNumericMetric(this.metricNamePrefix + metric,
>                                copyKv.getLength());
>   }
>   results.add(copyKv);
> {code}
> A single call to next() might fetch a lot of KVs. We can first add up the size of those KVs in a local variable and then in a finally clause increment the metric one shot, rather than updating AtomicLongs for each KV.
> 5) RegionScanner.next() calls a helper RegionScanner.next() on the same object. Both are synchronized methods. Synchronized methods calling nested synchronized methods on the same object are probably adding some small overhead. The inner next() calls isFilterDone() which is a also a synchronized method. We should factor the code to avoid these nested synchronized methods.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-6066) some low hanging read path improvement ideas

Posted by "Karthik Ranganathan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Ranganathan updated HBASE-6066:
---------------------------------------

    Issue Type: Sub-task  (was: Improvement)
        Parent: HBASE-6922
    
> some low hanging read path improvement ideas 
> ---------------------------------------------
>
>                 Key: HBASE-6066
>                 URL: https://issues.apache.org/jira/browse/HBASE-6066
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Performance
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Michal Gregorczyk
>            Priority: Critical
>              Labels: noob
>             Fix For: 0.96.0
>
>         Attachments: 0001-jira-HBASE-6066-89-fb-Some-read-performance-improvem.patch, metric-stringbuilder-fix.patch
>
>
> I was running some single threaded scan performance tests for a table with small sized rows that is fully cached. Some observations...
> We seem to be doing several wasteful iterations over and/or building of temporary lists.
> 1) One such is the following code in HRegionServer.next():
> {code}
>    boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
>    if (!values.isEmpty()) {
>      for (KeyValue kv : values) {              ------> #### wasteful in most cases
>        currentScanResultSize += kv.heapSize();
>    }
>    results.add(new Result(values));
> {code}
> By default the "maxScannerResultSize" is Long.MAX_VALUE. In those cases,
> we can avoid the unnecessary iteration to compute currentScanResultSize.
> 2) An example of a wasteful temporary array, is "results" in
> RegionScanner.next().
> {code}
>       results.clear();
>       boolean returnResult = nextInternal(limit, metric);
>       outResults.addAll(results);
> {code}
> results then gets copied over to outResults via an addAll(). Not sure why we can not directly collect the results in outResults.
> 3) Another almost similar exmaple of a wasteful array is "results" in StoreScanner.next(), which eventually also copies its results into "outResults".
> 4) Reduce overhead of "size metric" maintained in StoreScanner.next().
> {code}
>   if (metric != null) {
>      HRegion.incrNumericMetric(this.metricNamePrefix + metric,
>                                copyKv.getLength());
>   }
>   results.add(copyKv);
> {code}
> A single call to next() might fetch a lot of KVs. We can first add up the size of those KVs in a local variable and then in a finally clause increment the metric one shot, rather than updating AtomicLongs for each KV.
> 5) RegionScanner.next() calls a helper RegionScanner.next() on the same object. Both are synchronized methods. Synchronized methods calling nested synchronized methods on the same object are probably adding some small overhead. The inner next() calls isFilterDone() which is a also a synchronized method. We should factor the code to avoid these nested synchronized methods.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6066) some low hanging read path improvement ideas

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450821#comment-13450821 ] 

stack commented on HBASE-6066:
------------------------------

[~kannanm] You know who made it work in the first place?  I thought it one of your magicians?  It'd be cool seeing the phabricator back and forth show in JIRA again.  I used to enjoy that show!
                
> some low hanging read path improvement ideas 
> ---------------------------------------------
>
>                 Key: HBASE-6066
>                 URL: https://issues.apache.org/jira/browse/HBASE-6066
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Michal Gregorczyk
>            Priority: Critical
>              Labels: noob
>         Attachments: metric-stringbuilder-fix.patch
>
>
> I was running some single threaded scan performance tests for a table with small sized rows that is fully cached. Some observations...
> We seem to be doing several wasteful iterations over and/or building of temporary lists.
> 1) One such is the following code in HRegionServer.next():
> {code}
>    boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
>    if (!values.isEmpty()) {
>      for (KeyValue kv : values) {              ------> #### wasteful in most cases
>        currentScanResultSize += kv.heapSize();
>    }
>    results.add(new Result(values));
> {code}
> By default the "maxScannerResultSize" is Long.MAX_VALUE. In those cases,
> we can avoid the unnecessary iteration to compute currentScanResultSize.
> 2) An example of a wasteful temporary array, is "results" in
> RegionScanner.next().
> {code}
>       results.clear();
>       boolean returnResult = nextInternal(limit, metric);
>       outResults.addAll(results);
> {code}
> results then gets copied over to outResults via an addAll(). Not sure why we can not directly collect the results in outResults.
> 3) Another almost similar exmaple of a wasteful array is "results" in StoreScanner.next(), which eventually also copies its results into "outResults".
> 4) Reduce overhead of "size metric" maintained in StoreScanner.next().
> {code}
>   if (metric != null) {
>      HRegion.incrNumericMetric(this.metricNamePrefix + metric,
>                                copyKv.getLength());
>   }
>   results.add(copyKv);
> {code}
> A single call to next() might fetch a lot of KVs. We can first add up the size of those KVs in a local variable and then in a finally clause increment the metric one shot, rather than updating AtomicLongs for each KV.
> 5) RegionScanner.next() calls a helper RegionScanner.next() on the same object. Both are synchronized methods. Synchronized methods calling nested synchronized methods on the same object are probably adding some small overhead. The inner next() calls isFilterDone() which is a also a synchronized method. We should factor the code to avoid these nested synchronized methods.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6066) some low hanging read path improvement ideas

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280755#comment-13280755 ] 

Kannan Muthukkaruppan commented on HBASE-6066:
----------------------------------------------

These are just some possible ideas. We probably need to dig deeper with a profiler to look for other improvements. In a quick and dirty hacky experiment I was able to get the single-threaded scan throughput from about 25MB/sec to 39MB/sec with some of these changes. But I am sure there is a lot of opportunity to tighten the code further.
                
> some low hanging read path improvement ideas 
> ---------------------------------------------
>
>                 Key: HBASE-6066
>                 URL: https://issues.apache.org/jira/browse/HBASE-6066
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>
> I was running some single threaded scan performance tests for a table with small sized rows that is fully cached. Some observations...
> We seem to be doing several wasteful iterations over and/or building of temporary lists.
> 1) One such is the following code in HRegionServer.next():
> {code}
>    boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
>    if (!values.isEmpty()) {
>      for (KeyValue kv : values) {              ------> #### wasteful in most cases
>        currentScanResultSize += kv.heapSize();
>    }
>    results.add(new Result(values));
> {code}
> By default the "maxScannerResultSize" is Long.MAX_VALUE. In those cases,
> we can avoid the unnecessary iteration to compute currentScanResultSize.
> 2) An example of a wasteful temporary array, is "results" in
> RegionScanner.next().
> {code}
>       results.clear();
>       boolean returnResult = nextInternal(limit, metric);
>       outResults.addAll(results);
> {code}
> results then gets copied over to outResults via an addAll(). Not sure why we can directly collect the results in outResults.
> 3) Another almost similar exmaple of a wasteful array is "results" in StoreScanner.next(), which eventually also copies its results into "outResults".
> 4) Reduce overhead of "size metric" maintained in StoreScanner.next().
> {code}
>   if (metric != null) {
>      HRegion.incrNumericMetric(this.metricNamePrefix + metric,
>                                copyKv.getLength());
>   }
>   results.add(copyKv);
> {code}
> A single call to next() might fetch a lot of KVs. We can first add up the size of those KVs in a local variable and then in a finally clause increment the metric one shot, rather than updating AtomicLongs for each KV.
> 5) RegionScanner.next() calls a helper RegionScanner.next() on the same object. Both are synchronized methods. Synchronized methods calling nested synchronized methods on the same object are probably adding some small overhead. The inner next() calls isFilterDone() which is a also a synchronized method. We should factor the code to avoid these nested synchronized methods.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-6066) some low hanging read path improvement ideas

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kannan Muthukkaruppan updated HBASE-6066:
-----------------------------------------

      Priority: Minor  (was: Major)
    Issue Type: Improvement  (was: Bug)
    
> some low hanging read path improvement ideas 
> ---------------------------------------------
>
>                 Key: HBASE-6066
>                 URL: https://issues.apache.org/jira/browse/HBASE-6066
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Kannan Muthukkaruppan
>            Priority: Minor
>
> I was running some single threaded scan performance tests for a table with small sized rows that is fully cached. Some observations...
> We seem to be doing several wasteful iterations over and/or building of temporary lists.
> 1) One such is the following code in HRegionServer.next():
> {code}
>    boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
>    if (!values.isEmpty()) {
>      for (KeyValue kv : values) {              ------> #### wasteful in most cases
>        currentScanResultSize += kv.heapSize();
>    }
>    results.add(new Result(values));
> {code}
> By default the "maxScannerResultSize" is Long.MAX_VALUE. In those cases,
> we can avoid the unnecessary iteration to compute currentScanResultSize.
> 2) An example of a wasteful temporary array, is "results" in
> RegionScanner.next().
> {code}
>       results.clear();
>       boolean returnResult = nextInternal(limit, metric);
>       outResults.addAll(results);
> {code}
> results then gets copied over to outResults via an addAll(). Not sure why we can not directly collect the results in outResults.
> 3) Another almost similar exmaple of a wasteful array is "results" in StoreScanner.next(), which eventually also copies its results into "outResults".
> 4) Reduce overhead of "size metric" maintained in StoreScanner.next().
> {code}
>   if (metric != null) {
>      HRegion.incrNumericMetric(this.metricNamePrefix + metric,
>                                copyKv.getLength());
>   }
>   results.add(copyKv);
> {code}
> A single call to next() might fetch a lot of KVs. We can first add up the size of those KVs in a local variable and then in a finally clause increment the metric one shot, rather than updating AtomicLongs for each KV.
> 5) RegionScanner.next() calls a helper RegionScanner.next() on the same object. Both are synchronized methods. Synchronized methods calling nested synchronized methods on the same object are probably adding some small overhead. The inner next() calls isFilterDone() which is a also a synchronized method. We should factor the code to avoid these nested synchronized methods.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-6066) some low hanging read path improvement ideas

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-6066:
-------------------------

      Component/s: Performance
    Fix Version/s: 0.96.0
    
> some low hanging read path improvement ideas 
> ---------------------------------------------
>
>                 Key: HBASE-6066
>                 URL: https://issues.apache.org/jira/browse/HBASE-6066
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Michal Gregorczyk
>            Priority: Critical
>              Labels: noob
>             Fix For: 0.96.0
>
>         Attachments: 0001-jira-HBASE-6066-89-fb-Some-read-performance-improvem.patch, metric-stringbuilder-fix.patch
>
>
> I was running some single threaded scan performance tests for a table with small sized rows that is fully cached. Some observations...
> We seem to be doing several wasteful iterations over and/or building of temporary lists.
> 1) One such is the following code in HRegionServer.next():
> {code}
>    boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
>    if (!values.isEmpty()) {
>      for (KeyValue kv : values) {              ------> #### wasteful in most cases
>        currentScanResultSize += kv.heapSize();
>    }
>    results.add(new Result(values));
> {code}
> By default the "maxScannerResultSize" is Long.MAX_VALUE. In those cases,
> we can avoid the unnecessary iteration to compute currentScanResultSize.
> 2) An example of a wasteful temporary array, is "results" in
> RegionScanner.next().
> {code}
>       results.clear();
>       boolean returnResult = nextInternal(limit, metric);
>       outResults.addAll(results);
> {code}
> results then gets copied over to outResults via an addAll(). Not sure why we can not directly collect the results in outResults.
> 3) Another almost similar exmaple of a wasteful array is "results" in StoreScanner.next(), which eventually also copies its results into "outResults".
> 4) Reduce overhead of "size metric" maintained in StoreScanner.next().
> {code}
>   if (metric != null) {
>      HRegion.incrNumericMetric(this.metricNamePrefix + metric,
>                                copyKv.getLength());
>   }
>   results.add(copyKv);
> {code}
> A single call to next() might fetch a lot of KVs. We can first add up the size of those KVs in a local variable and then in a finally clause increment the metric one shot, rather than updating AtomicLongs for each KV.
> 5) RegionScanner.next() calls a helper RegionScanner.next() on the same object. Both are synchronized methods. Synchronized methods calling nested synchronized methods on the same object are probably adding some small overhead. The inner next() calls isFilterDone() which is a also a synchronized method. We should factor the code to avoid these nested synchronized methods.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (HBASE-6066) some low hanging read path improvement ideas

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kannan Muthukkaruppan reassigned HBASE-6066:
--------------------------------------------

    Assignee: Aurick Qiao  (was: Kannan Muthukkaruppan)
    
> some low hanging read path improvement ideas 
> ---------------------------------------------
>
>                 Key: HBASE-6066
>                 URL: https://issues.apache.org/jira/browse/HBASE-6066
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Aurick Qiao
>            Priority: Critical
>              Labels: noob
>         Attachments: metric-stringbuilder-fix.patch
>
>
> I was running some single threaded scan performance tests for a table with small sized rows that is fully cached. Some observations...
> We seem to be doing several wasteful iterations over and/or building of temporary lists.
> 1) One such is the following code in HRegionServer.next():
> {code}
>    boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
>    if (!values.isEmpty()) {
>      for (KeyValue kv : values) {              ------> #### wasteful in most cases
>        currentScanResultSize += kv.heapSize();
>    }
>    results.add(new Result(values));
> {code}
> By default the "maxScannerResultSize" is Long.MAX_VALUE. In those cases,
> we can avoid the unnecessary iteration to compute currentScanResultSize.
> 2) An example of a wasteful temporary array, is "results" in
> RegionScanner.next().
> {code}
>       results.clear();
>       boolean returnResult = nextInternal(limit, metric);
>       outResults.addAll(results);
> {code}
> results then gets copied over to outResults via an addAll(). Not sure why we can not directly collect the results in outResults.
> 3) Another almost similar exmaple of a wasteful array is "results" in StoreScanner.next(), which eventually also copies its results into "outResults".
> 4) Reduce overhead of "size metric" maintained in StoreScanner.next().
> {code}
>   if (metric != null) {
>      HRegion.incrNumericMetric(this.metricNamePrefix + metric,
>                                copyKv.getLength());
>   }
>   results.add(copyKv);
> {code}
> A single call to next() might fetch a lot of KVs. We can first add up the size of those KVs in a local variable and then in a finally clause increment the metric one shot, rather than updating AtomicLongs for each KV.
> 5) RegionScanner.next() calls a helper RegionScanner.next() on the same object. Both are synchronized methods. Synchronized methods calling nested synchronized methods on the same object are probably adding some small overhead. The inner next() calls isFilterDone() which is a also a synchronized method. We should factor the code to avoid these nested synchronized methods.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (HBASE-6066) some low hanging read path improvement ideas

Posted by "Karthik Ranganathan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Ranganathan reassigned HBASE-6066:
------------------------------------------

    Assignee: Michal Gregorczyk  (was: Aurick Qiao)
    
> some low hanging read path improvement ideas 
> ---------------------------------------------
>
>                 Key: HBASE-6066
>                 URL: https://issues.apache.org/jira/browse/HBASE-6066
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Michal Gregorczyk
>            Priority: Critical
>              Labels: noob
>         Attachments: metric-stringbuilder-fix.patch
>
>
> I was running some single threaded scan performance tests for a table with small sized rows that is fully cached. Some observations...
> We seem to be doing several wasteful iterations over and/or building of temporary lists.
> 1) One such is the following code in HRegionServer.next():
> {code}
>    boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
>    if (!values.isEmpty()) {
>      for (KeyValue kv : values) {              ------> #### wasteful in most cases
>        currentScanResultSize += kv.heapSize();
>    }
>    results.add(new Result(values));
> {code}
> By default the "maxScannerResultSize" is Long.MAX_VALUE. In those cases,
> we can avoid the unnecessary iteration to compute currentScanResultSize.
> 2) An example of a wasteful temporary array, is "results" in
> RegionScanner.next().
> {code}
>       results.clear();
>       boolean returnResult = nextInternal(limit, metric);
>       outResults.addAll(results);
> {code}
> results then gets copied over to outResults via an addAll(). Not sure why we can not directly collect the results in outResults.
> 3) Another almost similar exmaple of a wasteful array is "results" in StoreScanner.next(), which eventually also copies its results into "outResults".
> 4) Reduce overhead of "size metric" maintained in StoreScanner.next().
> {code}
>   if (metric != null) {
>      HRegion.incrNumericMetric(this.metricNamePrefix + metric,
>                                copyKv.getLength());
>   }
>   results.add(copyKv);
> {code}
> A single call to next() might fetch a lot of KVs. We can first add up the size of those KVs in a local variable and then in a finally clause increment the metric one shot, rather than updating AtomicLongs for each KV.
> 5) RegionScanner.next() calls a helper RegionScanner.next() on the same object. Both are synchronized methods. Synchronized methods calling nested synchronized methods on the same object are probably adding some small overhead. The inner next() calls isFilterDone() which is a also a synchronized method. We should factor the code to avoid these nested synchronized methods.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6066) some low hanging read path improvement ideas

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450722#comment-13450722 ] 

Kannan Muthukkaruppan commented on HBASE-6066:
----------------------------------------------

[~lhofhansl]: Regarding #2, I didn't quite understand the issue you raise. Here's the diff under review for this: https://reviews.facebook.net/D4191. I have added you to the Cc list explicitly on that diff. Please take a look at that.

(Is there a easy way to alert/cc all hbase devs/committers on these given that the JIRA integration with phabricator seems to be broken)? Previously, Cc'ing JIRA would cause appropriate notification into the JIRA.)
                
> some low hanging read path improvement ideas 
> ---------------------------------------------
>
>                 Key: HBASE-6066
>                 URL: https://issues.apache.org/jira/browse/HBASE-6066
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Michal Gregorczyk
>            Priority: Critical
>              Labels: noob
>         Attachments: metric-stringbuilder-fix.patch
>
>
> I was running some single threaded scan performance tests for a table with small sized rows that is fully cached. Some observations...
> We seem to be doing several wasteful iterations over and/or building of temporary lists.
> 1) One such is the following code in HRegionServer.next():
> {code}
>    boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
>    if (!values.isEmpty()) {
>      for (KeyValue kv : values) {              ------> #### wasteful in most cases
>        currentScanResultSize += kv.heapSize();
>    }
>    results.add(new Result(values));
> {code}
> By default the "maxScannerResultSize" is Long.MAX_VALUE. In those cases,
> we can avoid the unnecessary iteration to compute currentScanResultSize.
> 2) An example of a wasteful temporary array, is "results" in
> RegionScanner.next().
> {code}
>       results.clear();
>       boolean returnResult = nextInternal(limit, metric);
>       outResults.addAll(results);
> {code}
> results then gets copied over to outResults via an addAll(). Not sure why we can not directly collect the results in outResults.
> 3) Another almost similar exmaple of a wasteful array is "results" in StoreScanner.next(), which eventually also copies its results into "outResults".
> 4) Reduce overhead of "size metric" maintained in StoreScanner.next().
> {code}
>   if (metric != null) {
>      HRegion.incrNumericMetric(this.metricNamePrefix + metric,
>                                copyKv.getLength());
>   }
>   results.add(copyKv);
> {code}
> A single call to next() might fetch a lot of KVs. We can first add up the size of those KVs in a local variable and then in a finally clause increment the metric one shot, rather than updating AtomicLongs for each KV.
> 5) RegionScanner.next() calls a helper RegionScanner.next() on the same object. Both are synchronized methods. Synchronized methods calling nested synchronized methods on the same object are probably adding some small overhead. The inner next() calls isFilterDone() which is a also a synchronized method. We should factor the code to avoid these nested synchronized methods.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-6066) some low hanging read path improvement ideas

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HBASE-6066:
-------------------------------

    Attachment: metric-stringbuilder-fix.patch

This patch improved CPU pretty noticeably for my tests. I saw a bunch of CPU waste in StringBuilder.append before the patch, disappeared with it.
                
> some low hanging read path improvement ideas 
> ---------------------------------------------
>
>                 Key: HBASE-6066
>                 URL: https://issues.apache.org/jira/browse/HBASE-6066
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Kannan Muthukkaruppan
>            Priority: Minor
>         Attachments: metric-stringbuilder-fix.patch
>
>
> I was running some single threaded scan performance tests for a table with small sized rows that is fully cached. Some observations...
> We seem to be doing several wasteful iterations over and/or building of temporary lists.
> 1) One such is the following code in HRegionServer.next():
> {code}
>    boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
>    if (!values.isEmpty()) {
>      for (KeyValue kv : values) {              ------> #### wasteful in most cases
>        currentScanResultSize += kv.heapSize();
>    }
>    results.add(new Result(values));
> {code}
> By default the "maxScannerResultSize" is Long.MAX_VALUE. In those cases,
> we can avoid the unnecessary iteration to compute currentScanResultSize.
> 2) An example of a wasteful temporary array, is "results" in
> RegionScanner.next().
> {code}
>       results.clear();
>       boolean returnResult = nextInternal(limit, metric);
>       outResults.addAll(results);
> {code}
> results then gets copied over to outResults via an addAll(). Not sure why we can not directly collect the results in outResults.
> 3) Another almost similar exmaple of a wasteful array is "results" in StoreScanner.next(), which eventually also copies its results into "outResults".
> 4) Reduce overhead of "size metric" maintained in StoreScanner.next().
> {code}
>   if (metric != null) {
>      HRegion.incrNumericMetric(this.metricNamePrefix + metric,
>                                copyKv.getLength());
>   }
>   results.add(copyKv);
> {code}
> A single call to next() might fetch a lot of KVs. We can first add up the size of those KVs in a local variable and then in a finally clause increment the metric one shot, rather than updating AtomicLongs for each KV.
> 5) RegionScanner.next() calls a helper RegionScanner.next() on the same object. Both are synchronized methods. Synchronized methods calling nested synchronized methods on the same object are probably adding some small overhead. The inner next() calls isFilterDone() which is a also a synchronized method. We should factor the code to avoid these nested synchronized methods.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6066) some low hanging read path improvement ideas

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295779#comment-13295779 ] 

Kannan Muthukkaruppan commented on HBASE-6066:
----------------------------------------------

One of our interns (Aurick) is planning to work on this.

Part #3, the incrNumericMetric overhead patch is easier-- it is a slight variant of Todd's patch and should result in fewer AtomicLong operations because it just accumulates in a local variable and updates the counter at the end. It is part of a diff by Michael Chen: https://reviews.facebook.net/D3627#a83f6d09. I think it is better to fork off this part (#4) into its own JIRA and Aurick will work on the others.
                
> some low hanging read path improvement ideas 
> ---------------------------------------------
>
>                 Key: HBASE-6066
>                 URL: https://issues.apache.org/jira/browse/HBASE-6066
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Kannan Muthukkaruppan
>            Priority: Critical
>              Labels: noob
>         Attachments: metric-stringbuilder-fix.patch
>
>
> I was running some single threaded scan performance tests for a table with small sized rows that is fully cached. Some observations...
> We seem to be doing several wasteful iterations over and/or building of temporary lists.
> 1) One such is the following code in HRegionServer.next():
> {code}
>    boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
>    if (!values.isEmpty()) {
>      for (KeyValue kv : values) {              ------> #### wasteful in most cases
>        currentScanResultSize += kv.heapSize();
>    }
>    results.add(new Result(values));
> {code}
> By default the "maxScannerResultSize" is Long.MAX_VALUE. In those cases,
> we can avoid the unnecessary iteration to compute currentScanResultSize.
> 2) An example of a wasteful temporary array, is "results" in
> RegionScanner.next().
> {code}
>       results.clear();
>       boolean returnResult = nextInternal(limit, metric);
>       outResults.addAll(results);
> {code}
> results then gets copied over to outResults via an addAll(). Not sure why we can not directly collect the results in outResults.
> 3) Another almost similar exmaple of a wasteful array is "results" in StoreScanner.next(), which eventually also copies its results into "outResults".
> 4) Reduce overhead of "size metric" maintained in StoreScanner.next().
> {code}
>   if (metric != null) {
>      HRegion.incrNumericMetric(this.metricNamePrefix + metric,
>                                copyKv.getLength());
>   }
>   results.add(copyKv);
> {code}
> A single call to next() might fetch a lot of KVs. We can first add up the size of those KVs in a local variable and then in a finally clause increment the metric one shot, rather than updating AtomicLongs for each KV.
> 5) RegionScanner.next() calls a helper RegionScanner.next() on the same object. Both are synchronized methods. Synchronized methods calling nested synchronized methods on the same object are probably adding some small overhead. The inner next() calls isFilterDone() which is a also a synchronized method. We should factor the code to avoid these nested synchronized methods.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-6066) some low hanging read path improvement ideas

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-6066:
-------------------------

    Priority: Critical  (was: Minor)
        Tags: noob
      Labels: noob  (was: )

I upped priority on this so we'll less likely forget about it.  Seems like small changes can make big improvement.  Tagged it noob since the hard part -- the prescription -- has been done by two gentlemen software engineers (Kannan and Todd).
                
> some low hanging read path improvement ideas 
> ---------------------------------------------
>
>                 Key: HBASE-6066
>                 URL: https://issues.apache.org/jira/browse/HBASE-6066
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Kannan Muthukkaruppan
>            Priority: Critical
>              Labels: noob
>         Attachments: metric-stringbuilder-fix.patch
>
>
> I was running some single threaded scan performance tests for a table with small sized rows that is fully cached. Some observations...
> We seem to be doing several wasteful iterations over and/or building of temporary lists.
> 1) One such is the following code in HRegionServer.next():
> {code}
>    boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
>    if (!values.isEmpty()) {
>      for (KeyValue kv : values) {              ------> #### wasteful in most cases
>        currentScanResultSize += kv.heapSize();
>    }
>    results.add(new Result(values));
> {code}
> By default the "maxScannerResultSize" is Long.MAX_VALUE. In those cases,
> we can avoid the unnecessary iteration to compute currentScanResultSize.
> 2) An example of a wasteful temporary array, is "results" in
> RegionScanner.next().
> {code}
>       results.clear();
>       boolean returnResult = nextInternal(limit, metric);
>       outResults.addAll(results);
> {code}
> results then gets copied over to outResults via an addAll(). Not sure why we can not directly collect the results in outResults.
> 3) Another almost similar exmaple of a wasteful array is "results" in StoreScanner.next(), which eventually also copies its results into "outResults".
> 4) Reduce overhead of "size metric" maintained in StoreScanner.next().
> {code}
>   if (metric != null) {
>      HRegion.incrNumericMetric(this.metricNamePrefix + metric,
>                                copyKv.getLength());
>   }
>   results.add(copyKv);
> {code}
> A single call to next() might fetch a lot of KVs. We can first add up the size of those KVs in a local variable and then in a finally clause increment the metric one shot, rather than updating AtomicLongs for each KV.
> 5) RegionScanner.next() calls a helper RegionScanner.next() on the same object. Both are synchronized methods. Synchronized methods calling nested synchronized methods on the same object are probably adding some small overhead. The inner next() calls isFilterDone() which is a also a synchronized method. We should factor the code to avoid these nested synchronized methods.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6066) some low hanging read path improvement ideas

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452502#comment-13452502 ] 

Lars Hofhansl commented on HBASE-6066:
--------------------------------------

Had a super brief look at the patch in phabricator. Looks fine the issue I raised above is addressed by checking whether outResults is empty in RegionScannerImpl.next(...) and creating a temporary array if not.

The change might introduce new performance issues, though. I'll comment on the diff.

                
> some low hanging read path improvement ideas 
> ---------------------------------------------
>
>                 Key: HBASE-6066
>                 URL: https://issues.apache.org/jira/browse/HBASE-6066
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Michal Gregorczyk
>            Priority: Critical
>              Labels: noob
>         Attachments: metric-stringbuilder-fix.patch
>
>
> I was running some single threaded scan performance tests for a table with small sized rows that is fully cached. Some observations...
> We seem to be doing several wasteful iterations over and/or building of temporary lists.
> 1) One such is the following code in HRegionServer.next():
> {code}
>    boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
>    if (!values.isEmpty()) {
>      for (KeyValue kv : values) {              ------> #### wasteful in most cases
>        currentScanResultSize += kv.heapSize();
>    }
>    results.add(new Result(values));
> {code}
> By default the "maxScannerResultSize" is Long.MAX_VALUE. In those cases,
> we can avoid the unnecessary iteration to compute currentScanResultSize.
> 2) An example of a wasteful temporary array, is "results" in
> RegionScanner.next().
> {code}
>       results.clear();
>       boolean returnResult = nextInternal(limit, metric);
>       outResults.addAll(results);
> {code}
> results then gets copied over to outResults via an addAll(). Not sure why we can not directly collect the results in outResults.
> 3) Another almost similar exmaple of a wasteful array is "results" in StoreScanner.next(), which eventually also copies its results into "outResults".
> 4) Reduce overhead of "size metric" maintained in StoreScanner.next().
> {code}
>   if (metric != null) {
>      HRegion.incrNumericMetric(this.metricNamePrefix + metric,
>                                copyKv.getLength());
>   }
>   results.add(copyKv);
> {code}
> A single call to next() might fetch a lot of KVs. We can first add up the size of those KVs in a local variable and then in a finally clause increment the metric one shot, rather than updating AtomicLongs for each KV.
> 5) RegionScanner.next() calls a helper RegionScanner.next() on the same object. Both are synchronized methods. Synchronized methods calling nested synchronized methods on the same object are probably adding some small overhead. The inner next() calls isFilterDone() which is a also a synchronized method. We should factor the code to avoid these nested synchronized methods.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-6066) some low hanging read path improvement ideas

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-6066:
-------------------------

    Attachment: 0001-jira-HBASE-6066-89-fb-Some-read-performance-improvem.patch

Forward port this patch from 0.89-fb
                
> some low hanging read path improvement ideas 
> ---------------------------------------------
>
>                 Key: HBASE-6066
>                 URL: https://issues.apache.org/jira/browse/HBASE-6066
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Michal Gregorczyk
>            Priority: Critical
>              Labels: noob
>             Fix For: 0.96.0
>
>         Attachments: 0001-jira-HBASE-6066-89-fb-Some-read-performance-improvem.patch, metric-stringbuilder-fix.patch
>
>
> I was running some single threaded scan performance tests for a table with small sized rows that is fully cached. Some observations...
> We seem to be doing several wasteful iterations over and/or building of temporary lists.
> 1) One such is the following code in HRegionServer.next():
> {code}
>    boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
>    if (!values.isEmpty()) {
>      for (KeyValue kv : values) {              ------> #### wasteful in most cases
>        currentScanResultSize += kv.heapSize();
>    }
>    results.add(new Result(values));
> {code}
> By default the "maxScannerResultSize" is Long.MAX_VALUE. In those cases,
> we can avoid the unnecessary iteration to compute currentScanResultSize.
> 2) An example of a wasteful temporary array, is "results" in
> RegionScanner.next().
> {code}
>       results.clear();
>       boolean returnResult = nextInternal(limit, metric);
>       outResults.addAll(results);
> {code}
> results then gets copied over to outResults via an addAll(). Not sure why we can not directly collect the results in outResults.
> 3) Another almost similar exmaple of a wasteful array is "results" in StoreScanner.next(), which eventually also copies its results into "outResults".
> 4) Reduce overhead of "size metric" maintained in StoreScanner.next().
> {code}
>   if (metric != null) {
>      HRegion.incrNumericMetric(this.metricNamePrefix + metric,
>                                copyKv.getLength());
>   }
>   results.add(copyKv);
> {code}
> A single call to next() might fetch a lot of KVs. We can first add up the size of those KVs in a local variable and then in a finally clause increment the metric one shot, rather than updating AtomicLongs for each KV.
> 5) RegionScanner.next() calls a helper RegionScanner.next() on the same object. Both are synchronized methods. Synchronized methods calling nested synchronized methods on the same object are probably adding some small overhead. The inner next() calls isFilterDone() which is a also a synchronized method. We should factor the code to avoid these nested synchronized methods.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (HBASE-6066) some low hanging read path improvement ideas

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon reassigned HBASE-6066:
----------------------------------

    Assignee: Todd Lipcon
    
> some low hanging read path improvement ideas 
> ---------------------------------------------
>
>                 Key: HBASE-6066
>                 URL: https://issues.apache.org/jira/browse/HBASE-6066
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Todd Lipcon
>            Priority: Minor
>
> I was running some single threaded scan performance tests for a table with small sized rows that is fully cached. Some observations...
> We seem to be doing several wasteful iterations over and/or building of temporary lists.
> 1) One such is the following code in HRegionServer.next():
> {code}
>    boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
>    if (!values.isEmpty()) {
>      for (KeyValue kv : values) {              ------> #### wasteful in most cases
>        currentScanResultSize += kv.heapSize();
>    }
>    results.add(new Result(values));
> {code}
> By default the "maxScannerResultSize" is Long.MAX_VALUE. In those cases,
> we can avoid the unnecessary iteration to compute currentScanResultSize.
> 2) An example of a wasteful temporary array, is "results" in
> RegionScanner.next().
> {code}
>       results.clear();
>       boolean returnResult = nextInternal(limit, metric);
>       outResults.addAll(results);
> {code}
> results then gets copied over to outResults via an addAll(). Not sure why we can not directly collect the results in outResults.
> 3) Another almost similar exmaple of a wasteful array is "results" in StoreScanner.next(), which eventually also copies its results into "outResults".
> 4) Reduce overhead of "size metric" maintained in StoreScanner.next().
> {code}
>   if (metric != null) {
>      HRegion.incrNumericMetric(this.metricNamePrefix + metric,
>                                copyKv.getLength());
>   }
>   results.add(copyKv);
> {code}
> A single call to next() might fetch a lot of KVs. We can first add up the size of those KVs in a local variable and then in a finally clause increment the metric one shot, rather than updating AtomicLongs for each KV.
> 5) RegionScanner.next() calls a helper RegionScanner.next() on the same object. Both are synchronized methods. Synchronized methods calling nested synchronized methods on the same object are probably adding some small overhead. The inner next() calls isFilterDone() which is a also a synchronized method. We should factor the code to avoid these nested synchronized methods.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6066) some low hanging read path improvement ideas

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13466029#comment-13466029 ] 

stack commented on HBASE-6066:
------------------------------

Thanks Lars.  Will keep in mind forward porting.
                
> some low hanging read path improvement ideas 
> ---------------------------------------------
>
>                 Key: HBASE-6066
>                 URL: https://issues.apache.org/jira/browse/HBASE-6066
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Michal Gregorczyk
>            Priority: Critical
>              Labels: noob
>             Fix For: 0.96.0
>
>         Attachments: 0001-jira-HBASE-6066-89-fb-Some-read-performance-improvem.patch, metric-stringbuilder-fix.patch
>
>
> I was running some single threaded scan performance tests for a table with small sized rows that is fully cached. Some observations...
> We seem to be doing several wasteful iterations over and/or building of temporary lists.
> 1) One such is the following code in HRegionServer.next():
> {code}
>    boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
>    if (!values.isEmpty()) {
>      for (KeyValue kv : values) {              ------> #### wasteful in most cases
>        currentScanResultSize += kv.heapSize();
>    }
>    results.add(new Result(values));
> {code}
> By default the "maxScannerResultSize" is Long.MAX_VALUE. In those cases,
> we can avoid the unnecessary iteration to compute currentScanResultSize.
> 2) An example of a wasteful temporary array, is "results" in
> RegionScanner.next().
> {code}
>       results.clear();
>       boolean returnResult = nextInternal(limit, metric);
>       outResults.addAll(results);
> {code}
> results then gets copied over to outResults via an addAll(). Not sure why we can not directly collect the results in outResults.
> 3) Another almost similar exmaple of a wasteful array is "results" in StoreScanner.next(), which eventually also copies its results into "outResults".
> 4) Reduce overhead of "size metric" maintained in StoreScanner.next().
> {code}
>   if (metric != null) {
>      HRegion.incrNumericMetric(this.metricNamePrefix + metric,
>                                copyKv.getLength());
>   }
>   results.add(copyKv);
> {code}
> A single call to next() might fetch a lot of KVs. We can first add up the size of those KVs in a local variable and then in a finally clause increment the metric one shot, rather than updating AtomicLongs for each KV.
> 5) RegionScanner.next() calls a helper RegionScanner.next() on the same object. Both are synchronized methods. Synchronized methods calling nested synchronized methods on the same object are probably adding some small overhead. The inner next() calls isFilterDone() which is a also a synchronized method. We should factor the code to avoid these nested synchronized methods.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (HBASE-6066) some low hanging read path improvement ideas

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kannan Muthukkaruppan reassigned HBASE-6066:
--------------------------------------------

    Assignee: Kannan Muthukkaruppan
    
> some low hanging read path improvement ideas 
> ---------------------------------------------
>
>                 Key: HBASE-6066
>                 URL: https://issues.apache.org/jira/browse/HBASE-6066
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Kannan Muthukkaruppan
>            Priority: Critical
>              Labels: noob
>         Attachments: metric-stringbuilder-fix.patch
>
>
> I was running some single threaded scan performance tests for a table with small sized rows that is fully cached. Some observations...
> We seem to be doing several wasteful iterations over and/or building of temporary lists.
> 1) One such is the following code in HRegionServer.next():
> {code}
>    boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
>    if (!values.isEmpty()) {
>      for (KeyValue kv : values) {              ------> #### wasteful in most cases
>        currentScanResultSize += kv.heapSize();
>    }
>    results.add(new Result(values));
> {code}
> By default the "maxScannerResultSize" is Long.MAX_VALUE. In those cases,
> we can avoid the unnecessary iteration to compute currentScanResultSize.
> 2) An example of a wasteful temporary array, is "results" in
> RegionScanner.next().
> {code}
>       results.clear();
>       boolean returnResult = nextInternal(limit, metric);
>       outResults.addAll(results);
> {code}
> results then gets copied over to outResults via an addAll(). Not sure why we can not directly collect the results in outResults.
> 3) Another almost similar exmaple of a wasteful array is "results" in StoreScanner.next(), which eventually also copies its results into "outResults".
> 4) Reduce overhead of "size metric" maintained in StoreScanner.next().
> {code}
>   if (metric != null) {
>      HRegion.incrNumericMetric(this.metricNamePrefix + metric,
>                                copyKv.getLength());
>   }
>   results.add(copyKv);
> {code}
> A single call to next() might fetch a lot of KVs. We can first add up the size of those KVs in a local variable and then in a finally clause increment the metric one shot, rather than updating AtomicLongs for each KV.
> 5) RegionScanner.next() calls a helper RegionScanner.next() on the same object. Both are synchronized methods. Synchronized methods calling nested synchronized methods on the same object are probably adding some small overhead. The inner next() calls isFilterDone() which is a also a synchronized method. We should factor the code to avoid these nested synchronized methods.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6066) some low hanging read path improvement ideas

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448065#comment-13448065 ] 

Lars Hofhansl commented on HBASE-6066:
--------------------------------------

Looked at #2. Turns out that this actually needed to retain the group of KVs that represent the current row to be used with Filter.filterRow(...)
(But see HBASE-6711, which removes an unnecessary temporary result array from StoreScanner)
                
> some low hanging read path improvement ideas 
> ---------------------------------------------
>
>                 Key: HBASE-6066
>                 URL: https://issues.apache.org/jira/browse/HBASE-6066
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Michal Gregorczyk
>            Priority: Critical
>              Labels: noob
>         Attachments: metric-stringbuilder-fix.patch
>
>
> I was running some single threaded scan performance tests for a table with small sized rows that is fully cached. Some observations...
> We seem to be doing several wasteful iterations over and/or building of temporary lists.
> 1) One such is the following code in HRegionServer.next():
> {code}
>    boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
>    if (!values.isEmpty()) {
>      for (KeyValue kv : values) {              ------> #### wasteful in most cases
>        currentScanResultSize += kv.heapSize();
>    }
>    results.add(new Result(values));
> {code}
> By default the "maxScannerResultSize" is Long.MAX_VALUE. In those cases,
> we can avoid the unnecessary iteration to compute currentScanResultSize.
> 2) An example of a wasteful temporary array, is "results" in
> RegionScanner.next().
> {code}
>       results.clear();
>       boolean returnResult = nextInternal(limit, metric);
>       outResults.addAll(results);
> {code}
> results then gets copied over to outResults via an addAll(). Not sure why we can not directly collect the results in outResults.
> 3) Another almost similar exmaple of a wasteful array is "results" in StoreScanner.next(), which eventually also copies its results into "outResults".
> 4) Reduce overhead of "size metric" maintained in StoreScanner.next().
> {code}
>   if (metric != null) {
>      HRegion.incrNumericMetric(this.metricNamePrefix + metric,
>                                copyKv.getLength());
>   }
>   results.add(copyKv);
> {code}
> A single call to next() might fetch a lot of KVs. We can first add up the size of those KVs in a local variable and then in a finally clause increment the metric one shot, rather than updating AtomicLongs for each KV.
> 5) RegionScanner.next() calls a helper RegionScanner.next() on the same object. Both are synchronized methods. Synchronized methods calling nested synchronized methods on the same object are probably adding some small overhead. The inner next() calls isFilterDone() which is a also a synchronized method. We should factor the code to avoid these nested synchronized methods.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (HBASE-6066) some low hanging read path improvement ideas

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon reassigned HBASE-6066:
----------------------------------

    Assignee:     (was: Todd Lipcon)

oops, did not mean to click "assign to me" :) But I noticed several of the same issues. Will attach a patch I have floating around for one of them.
                
> some low hanging read path improvement ideas 
> ---------------------------------------------
>
>                 Key: HBASE-6066
>                 URL: https://issues.apache.org/jira/browse/HBASE-6066
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Kannan Muthukkaruppan
>            Priority: Minor
>         Attachments: metric-stringbuilder-fix.patch
>
>
> I was running some single threaded scan performance tests for a table with small sized rows that is fully cached. Some observations...
> We seem to be doing several wasteful iterations over and/or building of temporary lists.
> 1) One such is the following code in HRegionServer.next():
> {code}
>    boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
>    if (!values.isEmpty()) {
>      for (KeyValue kv : values) {              ------> #### wasteful in most cases
>        currentScanResultSize += kv.heapSize();
>    }
>    results.add(new Result(values));
> {code}
> By default the "maxScannerResultSize" is Long.MAX_VALUE. In those cases,
> we can avoid the unnecessary iteration to compute currentScanResultSize.
> 2) An example of a wasteful temporary array, is "results" in
> RegionScanner.next().
> {code}
>       results.clear();
>       boolean returnResult = nextInternal(limit, metric);
>       outResults.addAll(results);
> {code}
> results then gets copied over to outResults via an addAll(). Not sure why we can not directly collect the results in outResults.
> 3) Another almost similar exmaple of a wasteful array is "results" in StoreScanner.next(), which eventually also copies its results into "outResults".
> 4) Reduce overhead of "size metric" maintained in StoreScanner.next().
> {code}
>   if (metric != null) {
>      HRegion.incrNumericMetric(this.metricNamePrefix + metric,
>                                copyKv.getLength());
>   }
>   results.add(copyKv);
> {code}
> A single call to next() might fetch a lot of KVs. We can first add up the size of those KVs in a local variable and then in a finally clause increment the metric one shot, rather than updating AtomicLongs for each KV.
> 5) RegionScanner.next() calls a helper RegionScanner.next() on the same object. Both are synchronized methods. Synchronized methods calling nested synchronized methods on the same object are probably adding some small overhead. The inner next() calls isFilterDone() which is a also a synchronized method. We should factor the code to avoid these nested synchronized methods.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira