Posted to issues@hbase.apache.org by "Lars Hofhansl (JIRA)" <ji...@apache.org> on 2012/11/18 07:40:12 UTC

[jira] [Created] (HBASE-7180) RegionScannerImpl.next() is inefficient.

Lars Hofhansl created HBASE-7180:
------------------------------------

             Summary: RegionScannerImpl.next() is inefficient.
                 Key: HBASE-7180
                 URL: https://issues.apache.org/jira/browse/HBASE-7180
             Project: HBase
          Issue Type: Bug
            Reporter: Lars Hofhansl


We just came across a special scenario.

For our Phoenix project (SQL runtime for HBase), we push a lot of work into HBase via coprocessors. One method is to wrap RegionScanner in coprocessor hooks and then do processing in the hook to avoid returning a lot of data to the client unnecessarily.

In this specific case this is pretty bad. Since the wrapped RegionScanner's next() does not "know" that it is called this way, it still does all of this on each invocation:
# Starts a RegionOperation
# Increments the request count
# Sets the current read point on a thread local (because generally each call could come from a different thread)
# Finally does the next on its StoreScanner(s)
# Ends the RegionOperation

When this is done in a tight loop millions of times (as is the case for us) it starts to become significant.
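
To make the per-call cost concrete, here is a minimal sketch of the five steps listed above. It is a toy model, not the actual RegionScannerImpl code: the class ToyRegionScanner, its fields, and the use of a read/write lock to stand in for a region operation are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy model of a region scanner. Every name here is hypothetical;
// none of this is the real HBase implementation.
class ToyRegionScanner {
    private final ReentrantReadWriteLock regionLock = new ReentrantReadWriteLock();
    private final ThreadLocal<Long> threadReadPoint = new ThreadLocal<>();
    final AtomicLong requestCount = new AtomicLong();
    private final long readPoint;
    private final int totalRows;
    private int position = 0;

    ToyRegionScanner(int totalRows, long readPoint) {
        this.totalRows = totalRows;
        this.readPoint = readPoint;
    }

    // Returns true while more rows remain, paying the bookkeeping on every call.
    boolean next(List<String> results) {
        regionLock.readLock().lock();          // 1. start the region operation
        try {
            requestCount.incrementAndGet();    // 2. increment the request count
            threadReadPoint.set(readPoint);    // 3. set the read point on a thread local
            return advance(results);           // 4. the actual scan work
        } finally {
            regionLock.readLock().unlock();    // 5. end the region operation
        }
    }

    // Stand-in for the StoreScanner work: emits one synthetic row per call.
    private boolean advance(List<String> results) {
        if (position >= totalRows) {
            return false;
        }
        results.add("row-" + position);
        position++;
        return position < totalRows;
    }
}

class OverheadDemo {
    public static void main(String[] args) {
        ToyRegionScanner scanner = new ToyRegionScanner(1_000_000, 42L);
        List<String> row = new ArrayList<>();
        boolean more;
        do {
            row.clear();
            more = scanner.next(row);
        } while (more);
        // The bookkeeping ran once per row, not once per scan.
        System.out.println("requests counted: " + scanner.requestCount.get());
    }
}
```

In this model only step 4 does useful work for a server-side caller; steps 1, 2, 3 and 5 repeat identically on every row.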

Not sure what to do about this, really. Opening this issue for discussion.

One way is to extend the RegionScanner with an "internal" next() method of sorts, so that all this overhead can be avoided. The coprocessor could call the regular next() methods once and then just call the cheaper internal version.

Are there better/cleaner ways?


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7180) RegionScannerImpl.next() is inefficient.

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510182#comment-13510182 ] 

stack commented on HBASE-7180:
------------------------------

+1
                

[jira] [Work started] (HBASE-7180) RegionScannerImpl.next() is inefficient.

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HBASE-7180 started by Lars Hofhansl.


[jira] [Updated] (HBASE-7180) RegionScannerImpl.next() is inefficient.

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-7180:
---------------------------------

    Attachment: 7180-0.94-v3.txt

Just changes a comment slightly
                

[jira] [Commented] (HBASE-7180) RegionScannerImpl.next() is inefficient.

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13499911#comment-13499911 ] 

Lars Hofhansl commented on HBASE-7180:
--------------------------------------

With this patch a RegionObserver wrapping a RegionScanner in preScannerOpen can handle the readpoint and start/close region operation itself and then call the cheaper next() on the wrapped scanner when needed.
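
A rough sketch of that usage pattern, under stated assumptions: SketchScanner and RowCountingWrapper are hypothetical stand-ins, the signatures are simplified, and the cheaper method is called nextRaw here after the name used in the later patch revision. The real coprocessor hook wiring is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical scanner. Method names follow this discussion, but the
// signatures are simplified and are not the real HBase API.
class SketchScanner {
    static final ThreadLocal<Long> THREAD_READ_POINT = new ThreadLocal<>();
    int regionOperations = 0;              // how often a region operation was started
    private boolean inOperation = false;
    private final long readPoint;
    private final int totalRows;
    private int position = 0;

    SketchScanner(int totalRows, long readPoint) {
        this.totalRows = totalRows;
        this.readPoint = readPoint;
    }

    long getReadPt() { return readPoint; }

    void startRegionOperation() { regionOperations++; inOperation = true; }

    void closeRegionOperation() { inOperation = false; }

    // Regular next(): sets everything up and tears it down on each call.
    boolean next(List<String> results) {
        startRegionOperation();
        try {
            THREAD_READ_POINT.set(readPoint);
            return nextRaw(results);
        } finally {
            closeRegionOperation();
        }
    }

    // Cheaper variant: assumes the caller already did the setup.
    boolean nextRaw(List<String> results) {
        if (!inOperation) {
            throw new IllegalStateException("no region operation in progress");
        }
        if (position >= totalRows) {
            return false;
        }
        results.add("row-" + position);
        position++;
        return position < totalRows;
    }
}

// What a wrapping scanner installed by a coprocessor might do internally.
class RowCountingWrapper {
    private final SketchScanner delegate;

    RowCountingWrapper(SketchScanner delegate) { this.delegate = delegate; }

    // Counts rows server-side with one setup for the whole scan.
    long countRows() {
        long count = 0;
        List<String> row = new ArrayList<>();
        delegate.startRegionOperation();                                // once per scan
        try {
            SketchScanner.THREAD_READ_POINT.set(delegate.getReadPt()); // once per scan
            boolean more;
            do {
                row.clear();
                more = delegate.nextRaw(row);                           // cheap inner loop
                count += row.size();
            } while (more);
        } finally {
            delegate.closeRegionOperation();
        }
        return count;
    }
}

class WrapperDemo {
    public static void main(String[] args) {
        SketchScanner scanner = new SketchScanner(1000, 7L);
        long rows = new RowCountingWrapper(scanner).countRows();
        System.out.println(rows + " rows, " + scanner.regionOperations + " region operation(s)");
    }
}
```

The try/finally matters in this sketch: since the caller now owns the region operation, it also has to close it when the scan fails partway.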

                

[jira] [Commented] (HBASE-7180) RegionScannerImpl.next() is inefficient.

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508476#comment-13508476 ] 

Lars Hofhansl commented on HBASE-7180:
--------------------------------------

How about another approach:
# Introduce a RawRegionScanner interface, which extends RegionScanner.
# RawRegionScanner has all the additional methods on it we need.
# Add a getRawScanner() method to the RegionScanner interface.
# RegionScannerImpl would then implement RawRegionScanner.

To the coprocessor framework we'd still hand a RegionScanner, but now the coprocessor can get the raw scanner via getRawScanner(). The RegionScannerImpl's implementation of getRawScanner() just returns "this".
Is that better? Or does anybody have a cleaner idea?

closeRegionOperation and startRegionOperation would still need to be public, so that coprocessors can start/stop region operations.
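
For discussion, the interface layout could look roughly like the following. This is only a sketch of the idea: the "Sketch" suffixed types are hypothetical, rows are plain strings instead of KeyValues, and the setupCalls counter exists only to show which path was taken.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-ins; the real HBase interfaces have different signatures.
interface RegionScannerSketch {
    boolean next(List<String> results);

    // Step 3 of the proposal: the regular interface hands out the raw view.
    RawRegionScannerSketch getRawScanner();
}

// Steps 1 and 2: the raw interface extends the regular one and adds the extras.
interface RawRegionScannerSketch extends RegionScannerSketch {
    boolean nextRaw(List<String> results);
}

// Step 4: the implementation implements the raw interface.
class RegionScannerImplSketch implements RawRegionScannerSketch {
    private final List<String> rows;
    private int position = 0;
    int setupCalls = 0;   // counts how often the expensive path was taken

    RegionScannerImplSketch(List<String> rows) {
        this.rows = rows;
    }

    @Override
    public boolean next(List<String> results) {
        setupCalls++;     // stands in for region operation, metrics, read point
        return nextRaw(results);
    }

    @Override
    public boolean nextRaw(List<String> results) {
        if (position >= rows.size()) {
            return false;
        }
        results.add(rows.get(position));
        position++;
        return position < rows.size();
    }

    @Override
    public RawRegionScannerSketch getRawScanner() {
        return this;      // the implementation is its own raw scanner
    }
}

class RawScannerDemo {
    public static void main(String[] args) {
        // The coprocessor framework still hands out the plain interface type.
        RegionScannerSketch scanner = new RegionScannerImplSketch(Arrays.asList("a", "b", "c"));
        RawRegionScannerSketch raw = scanner.getRawScanner();
        List<String> results = new ArrayList<>();
        boolean more;
        do {
            more = raw.nextRaw(results);
        } while (more);
        System.out.println(results);
    }
}
```

A wrapping scanner written by a coprocessor would have to decide what its own getRawScanner() returns, which is one of the open questions with this layout.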

                

[jira] [Updated] (HBASE-7180) RegionScannerImpl.next() is inefficient.

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-7180:
---------------------------------

    Attachment: 7180-0.94-v2.txt

Updated patch.
Has nextRaw(...) instead of nextInternal(...).

                

[jira] [Updated] (HBASE-7180) RegionScannerImpl.next() is inefficient.

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-7180:
---------------------------------

    Attachment: 7180-0.96-v1.txt

And a 0.96 version
                

[jira] [Updated] (HBASE-7180) RegionScannerImpl.next() is inefficient.

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-7180:
---------------------------------

    Attachment: 7180-0.94-SKETCH.txt

Something nasty like this.
There must be a cleaner way.
                

[jira] [Updated] (HBASE-7180) RegionScannerImpl.next() is inefficient.

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-7180:
---------------------------------

    Attachment: 7180-0.94-v1.txt

Slightly better patch.
Also changes the call in RegionServer to use the cheaper version of next().

Looking around at the code, we can also replace all the calls from AggregateImplementation to use this cheaper next() method.

                

[jira] [Commented] (HBASE-7180) RegionScannerImpl.next() is inefficient.

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510102#comment-13510102 ] 

Lars Hofhansl commented on HBASE-7180:
--------------------------------------

Talked to Stack and Gregory C. offline. Will use the initial approach with a better name for the "internal" next method.
                

[jira] [Updated] (HBASE-7180) RegionScannerImpl.next() is inefficient.

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-7180:
---------------------------------

    Fix Version/s: 0.94.4
                   0.96.0
         Assignee: Lars Hofhansl
    

[jira] [Commented] (HBASE-7180) RegionScannerImpl.next() is inefficient.

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501355#comment-13501355 ] 

stack commented on HBASE-7180:
------------------------------

getReadPt has to be public?

nextInternal has to be public too?

closeRegionOperation and startRegionOperation too?

So CPs can get at them?

You add KeyValue to RegionScanner when it did not need it previously.  You have to?

And a nextInternal in an Interface like RegionScanner seems wrong?

I am for speedup but as you say, this one is ugly

                

[jira] [Commented] (HBASE-7180) RegionScannerImpl.next() is inefficient.

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501375#comment-13501375 ] 

Lars Hofhansl commented on HBASE-7180:
--------------------------------------

Yeah I don't like it either. We somehow need to expose more internals to coprocessors in a clean way.

* KeyValue already is needed for RegionScanner, since it extends internal scanner.
* start/closeRegionOperation should be available to coprocessors anyway (I think). Otherwise it is hard to implement these types of things in coprocessors.
* I mainly do not like nextInternal on the interface. Is there a better way to expose the inner workings of RegionScannerImpl to avoid expensive setup at each iteration?

Another option is to keep the RegionScanner interface as is, and just make these methods public in RegionScannerImpl. A coprocessor can then cast the RegionScanner to RegionScannerImpl and access the stuff it needs.
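
To illustrate the cast-based option, a small sketch with made-up types (PlainScanner, PlainScannerImpl and CastingHelper are hypothetical, and rows are plain strings). It also shows the catch: the cast only works when the scanner really is the implementation class, so a fallback to the regular next() is needed.

```java
import java.util.ArrayList;
import java.util.List;

// The interface stays untouched, as suggested.
interface PlainScanner {
    boolean next(List<String> results);
}

// Only the implementation class exposes the cheaper method publicly.
class PlainScannerImpl implements PlainScanner {
    private final int totalRows;
    private int position = 0;
    int setupCalls = 0;   // counts how often the expensive path was taken

    PlainScannerImpl(int totalRows) {
        this.totalRows = totalRows;
    }

    @Override
    public boolean next(List<String> results) {
        setupCalls++;     // stands in for the per-call bookkeeping
        return nextRaw(results);
    }

    public boolean nextRaw(List<String> results) {
        if (position >= totalRows) {
            return false;
        }
        results.add("row-" + position);
        position++;
        return position < totalRows;
    }
}

class CastingHelper {
    // Drains the scanner, using the cheap path only when the cast is safe.
    static int drain(PlainScanner scanner, List<String> out) {
        boolean more;
        if (scanner instanceof PlainScannerImpl) {
            PlainScannerImpl impl = (PlainScannerImpl) scanner;
            do {
                more = impl.nextRaw(out);
            } while (more);
        } else {
            // Some other wrapper: fall back to the regular, slower next().
            do {
                more = scanner.next(out);
            } while (more);
        }
        return out.size();
    }
}

class CastDemo {
    public static void main(String[] args) {
        PlainScannerImpl impl = new PlainScannerImpl(3);
        int rows = CastingHelper.drain(impl, new ArrayList<String>());
        System.out.println(rows + " rows, " + impl.setupCalls + " setup calls");
    }
}
```

The downside is visible in the helper: coprocessor code becomes tied to a concrete class, and the fast path is silently lost as soon as another wrapper sits in between.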

                

[jira] [Updated] (HBASE-7180) RegionScannerImpl.next() is inefficient.

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-7180:
---------------------------------

    Status: Patch Available  (was: In Progress)
    