Posted to issues@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2013/06/05 11:14:21 UTC
[jira] [Commented] (HBASE-8691) High-Throughput Streaming Scan API

    [ https://issues.apache.org/jira/browse/HBASE-8691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13675729#comment-13675729 ] 

stack commented on HBASE-8691:
------------------------------

Thanks for looking into this Sandy.

Now come the dumb questions.

You did it as a servlet just because this was the easiest way of putting up a new socket on a regionserver over which you could run this new streaming protocol?

Similarly, the regionserver already has an HTTP server instance on which we could mount the new servlet; was it just expediency that had you create the Server in StreamHRegionServer?

(Pardon the dumb assertions above -- just trying to make sure I can grok better what is going on)

I need to measure what happens when we put our new framing of results -- where we send a protobuf (pb) of metadata followed by blocks of KVs -- over your stream.  My guess is we should see the same speedup (only, using blocks of KVs, we can have compressed/prefix-encoded blocks of KVs on the wire) even though there will be some "stutter" while we compose the cellblocks server-side.  Hopefully the stutter won't be noticed as long as we keep the stream filled with data.
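
To make the framing idea concrete, here is a rough sketch of "metadata first, then blocks of KVs" on a byte stream. It is illustrative only and is not the HBase wire format: the class name CellBlockFraming, the length-prefixed layout, the zero-length terminator, and the use of raw bytes in place of a real protobuf message are all assumptions. Compression or prefix-encoding would be applied per block before the block is written.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical framing (an assumption, not HBase's actual format):
//   [int metaLen][meta bytes]  then repeated  [int blockLen][block bytes],
//   terminated by a blockLen of 0.
class CellBlockFraming {

    // Result of decoding a stream: the metadata header plus all KVs in order.
    static final class Decoded {
        final byte[] meta;
        final List<String[]> kvs = new ArrayList<>();
        Decoded(byte[] meta) { this.meta = meta; }
    }

    // One block is [int count] followed by count pairs of length-prefixed strings.
    // Composing the whole block before sending it is where the "stutter" comes from.
    static byte[] encodeBlock(List<String[]> kvs) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(kvs.size());
        for (String[] kv : kvs) {
            for (String part : kv) {
                byte[] b = part.getBytes(StandardCharsets.UTF_8);
                out.writeInt(b.length);
                out.write(b);
            }
        }
        out.flush();
        return buf.toByteArray();
    }

    // Writes the metadata header, then every block, then the 0 terminator.
    static byte[] write(byte[] meta, List<List<String[]>> blocks) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(meta.length);
            out.write(meta);
            for (List<String[]> block : blocks) {
                byte[] encoded = encodeBlock(block);
                out.writeInt(encoded.length);
                out.write(encoded);
            }
            out.writeInt(0);
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String readString(DataInputStream in) throws IOException {
        byte[] b = new byte[in.readInt()];
        in.readFully(b);
        return new String(b, StandardCharsets.UTF_8);
    }

    // Reads the header once, then consumes blocks until the terminator.
    static Decoded read(byte[] wire) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
            byte[] meta = new byte[in.readInt()];
            in.readFully(meta);
            Decoded decoded = new Decoded(meta);
            int blockLen;
            while ((blockLen = in.readInt()) != 0) {
                byte[] block = new byte[blockLen];
                in.readFully(block);
                DataInputStream b = new DataInputStream(new ByteArrayInputStream(block));
                int count = b.readInt();
                for (int i = 0; i < count; i++) {
                    decoded.kvs.add(new String[] { readString(b), readString(b) });
                }
            }
            return decoded;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch the reader can start handing KVs to the caller as soon as the first block arrives, so the server-side pause while a block is composed only matters if the stream runs dry in the meantime.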

This is great.
                
> High-Throughput Streaming Scan API
> ----------------------------------
>
>                 Key: HBASE-8691
>                 URL: https://issues.apache.org/jira/browse/HBASE-8691
>             Project: HBase
>          Issue Type: Improvement
>          Components: Scanners
>    Affects Versions: 0.95.0
>            Reporter: Sandy Pratt
>              Labels: perfomance, scan
>         Attachments: HRegionServlet.java, README.txt, RecordReceiver.java, ScannerTest.java, StreamHRegionServer.java, StreamReceiverDirect.java, StreamServletDirect.java
>
>
> I've done some work testing various ways to refactor and optimize Scans in HBase, and have found that performance can be dramatically increased by the addition of a streaming scan API.  The attached code constitutes a proof of concept that shows performance increases of almost 4x in some workloads.
> I'd appreciate testing, replication, and comments.  If the approach seems viable, I think such an API should be built into some future version of HBase.
