Posted to issues@hbase.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2012/05/10 00:33:52 UTC

[jira] [Created] (HBASE-5979) Non-pread DFSInputStreams should be associated with scanners, not HFile.Readers

Todd Lipcon created HBASE-5979:
----------------------------------

             Summary: Non-pread DFSInputStreams should be associated with scanners, not HFile.Readers
                 Key: HBASE-5979
                 URL: https://issues.apache.org/jira/browse/HBASE-5979
             Project: HBase
          Issue Type: Improvement
          Components: performance, regionserver
            Reporter: Todd Lipcon


Currently, every HFile.Reader has a single DFSInputStream, which it uses to service all gets and scans. For gets, we use the positional read API (aka "pread"), and for scans we use a synchronized block to seek, then read. The advantage of pread is that it doesn't hold any locks, so multiple gets can proceed at the same time. The advantage of seek+read for scans is that the datanode starts to send the entire rest of the HDFS block, rather than just the single HFile block needed. So, in a single thread, pread is faster for gets, and seek+read is faster for scans, since you get a strong pipelining effect.

However, in a multi-threaded case where there are multiple scans (including scans which are actually part of compactions), the seek+read strategy falls apart, since only one scanner may be reading at a time. Additionally, a large amount of wasted IO is generated on the datanode side, and we get none of the earlier-mentioned advantages.

In one test, I switched scans to always use pread, and saw a 5x improvement in throughput of the YCSB scan-only workload, since it was previously completely blocked by contention on the DFSInputStream lock.
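
The two read paths described above can be sketched roughly as follows. This is a Python analogue on a local file, not HBase or HDFS code: the class name SharedStreamReader and its methods are hypothetical, os.pread is only available on Unix-like systems, and a local file shows only the locking difference, not the datanode pipelining effect.

```python
import os
import tempfile
import threading


class SharedStreamReader:
    """Hypothetical stand-in for a reader that owns one shared stream."""

    def __init__(self, path):
        # Unbuffered file object: it carries a single current position.
        self._file = open(path, "rb", buffering=0)
        self._lock = threading.Lock()  # guards the shared position

    def pread(self, offset, length):
        # Positional read: leaves the stream position untouched,
        # so concurrent callers need no lock.
        return os.pread(self._file.fileno(), length, offset)

    def seek_read(self, offset, length):
        # Seek then read: both steps change the shared position, so they
        # run under one lock and concurrent scanners are serialized.
        with self._lock:
            self._file.seek(offset)
            return self._file.read(length)

    def close(self):
        self._file.close()


def demo():
    # Write a small file, then read the same range through both paths.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(range(256)))
        path = tmp.name
    reader = SharedStreamReader(path)
    try:
        return reader.pread(10, 4), reader.seek_read(10, 4)
    finally:
        reader.close()
        os.unlink(path)
```

Both paths return the same bytes; the difference is that every seek_read caller queues on the one lock, which is the contention the description attributes to concurrent scans.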

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5979) Non-pread DFSInputStreams should be associated with scanners, not HFile.Readers

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271890#comment-13271890 ] 

Todd Lipcon commented on HBASE-5979:
------------------------------------

My thinking is that the solution is something like this:

When any scanner starts, it begins by using the "pread" API for the first N hfile blocks it reads. This allows short scans, which can often fall entirely within one or two HFile blocks, to avoid the read amplification of doing a DFSInputStream seek.

After a scanner has read several blocks from an HFile, it switches over to the seek+read mode. However, it does this with its *own* input stream. This way, all of the pre-buffering that happens through the HDFS layer will benefit it, and it doesn't have to contend with other scans. This should improve performance of long scans in the presence of contention (e.g., scans plus compactions, or multiple longer scans within the same region). The actual input streams would thus become owned by the individual HFileScanners.
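
A minimal sketch of that switch-over, in Python rather than HBase's Java and using an in-memory byte string in place of HDFS. FakeFile, SwitchingScanner, and the pread_blocks parameter (the "N" above) are hypothetical names for illustration, and fixed-size blocks are a simplifying assumption.

```python
import io


class FakeFile:
    """Hypothetical in-memory stand-in for an HFile stored in HDFS."""

    def __init__(self, data):
        self._data = data

    def pread(self, offset, length):
        # Positional read: stateless, no stream position involved.
        return self._data[offset:offset + length]

    def open_stream(self):
        # Each call returns a new, independent stream over the same bytes.
        return io.BytesIO(self._data)


class SwitchingScanner:
    """Uses pread for the first N blocks, then a private stream."""

    def __init__(self, hfile, block_size, pread_blocks):
        self._hfile = hfile
        self._block_size = block_size
        self._pread_blocks = pread_blocks  # "N"; the right value is unknown
        self._blocks_read = 0
        self._offset = 0
        self._stream = None  # private stream, opened only if needed
        self.modes = []      # which path served each block, for illustration

    def next_block(self):
        if self._blocks_read < self._pread_blocks:
            data = self._hfile.pread(self._offset, self._block_size)
            self.modes.append("pread")
        else:
            if self._stream is None:
                # First long-scan read: open our own stream and position it.
                self._stream = self._hfile.open_stream()
                self._stream.seek(self._offset)
            data = self._stream.read(self._block_size)
            self.modes.append("stream")
        self._blocks_read += 1
        self._offset += len(data)
        return data
```

A short scan that stops within the first pread_blocks blocks never opens a stream at all, while a long scan pays for one seek and then reads sequentially from a stream nobody else touches.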

Not sure if I'll have time to prototype a patch for this any time soon, but happy to help review ideas.
                

[jira] [Commented] (HBASE-5979) Non-pread DFSInputStreams should be associated with scanners, not HFile.Readers

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280634#comment-13280634 ] 

Kannan Muthukkaruppan commented on HBASE-5979:
----------------------------------------------

Todd: If we always use positional reads, we don't get the benefit of HDFS sending the rest of the HDFS block, correct? So I didn't quite catch your recent suggestion. Did you mean issuing positional reads, but explicitly reading a much larger chunk (in the Scan case) than just the current block?
                

[jira] [Commented] (HBASE-5979) Non-pread DFSInputStreams should be associated with scanners, not HFile.Readers

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280640#comment-13280640 ] 

Todd Lipcon commented on HBASE-5979:
------------------------------------

Hey Kannan,

Sorry, let me elaborate on that suggestion:

The idea is to make a new FSReader implementation, which has only one API. That API would look like the current positional read call (i.e., take a position and a length).

Internally, it would have a pool of cached DFSInputStreams, and remember the position of each of them. All of the input streams would reference the same file. When a read request comes in, it is matched against the pooled streams: if it is within N bytes forward of the current position of one of the streams, then a seek and read would be issued, synchronized on that stream. Otherwise, any stream would be chosen at random and a positional read would be issued on it. Separately, we can track the last N positional reads: if we detect a sequential pattern in them, we can take one of the pooled input streams and seek it to the next predicted offset, so that future reads get the sequential benefit.
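
The matching logic in that paragraph might look roughly like the sketch below. It is Python over an in-memory byte string, not a real FSReader; PooledReader, window, and the log list are hypothetical. It makes several simplifying assumptions: it is single-threaded (the per-stream synchronization is omitted), the positional-read fallback slices the bytes directly instead of picking a stream, and "sequential" means only that a positional read starts where the previous read ended.

```python
import io


class PooledReader:
    """Hypothetical pooled reader exposing only read(position, length)."""

    def __init__(self, data, pool_size, window):
        self._data = data
        # Every pooled stream references the same underlying bytes.
        self._streams = [io.BytesIO(data) for _ in range(pool_size)]
        self._window = window   # the "N bytes forward" threshold
        self._last_end = None   # end offset of the previous read
        self._victim = 0        # next stream to reposition, round-robin
        self.log = []           # which path served each read

    def read(self, position, length):
        # 1. Look for a stream at, or a short distance before, the position.
        for stream in self._streams:
            gap = position - stream.tell()
            if 0 <= gap <= self._window:
                stream.seek(position)
                data = stream.read(length)
                self.log.append("stream")
                self._last_end = position + len(data)
                return data

        # 2. No stream is close enough: fall back to a positional read.
        data = self._data[position:position + length]
        self.log.append("pread")

        # 3. If this read continued the previous one, park a pooled stream
        #    at the predicted next offset so later reads hit step 1.
        if position == self._last_end:
            stream = self._streams[self._victim]
            self._victim = (self._victim + 1) % len(self._streams)
            stream.seek(position + len(data))

        self._last_end = position + len(data)
        return data
```

With a pool of two streams and a 16-byte window, reads at offsets 100 and 110 go through the positional path; the second one is detected as sequential and parks a stream at 120, so reads at 120 and 130 are then served from that stream.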
                

[jira] [Commented] (HBASE-5979) Non-pread DFSInputStreams should be associated with scanners, not HFile.Readers

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276018#comment-13276018 ] 

Todd Lipcon commented on HBASE-5979:
------------------------------------

Kannan: I tried to work on this a bit in my spare time, but didn't get very far. So if FB folks have cycles to work on it, that would be awesome!

I think one route is to do as I suggested above and have the StoreFileScanners hold a DFSInputStream. Another option would be to make a wrapper FileSystem (or FSReader) which pools a few streams. Then change the scanners to always issue positional reads, and have the wrapper code look for any stream which is already positioned at the right offset (or just before it). The advantage of this technique is that we'd end up getting the same sequential read benefit, even if the user was issuing normal get() calls in ascending row order.
                

[jira] [Commented] (HBASE-5979) Non-pread DFSInputStreams should be associated with scanners, not HFile.Readers

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13275996#comment-13275996 ] 

Kannan Muthukkaruppan commented on HBASE-5979:
----------------------------------------------

Todd: Nice catch! Your suggestion makes sense.
                