Posted to dev@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2010/02/05 01:30:27 UTC

[jira] Updated: (HBASE-2180) read performance from synchronizing hfile.fddatainputstream

     [ https://issues.apache.org/jira/browse/HBASE-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-2180:
-------------------------

    Attachment: 2180.patch

This patch makes gets use pread when fetching blocks, and keeps the old seek+read for scans.
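To make the distinction concrete, here is a minimal sketch of the two block-fetch strategies on a local file. It is not HBase code. `BlockFetcher`, `seekRead` and `pread` are hypothetical names. The seek+read method mirrors the synchronized snippet quoted in the issue description. The pread method uses `java.nio.channels.FileChannel.read(ByteBuffer, long)` as a local stand-in for Hadoop's positioned read (`FSDataInputStream.read(long position, byte[] buffer, int offset, int length)`), which is what pread refers to on HDFS.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

/** Hypothetical sketch contrasting shared-stream seek+read with positional read (pread). */
class BlockFetcher {
    private final RandomAccessFile file;
    private final FileChannel channel;

    BlockFetcher(RandomAccessFile file) {
        this.file = file;
        this.channel = file.getChannel();
    }

    /** seek+read: the file pointer is shared state, so every caller takes the same lock. */
    int seekRead(long pos, byte[] b, int off, int n) throws IOException {
        synchronized (file) {
            file.seek(pos);
            return file.read(b, off, n);
        }
    }

    /**
     * pread: the position is passed in and the shared file pointer is not moved,
     * so no lock is needed. Whether concurrent positional reads actually run in
     * parallel is up to the platform's FileChannel implementation.
     */
    int pread(long pos, byte[] b, int off, int n) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(b, off, n);
        int total = 0;
        while (buf.hasRemaining()) {
            int r = channel.read(buf, pos + total);
            if (r < 0) break;                 // hit end of file
            total += r;
        }
        return (total == 0 && n > 0) ? -1 : total;  // -1 signals EOF, like InputStream.read
    }
}
```

Both methods return the same bytes. The difference is that `pread` leaves the stream's file pointer alone, which is why it needs no lock.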

The patch removes both old HFile.Reader.getScanner methods and replaces them with a single getScanner that takes two arguments: whether to cache the blocks it reads, and whether to use pread when pulling in a block. I removed the old getScanner methods to force every caller to be explicit about caching and pread.
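The shape of the new API can be illustrated with a toy model. This is a hypothetical sketch, not the patch: `ToyReader`, `ToyScanner`, `readBlock` and the counters are invented for illustration, and the "blocks" live in memory. It shows only the idea that one factory method forces each caller to choose both flags.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in for an HFile reader; all names here are hypothetical. */
class ToyReader {
    private final byte[][] blocks;                    // pretend on-disk blocks
    final Map<Integer, byte[]> blockCache = new HashMap<>();
    int preads = 0;                                   // illustration-only I/O counters
    int seekReads = 0;

    ToyReader(byte[][] blocks) { this.blocks = blocks; }

    /** Single factory: the caller must state both caching and pread explicitly. */
    ToyScanner getScanner(boolean cacheBlocks, boolean pread) {
        return new ToyScanner(this, cacheBlocks, pread);
    }

    byte[] readBlock(int idx, boolean cacheBlocks, boolean pread) {
        byte[] cached = blockCache.get(idx);
        if (cached != null) return cached;            // cache hit: no I/O at all
        if (pread) preads++; else seekReads++;        // stand-in for the real read call
        byte[] b = blocks[idx];
        if (cacheBlocks) blockCache.put(idx, b);      // only cache when asked to
        return b;
    }
}

/** A scanner just remembers the two choices made when it was created. */
class ToyScanner {
    private final ToyReader reader;
    private final boolean cacheBlocks;
    private final boolean pread;

    ToyScanner(ToyReader reader, boolean cacheBlocks, boolean pread) {
        this.reader = reader;
        this.cacheBlocks = cacheBlocks;
        this.pread = pread;
    }

    byte[] block(int idx) { return reader.readBlock(idx, cacheBlocks, pread); }
}
```

Under this model a get would call something like `getScanner(true, true)` and a scan `getScanner(cache, false)`. Which caching choice each caller makes is left to the caller, as the comment above describes.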

This patch does not include tests. It's hard to write a test for this kind of performance change.

A further improvement would recognize short scans, i.e. scans that cover less than one HFile block. In that case we'd want to pread rather than seek+read (especially when a one-row scan stands in for a get).
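One way such a rule might look is sketched below. `ReadModePolicy` and `usePread` are hypothetical names. The sketch assumes the caller can estimate how many bytes a scan will touch (for example, a one-row scan), and the comment does not say how that estimate would be obtained. Note that a scan shorter than a block can still straddle two blocks, so this is a heuristic, not a guarantee.

```java
/** Hypothetical helper: choose pread for gets and for scans shorter than one block. */
final class ReadModePolicy {
    private ReadModePolicy() {}

    /**
     * @param isGet              true for a point get
     * @param estimatedScanBytes caller's estimate of bytes the scan will read, or -1 if unknown
     * @param blockSize          HFile block size in bytes
     * @return true to fetch blocks with pread, false to use seek+read
     */
    static boolean usePread(boolean isGet, long estimatedScanBytes, long blockSize) {
        if (isGet) return true;                    // with this patch, gets already pread
        if (estimatedScanBytes < 0) return false;  // unknown length: keep seek+read for scans
        return estimatedScanBytes < blockSize;     // short scan: treat it like a get
    }
}
```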



> read performance from synchronizing hfile.fddatainputstream
> -----------------------------------------------------------
>
>                 Key: HBASE-2180
>                 URL: https://issues.apache.org/jira/browse/HBASE-2180
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: ryan rawson
>            Assignee: ryan rawson
>             Fix For: 0.21.0
>
>         Attachments: 2180.patch
>
>
> Deep in the HFile read path, there is this code:
>     synchronized (in) {
>       in.seek(pos);
>       ret = in.read(b, off, n);
>     }
> This means only one read per file can be active at a time, no matter how many threads want to read it. That prevents the OS and hardware from doing IO scheduling, i.e. reordering and optimizing many concurrent reads.
> We need to either use a reentrant API (according to Todd, pread may be partially reentrant) or use multiple stream objects, one per scanner/thread.
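The second option, one stream object per scanner/thread, can be sketched with local files as follows. This is a hypothetical illustration, not HBase code: `PerThreadStreams` is an invented name, and it keys streams by thread via `ThreadLocal` rather than by scanner. Each thread gets its own `RandomAccessFile`, so the seek+read pair from the snippet above no longer needs a lock shared across threads.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical sketch: one private stream per thread, so seek+read needs no shared lock. */
class PerThreadStreams implements AutoCloseable {
    private final List<RandomAccessFile> opened = new CopyOnWriteArrayList<>();
    private final ThreadLocal<RandomAccessFile> local;

    PerThreadStreams(File file) {
        // Lazily open a separate stream the first time each thread reads.
        this.local = ThreadLocal.withInitial(() -> {
            try {
                RandomAccessFile raf = new RandomAccessFile(file, "r");
                opened.add(raf);                  // remember it so close() can release it
                return raf;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    /** seek+read on this thread's own stream; other threads use their own file pointers. */
    int read(long pos, byte[] b, int off, int n) throws IOException {
        RandomAccessFile in = local.get();
        in.seek(pos);
        return in.read(b, off, n);
    }

    int openStreams() { return opened.size(); }

    @Override
    public void close() throws IOException {
        for (RandomAccessFile raf : opened) raf.close();
    }
}
```

The trade-off is one open file handle per reading thread (or per scanner, if keyed that way) instead of one per file, whereas pread keeps a single stream and avoids the extra handles.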

-- 
This message is automatically generated by JIRA.