Posted to dev@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2009/01/10 07:15:59 UTC

[jira] Created: (HBASE-1118) Scanner setup takes too long

Scanner setup takes too long
----------------------------

                 Key: HBASE-1118
                 URL: https://issues.apache.org/jira/browse/HBASE-1118
             Project: Hadoop HBase
          Issue Type: Bug
            Reporter: stack


posix4 and dj_ryan report that scanner setup takes too long.  The use case is a fetch of 100 - 1000 rows at a time.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1118) Scanner setup takes too long

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666365#action_12666365 ] 

Jonathan Gray commented on HBASE-1118:
--------------------------------------

+1 on solving this as part of the move to TFile.  We should definitely be sharing both indexes and Readers across all gets and scanners.  It is not worth significant effort to improve this in 0.19.X.


[jira] Updated: (HBASE-1118) Scanner setup takes too long

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ryan rawson updated HBASE-1118:
-------------------------------


Ideally, I'd like to query HBase in the context of a web hit.  Web server side render time should be < 250ms total, and data access is a major portion of that, meaning we'd like to get the data back in something like 100ms.  For 1000 rows that should be in the same region, or maybe two, hopefully that's possible.


[jira] Commented: (HBASE-1118) Scanner setup takes too long

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666329#action_12666329 ] 

stack commented on HBASE-1118:
------------------------------

Looking at this a little, the setup of the Scanner is taking up a good portion of the time spent returning values.  The profiler shows setup taking 30-40% of the time when fetching 100 (small cell) rows.  To verify the profiler findings, I resorted to System.out, and that seemed to show similar figures (though maybe it's more than 30-40%, since my System.out was measuring server side while the time was taken on the client side after rows had been fetched and emitted on the console).

Every time we open a scanner, it opens a Reader per covered HStoreFile.  Opening a Reader currently means opening the data file and its index, plus reading the index into memory.  The latter seemed to be taking the bulk of the open time in the profiler.
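
To make that cost concrete, here is a minimal sketch (modern Java, not HBase code) that contrasts reading an index on every scanner open with reading it once per store file. SharedIndexCache, indexFor and loadCount are hypothetical names invented for illustration; the loader function stands in for whatever actually reads an index off disk.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical cache: one in-memory index per store file, shared by all scanners. */
final class SharedIndexCache<I> {
    private final ConcurrentMap<String, I> indexes = new ConcurrentHashMap<>();
    private final Function<String, I> loader;          // stand-in for the real index read
    private final AtomicInteger loads = new AtomicInteger();

    SharedIndexCache(Function<String, I> loader) {
        this.loader = loader;
    }

    /** Returns the index for a store file, invoking the loader at most once per path. */
    I indexFor(String storeFilePath) {
        return indexes.computeIfAbsent(storeFilePath, path -> {
            loads.incrementAndGet();
            return loader.apply(path);
        });
    }

    /** Number of times an index was actually read, for comparison against scanner opens. */
    int loadCount() {
        return loads.get();
    }
}
```

With a cache like this, five scanner opens over two store files would cost two index reads rather than ten.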

There are a few things we can do here but probably not till tfile time.

1. We already have an open Reader for every HStoreFile.  Scanners should be able to access the already-opened Reader's index rather than each reading in its own.  This will save on startup time and on heap (indexes are private in the current MapFile).
2. A smarter blockcache would let Scanners use already-loaded blocks.  Chatting with jgray: since we can give tfile a Stream, the Stream we hand it can be smartened up so it goes to a blockcache first and, only if the block is not there, to HDFS.
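
The second idea can be sketched as a read-through wrapper. This is only an illustration under stated assumptions: BlockSource and CachingBlockSource are hypothetical names, the interface stands in for the Stream handed to tfile, and a plain LRU map stands in for a real block cache.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical source of raw blocks, standing in for a read from HDFS. */
interface BlockSource {
    byte[] readBlock(String file, long offset) throws IOException;
}

/** Checks an LRU block cache first and falls back to the backing source only on a miss. */
final class CachingBlockSource implements BlockSource {
    private final BlockSource backing;
    private final Map<String, byte[]> cache;

    CachingBlockSource(BlockSource backing, final int maxBlocks) {
        this.backing = backing;
        // Access-ordered LinkedHashMap: the least recently used block is
        // evicted once the map holds more than maxBlocks entries.
        this.cache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > maxBlocks;
            }
        };
    }

    @Override
    public synchronized byte[] readBlock(String file, long offset) throws IOException {
        String key = file + "@" + offset;
        byte[] block = cache.get(key);
        if (block == null) {
            // Cache miss: only now go to the backing store.
            block = backing.readBlock(file, offset);
            cache.put(key, block);
        }
        return block;
    }
}
```

Because the wrapper implements the same interface as the source it wraps, a reader built on top of it would not need to know whether a block came from memory or from the file system.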






[jira] Commented: (HBASE-1118) Scanner setup takes too long

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662624#action_12662624 ] 

stack commented on HBASE-1118:
------------------------------

posix4 says scanner setup should take < 200ms.


[jira] Updated: (HBASE-1118) Scanner setup takes too long

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1118:
-------------------------

         Priority: Critical  (was: Major)
    Fix Version/s: 0.20.0


[jira] Resolved: (HBASE-1118) Scanner setup takes too long

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-1118.
--------------------------

    Resolution: Fixed

HBASE-61 fixes scanner setup.  It is still not fast enough, but it is at least ten times faster than the 200ms posix4 wanted in the comment above.
