Posted to dev@hbase.apache.org by "Jonathan Gray (JIRA)" <ji...@apache.org> on 2009/01/20 01:12:59 UTC

[jira] Created: (HBASE-1141) Fetching large numbers of columns is slow outside of HDFS

Fetching large numbers of columns is slow outside of HDFS
---------------------------------------------------------

                 Key: HBASE-1141
                 URL: https://issues.apache.org/jira/browse/HBASE-1141
             Project: Hadoop HBase
          Issue Type: Bug
    Affects Versions: 0.20.0
            Reporter: Jonathan Gray
             Fix For: 0.20.0


While working on a Cell cache, we found during random-read tests that the number of columns has an enormous impact on performance.  Even after accounting for increased HDFS access time, a great deal of time is still spent coming out of the Region and then across the wire to HTable.

Erik Holstad has done this testing and will post some of his results here when completed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1141) Fetching large numbers of columns is slow outside of HDFS

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665288#action_12665288 ] 

Jonathan Gray commented on HBASE-1141:
--------------------------------------

We're looking into this because our most common query is:  get_all_columns(table, row, family)

When the family holds a small number of columns, our random access times are on the order of 2ms.  But if there are a few thousand columns in the family, a single fetch can take >100ms.

Certainly there are some inefficiencies in a query like this because you must check all stores, but even when serving out of memory (the new cache Erik is designing) there is a significant performance hit to having many columns.

Erik has done some timing and can post what he has found.
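
For anyone who wants to reproduce the shape of this measurement without a cluster, here is a minimal sketch of a random-read timing harness. It is not the test Erik ran and it does not touch HBase at all: the table is a plain in-memory dict, and build_row, time_fetch and this get_all_columns are hypothetical stand-ins named after the query above. It only shows how per-fetch latency can be sampled as the column count grows.

```python
import random
import time
from statistics import median


def build_row(num_columns):
    """Build one in-memory 'row': column qualifier -> 16-byte value."""
    return {f"col{i:06d}".encode(): b"x" * 16 for i in range(num_columns)}


def get_all_columns(table, row, family):
    """Hypothetical stand-in for the query above: copy every cell of one family."""
    return dict(table[row][family])


def time_fetch(num_columns, num_rows=50, reads=200, seed=0):
    """Return the median latency in ms of random single-row fetches."""
    rng = random.Random(seed)
    # Assumption: a single family "f" with the same column count in every row.
    table = {r: {"f": build_row(num_columns)} for r in range(num_rows)}
    samples = []
    for _ in range(reads):
        row = rng.randrange(num_rows)
        start = time.perf_counter()
        result = get_all_columns(table, row, "f")
        samples.append((time.perf_counter() - start) * 1000.0)
        assert len(result) == num_columns
    return median(samples)


if __name__ == "__main__":
    for n in (10, 100, 1000, 5000):
        print(f"{n:>5} columns: median {time_fetch(n):.4f} ms")
```

Swapping the body of get_all_columns for a real client call would give numbers comparable to the ones quoted above; the in-memory version only measures the cost of copying the cells.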




[jira] Resolved: (HBASE-1141) Fetching large numbers of columns is slow outside of HDFS

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray resolved HBASE-1141.
----------------------------------

    Resolution: Won't Fix

Further experimentation showed a 40ms delay for certain payloads when the client is on the same node as the regionserver.

Since this is not a general issue and might even be tied to my particular environment, I will open a separate issue for that specific problem.

