Posted to dev@hbase.apache.org by "LN (JIRA)" <ji...@apache.org> on 2008/06/20 07:25:45 UTC

[jira] Created: (HBASE-700) hbase.io.index.interval need be configuratable in column family

hbase.io.index.interval need be configuratable in column family 
----------------------------------------------------------------

                 Key: HBASE-700
                 URL: https://issues.apache.org/jira/browse/HBASE-700
             Project: Hadoop HBase
          Issue Type: Improvement
          Components: regionserver
    Affects Versions: 0.1.2
            Reporter: LN
            Priority: Minor


Setting hbase.io.index.interval to a smaller value can improve HBase read performance significantly, especially in column families with large values. However, a small hbase.io.index.interval also causes more memory usage, because the whole index is read into memory when a MapFile is loaded.

In my test environment I set hbase.io.index.interval to 1. After inserting about 3M small records into a table (about 1.5G in Hadoop files), the regionserver threw an OOME. I then found that the total size of the MapFile indexes was 350M. However, I can't raise hbase.io.index.interval to a larger value, like 32, because other tables with big cells need it to be 1.

So I think making hbase.io.index.interval a column family property would be very important for performance tuning.
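The trade-off described above comes from how a MapFile index works: only every N-th key (N = hbase.io.index.interval) is kept in memory, and a lookup binary-searches those in-memory keys and then scans forward through the data file. The Java sketch below is an illustrative model, not HBase or Hadoop code; the class SparseIndex and its methods are hypothetical, and an in-memory array stands in for the data file. It shows that the index holds about n / interval entries while a lookup may read up to interval + 1 entries. Applied to the numbers reported here, and assuming the per-entry cost stays roughly the same, raising the interval from 1 to 32 would shrink the 350M index to roughly 350M / 32, about 11M, at the cost of longer scans per read.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a sparse, MapFile-style index: only every
// `interval`-th key of a sorted file is kept in memory.
final class SparseIndex {
    private final String[] sortedKeys;  // stands in for the on-disk data file
    private final List<Integer> indexedPositions = new ArrayList<>();

    SparseIndex(String[] sortedKeys, int interval) {
        if (interval < 1) throw new IllegalArgumentException("interval must be >= 1");
        this.sortedKeys = sortedKeys;
        for (int i = 0; i < sortedKeys.length; i += interval) {
            indexedPositions.add(i);  // one in-memory entry per `interval` keys
        }
    }

    int indexSize() { return indexedPositions.size(); }

    /** Returns {position of key or -1, number of data-file entries compared}. */
    int[] lookup(String key) {
        // Binary search the in-memory index for the last indexed key <= key.
        int lo = 0, hi = indexedPositions.size() - 1, start = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedKeys[indexedPositions.get(mid)].compareTo(key) <= 0) {
                start = indexedPositions.get(mid);
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        if (start < 0) return new int[] {-1, 0};
        // Scan forward sequentially; at most interval + 1 entries are compared.
        int scanned = 0;
        for (int i = start; i < sortedKeys.length; i++) {
            scanned++;
            int cmp = sortedKeys[i].compareTo(key);
            if (cmp == 0) return new int[] {i, scanned};
            if (cmp > 0) break;
        }
        return new int[] {-1, scanned};
    }

    /** Index entries needed for n records at the given interval (ceiling division). */
    static long indexEntries(long n, int interval) {
        return (n + interval - 1) / interval;
    }
}
```

With 1,000 keys, an interval of 1 keeps 1,000 index entries and finds any key after one comparison, while an interval of 32 keeps 32 entries but may compare up to 33 keys per lookup.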

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Work started: (HBASE-700) hbase.io.index.interval need be configuratable in column family

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HBASE-700 started by Andrew Purtell.


[jira] Resolved: (HBASE-700) hbase.io.index.interval need be configuratable in column family

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-700.
-------------------------

       Resolution: Fixed
    Fix Version/s: 0.2.0

Resolved as part of HBASE-62. Thanks for the patch, Andrew.


[jira] Commented: (HBASE-700) hbase.io.index.interval need be configuratable in column family

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607630#action_12607630 ] 

stack commented on HBASE-700:
-----------------------------

LN: We could do it that way. The downside is that it would introduce a second location for table/column configuration, apart from the table descriptor.

Andrew: All sounds great. Doing it all in HBASE-42 sounds good (rather than as a new issue). I like your description of how read-only would work.


[jira] Commented: (HBASE-700) hbase.io.index.interval need be configuratable in column family

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607490#action_12607490 ] 

Andrew Purtell commented on HBASE-700:
--------------------------------------

Hi Stack.

HBASE-62 is a generalization of the changes to HTableDescriptor under consideration. So it actually makes sense to implement this and then have the regionservers watch for certain known key-value pairs that specify table or column store parameters. (Commenting on this issue specifically, I don't think JSON is warranted as an encoding for table and column metadata. Simple single-value strings for keys and values should be enough. However, user metadata could be formatted however the user desires.)
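The idea above can be pictured as a descriptor that carries free-form string key/value pairs, with the regionserver interpreting only the keys it knows about and ignoring the rest. The Java sketch below is hypothetical: FamilyDescriptor, StoreSettings and the INDEX_INTERVAL key name are illustrative assumptions, not the API that HBASE-62 actually introduced.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical column family descriptor carrying simple string key/value metadata.
final class FamilyDescriptor {
    private final String name;
    private final Map<String, String> values = new HashMap<>();

    FamilyDescriptor(String name) { this.name = name; }

    String getName() { return name; }

    FamilyDescriptor setValue(String key, String value) {
        values.put(key, value);
        return this;
    }

    String getValue(String key) { return values.get(key); }
}

// Hypothetical regionserver-side reader: it interprets only keys it knows
// about, so user metadata stored in the same map is simply ignored.
final class StoreSettings {
    static final String INDEX_INTERVAL = "INDEX_INTERVAL";  // assumed key name

    static int indexInterval(FamilyDescriptor family, int clusterDefault) {
        String raw = family.getValue(INDEX_INTERVAL);
        if (raw == null) return clusterDefault;  // fall back to the global setting
        int parsed = Integer.parseInt(raw.trim());
        if (parsed < 1) {
            throw new IllegalArgumentException("index interval must be >= 1: " + raw);
        }
        return parsed;
    }
}
```

A family without the key keeps using the cluster-wide hbase.io.index.interval, so only the families that need a different interval have to say so.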

HBASE-34 could be specified at the table level. Regarding this issue, I read Bryan's concerns about exposing tuning parameters, but I'd suggest that people who tune parameters exposed as proposed should know what they are doing and, if not, will soon learn better.

HBASE-43 seems pretty trivial, in a sense: in order to apply HTableDescriptor updates, the client would need to disable the table, tell the master to update the descriptor, and then re-enable the table. Disabling the table would flush all pending edits. Then, when the table is re-enabled, the regionservers could note the read-only attribute and simply reject edits to those columns, and both they and any mapreduce job running over the mapfiles could coexist happily.
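The HBASE-43 flow described above (disable, alter the descriptor, re-enable, then reject edits to read-only columns) can be modeled as a small state machine. This is a toy Java sketch under that assumption; ToyTable and its methods are hypothetical and collapse the client, master and regionserver roles into one object, so they are not real HBase classes.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy model of the disable / alter / enable / reject-edits flow.
final class ToyTable {
    private boolean enabled = true;
    private final Set<String> readOnlyFamilies = new HashSet<>();
    private int pendingEdits = 0;
    private int flushedEdits = 0;

    void disable() {
        // Disabling flushes whatever edits are still pending in memory.
        flushedEdits += pendingEdits;
        pendingEdits = 0;
        enabled = false;
    }

    void markReadOnly(String family) {
        // Descriptor changes are only accepted while the table is offline.
        if (enabled) throw new IllegalStateException("disable the table before altering it");
        readOnlyFamilies.add(family);
    }

    void enable() { enabled = true; }

    /** Regionserver-side check: reject edits to read-only families. */
    void put(String family) {
        if (!enabled) throw new IllegalStateException("table is disabled");
        if (readOnlyFamilies.contains(family)) {
            throw new UnsupportedOperationException("family " + family + " is read-only");
        }
        pendingEdits++;
    }

    int flushedEdits() { return flushedEdits; }
    int pendingEdits() { return pendingEdits; }
}
```

Because every pending edit is flushed at disable time, a job reading the mapfiles of a read-only family after re-enable sees a stable set of files.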

I think all of this could be rolled into one change set. Want to tie these all to HBASE-42, or open a new JIRA? 


[jira] Commented: (HBASE-700) hbase.io.index.interval need be configuratable in column family

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607230#action_12607230 ] 

Andrew Purtell commented on HBASE-700:
--------------------------------------

This could be rolled in to HBASE-42. Are there other candidates for column family metadata that would be useful and meaningful to add to HTableDescriptor? 


[jira] Commented: (HBASE-700) hbase.io.index.interval need be configuratable in column family

Posted by "LN (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607488#action_12607488 ] 

LN commented on HBASE-700:
--------------------------

Thanks for paying attention to this issue.

But I think these low-level storage tunings need not require a modification to HTableDescriptor or HColumnDescriptor.

I'd suggest a conf parameter "hbase.io.index.interval.TableA.ColumnB" for setting the index interval of column 'ColumnB' in table 'TableA'.
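The suggestion above amounts to a lookup that tries a per-table, per-family key first and falls back to the global setting. The sketch below is a hypothetical illustration that uses java.util.Properties in place of the HBase configuration object (Hadoop's Configuration.getInt(name, defaultValue) could play the same role); IndexIntervalConf and resolve are illustrative names, not existing HBase code.

```java
import java.util.Properties;

// Hypothetical resolver for a "hbase.io.index.interval.<table>.<family>" override.
final class IndexIntervalConf {
    static final String GLOBAL_KEY = "hbase.io.index.interval";

    static int resolve(Properties conf, String table, String family, int builtInDefault) {
        // 1. Most specific: the per-table, per-family override.
        String perFamily = conf.getProperty(GLOBAL_KEY + "." + table + "." + family);
        if (perFamily != null) return Integer.parseInt(perFamily.trim());
        // 2. Otherwise the global setting.
        String global = conf.getProperty(GLOBAL_KEY);
        if (global != null) return Integer.parseInt(global.trim());
        // 3. Otherwise a built-in default supplied by the caller.
        return builtInDefault;
    }
}
```

One caveat of this scheme: if a table or family name could contain a dot, the key would become ambiguous. It also adds a second place, besides the table descriptor, where table/column settings live.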


[jira] Commented: (HBASE-700) hbase.io.index.interval need be configuratable in column family

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607249#action_12607249 ] 

stack commented on HBASE-700:
-----------------------------

Below is for Andrew:

In-memory storage, though not yet implemented, could be a candidate (catalog tables should be in memory by default, methinks).

Should HBASE-62 be tied to HBASE-42?

HBASE-43?

HBASE-34 at the table rather than column level?


[jira] Assigned: (HBASE-700) hbase.io.index.interval need be configuratable in column family

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell reassigned HBASE-700:
------------------------------------

    Assignee: Andrew Purtell
