Posted to dev@hbase.apache.org by "ryan rawson (JIRA)" <ji...@apache.org> on 2009/06/23 23:01:10 UTC
[jira] Created: (HBASE-1576) TIF needs to be able to set scanner caching size for smaller row tables & performance
TIF needs to be able to set scanner caching size for smaller row tables & performance
-------------------------------------------------------------------------------------
Key: HBASE-1576
URL: https://issues.apache.org/jira/browse/HBASE-1576
Project: Hadoop HBase
Issue Type: Bug
Components: mapred
Affects Versions: 0.20.0
Reporter: ryan rawson
Priority: Critical
Fix For: 0.20.0
TIF uses the default scanner caching size (1). When each row is small and processed quickly, this limits overall performance. By setting a higher scanner caching level you can achieve 100x+ the throughput with the exact same map-reduce job and table.
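The 100x+ figure follows from round-trip arithmetic: with scanner caching N, a full scan of R rows costs roughly ceil(R/N) scanner next() RPCs instead of R, and when per-row work is tiny the RPC latency dominates. A minimal, self-contained sketch (plain Java, no HBase dependency; the row counts are illustrative):

```java
public class ScannerCachingMath {
    // Approximate number of scanner next() RPCs for a full-table scan:
    // each RPC returns up to 'caching' rows, so rpcs = ceil(rows / caching).
    static long rpcsForScan(long rows, int caching) {
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;
        long withDefault = rpcsForScan(rows, 1);    // caching = 1 (TIF default)
        long withHundred = rpcsForScan(rows, 100);  // caching = 100
        System.out.println(withDefault);  // 1000000
        System.out.println(withHundred);  // 10000
        // Cutting round trips 100x is roughly a 100x scan speedup
        // when the map task does almost no work per row.
    }
}
```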
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HBASE-1576) TIF needs to be able to set scanner caching size for smaller row tables & performance
Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Purtell updated HBASE-1576:
----------------------------------
Status: Patch Available (was: Open)
[jira] Updated: (HBASE-1576) TIF needs to be able to set scanner caching size for smaller row tables & performance
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-1576:
-------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
Committed after +1 from Ryan up on IRC
[jira] Commented: (HBASE-1576) TIF needs to be able to set scanner caching size for smaller row tables & performance
Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723381#action_12723381 ]
Andrew Purtell commented on HBASE-1576:
---------------------------------------
Should the default also be upped to something like 30 as before, or even 100?
[jira] Issue Comment Edited: (HBASE-1576) TIF needs to be able to set scanner caching size for smaller row tables & performance
Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723381#action_12723381 ]
Andrew Purtell edited comment on HBASE-1576 at 6/23/09 5:29 PM:
----------------------------------------------------------------
Should the default also be upped to something like 30 as before, or even 100? I mean the default as set by TableMapReduceUtil, not in hbase-default.xml.
was (Author: apurtell):
Should the default also be upped to something like 30 as before, or even 100?
[jira] Commented: (HBASE-1576) TIF needs to be able to set scanner caching size for smaller row tables & performance
Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723383#action_12723383 ]
Jean-Daniel Cryans commented on HBASE-1576:
-------------------------------------------
bq. Should the default also be upped to something like 30 as before, or even 100? I mean the default as set by TableMapReduceUtil, not in hbase-default.xml.
Every time I had to change it, it was for an MR job doing heavy processing on each row. So -1 on that.
[jira] Assigned: (HBASE-1576) TIF needs to be able to set scanner caching size for smaller row tables & performance
Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Purtell reassigned HBASE-1576:
-------------------------------------
Assignee: Andrew Purtell
[jira] Commented: (HBASE-1576) TIF needs to be able to set scanner caching size for smaller row tables & performance
Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723327#action_12723327 ]
Andrew Purtell commented on HBASE-1576:
---------------------------------------
The HBaseConfiguration object is created from the JobConf (TableInputFormat.java, line ~58), so isn't this sufficient?
{code}
JobConf job = new JobConf();
// ...
job.set("hbase.client.scanner.caching", "100");
// ...
{code}
No problem to make a convenience method, though...
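For illustration, such a convenience setter might look like the following (hypothetical sketch only; java.util.Properties stands in for org.apache.hadoop.mapred.JobConf so the snippet stays self-contained, and the key name is the one quoted above):

```java
import java.util.Properties;

public class CachingHelper {
    // Configuration key quoted in the comment above.
    static final String SCANNER_CACHING = "hbase.client.scanner.caching";

    // Hypothetical convenience method; a real version would take a JobConf.
    static void setScannerCaching(Properties job, int caching) {
        job.setProperty(SCANNER_CACHING, Integer.toString(caching));
    }

    public static void main(String[] args) {
        Properties job = new Properties();
        setScannerCaching(job, 100);
        System.out.println(job.getProperty(SCANNER_CACHING)); // 100
    }
}
```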
[jira] Commented: (HBASE-1576) TIF needs to be able to set scanner caching size for smaller row tables & performance
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723420#action_12723420 ]
stack commented on HBASE-1576:
------------------------------
+1 on patch. Ryan, does this work for you?
[jira] Updated: (HBASE-1576) TIF needs to be able to set scanner caching size for smaller row tables & performance
Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Purtell updated HBASE-1576:
----------------------------------
Attachment: HBASE-1576.patch
[jira] Commented: (HBASE-1576) TIF needs to be able to set scanner caching size for smaller row tables & performance
Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723396#action_12723396 ]
Andrew Purtell commented on HBASE-1576:
---------------------------------------
Ok, then the patch stands as is.