Posted to dev@hbase.apache.org by "ryan rawson (JIRA)" <ji...@apache.org> on 2009/06/23 23:01:10 UTC
[jira] Created: (HBASE-1576) TIF needs to be able to set scanner caching size for smaller row tables & performance
TIF needs to be able to set scanner caching size for smaller row tables & performance
-------------------------------------------------------------------------------------
Key: HBASE-1576
URL: https://issues.apache.org/jira/browse/HBASE-1576
Project: Hadoop HBase
Issue Type: Bug
Components: mapred
Affects Versions: 0.20.0
Reporter: ryan rawson
Priority: Critical
Fix For: 0.20.0
TIF uses the default scanner caching size (1). When each row is small and processed quickly, this limits overall performance. By setting a higher scanner caching level you can achieve 100x+ the throughput with the exact same map-reduce job and table.
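The 100x+ figure follows from round-trip arithmetic: with scanner caching N, a full scan of R rows costs roughly ceil(R/N) scanner next() RPCs instead of R, and when per-row work is tiny the RPC latency dominates. A minimal, self-contained sketch (plain Java, no HBase dependency; the row counts are illustrative):

```java
public class ScannerCachingMath {
    // Approximate number of scanner next() RPCs for a full-table scan:
    // each RPC returns up to 'caching' rows, so rpcs = ceil(rows / caching).
    static long rpcsForScan(long rows, int caching) {
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;
        long withDefault = rpcsForScan(rows, 1);    // caching = 1 (TIF default)
        long withHundred = rpcsForScan(rows, 100);  // caching = 100
        System.out.println(withDefault);  // 1000000
        System.out.println(withHundred);  // 10000
        // Cutting round trips 100x is roughly a 100x scan speedup
        // when the map task does almost no work per row.
    }
}
```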
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HBASE-1576) TIF needs to be able to set scanner caching size for smaller row tables & performance
Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Purtell updated HBASE-1576:
----------------------------------
Status: Patch Available (was: Open)
[jira] Updated: (HBASE-1576) TIF needs to be able to set scanner caching size for smaller row tables & performance
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-1576:
-------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
Committed after +1 from Ryan up on IRC
[jira] Commented: (HBASE-1576) TIF needs to be able to set scanner caching size for smaller row tables & performance
Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723381#action_12723381 ]
Andrew Purtell commented on HBASE-1576:
---------------------------------------
Should the default also be upped to something like 30 as before, or even 100?
[jira] Issue Comment Edited: (HBASE-1576) TIF needs to be able to set scanner caching size for smaller row tables & performance
Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723381#action_12723381 ]
Andrew Purtell edited comment on HBASE-1576 at 6/23/09 5:29 PM:
----------------------------------------------------------------
Should the default also be upped to something like 30 as before, or even 100? I mean the default as set by TableMapReduceUtil, not in hbase-default.xml.
was (Author: apurtell):
Should the default also be upped to something like 30 as before, or even 100?
[jira] Commented: (HBASE-1576) TIF needs to be able to set scanner caching size for smaller row tables & performance
Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723383#action_12723383 ]
Jean-Daniel Cryans commented on HBASE-1576:
-------------------------------------------
bq. Should the default also be upped to something like 30 as before, or even 100? I mean the default as set by TableMapReduceUtil, not in hbase-default.xml.
Every time I had to change it, it was for an MR job doing heavy processing on each row. So -1 on that.
[jira] Assigned: (HBASE-1576) TIF needs to be able to set scanner caching size for smaller row tables & performance
Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Purtell reassigned HBASE-1576:
-------------------------------------
Assignee: Andrew Purtell
[jira] Commented: (HBASE-1576) TIF needs to be able to set scanner caching size for smaller row tables & performance
Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723327#action_12723327 ]
Andrew Purtell commented on HBASE-1576:
---------------------------------------
The HBaseConfiguration object is created from the JobConf (TableInputFormat.java, line ~58), so isn't this sufficient?
{code}
JobConf job = new JobConf();
// ...
job.set("hbase.client.scanner.caching", "100");
// ...
{code}
No problem to make a convenience method, though...
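For illustration, such a convenience setter might look like the following (hypothetical sketch only; java.util.Properties stands in for org.apache.hadoop.mapred.JobConf so the snippet stays self-contained, and the key name is the one quoted above):

```java
import java.util.Properties;

public class CachingHelper {
    // Configuration key quoted in the comment above.
    static final String SCANNER_CACHING = "hbase.client.scanner.caching";

    // Hypothetical convenience method; a real version would take a JobConf.
    static void setScannerCaching(Properties job, int caching) {
        job.setProperty(SCANNER_CACHING, Integer.toString(caching));
    }

    public static void main(String[] args) {
        Properties job = new Properties();
        setScannerCaching(job, 100);
        System.out.println(job.getProperty(SCANNER_CACHING)); // 100
    }
}
```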
[jira] Commented: (HBASE-1576) TIF needs to be able to set scanner caching size for smaller row tables & performance
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723420#action_12723420 ]
stack commented on HBASE-1576:
------------------------------
+1 on patch. Ryan, does this work for you?
[jira] Updated: (HBASE-1576) TIF needs to be able to set scanner caching size for smaller row tables & performance
Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Purtell updated HBASE-1576:
----------------------------------
Attachment: HBASE-1576.patch
[jira] Commented: (HBASE-1576) TIF needs to be able to set scanner caching size for smaller row tables & performance
Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723396#action_12723396 ]
Andrew Purtell commented on HBASE-1576:
---------------------------------------
Ok, then the patch stands as is.