Posted to dev@hbase.apache.org by "Dave Latham (JIRA)" <ji...@apache.org> on 2009/11/21 01:11:39 UTC

[jira] Created: (HBASE-1996) Configure scanner buffer in bytes instead of number of rows

Configure scanner buffer in bytes instead of number of rows
-----------------------------------------------------------

                 Key: HBASE-1996
                 URL: https://issues.apache.org/jira/browse/HBASE-1996
             Project: Hadoop HBase
          Issue Type: Improvement
            Reporter: Dave Latham
            Assignee: Dave Latham
             Fix For: 0.21.0


Currently, the default scanner fetches a single row at a time.  This makes for very slow scans on tables where the rows are not large.  You can change the setting for an HTable instance or for each Scan.

It would be better to have a default that performs reasonably well, so that people stop running into slow scans because they are evaluating HBase, aren't familiar with the setting, or simply forgot to change it.  Unfortunately, if we increase the value of the current setting, we run the risk of an OOM (out of memory) error for tables with large rows.  Let's change the setting so that it works with a size in bytes rather than in rows.  This will allow us to set a reasonable default so that tables with small rows scan quickly and tables with large rows do not run OOM.

Note that the case is very similar for table writes.  When auto flush is disabled, we buffer a list of Puts to commit at once.  That buffer is measured in bytes, so that a small number of large Puts or a lot of small Puts each fit in a single flush.  If that buffer were measured in number of Puts, it would have the same problem we have with the scan buffer, and we wouldn't be able to set a good default for tables with different row sizes.  Changing the scan buffer to be configured like the write buffer will make the two more consistent.
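For context, a minimal sketch of the two knobs being contrasted, assuming the 0.20-era client API (the table name is invented): scanner prefetch is configured in rows, while the write buffer is already configured in bytes.

{code}
import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Scan;

public class ScanBufferKnobs {
  public static void main(String[] args) throws IOException {
    HBaseConfiguration conf = new HBaseConfiguration();
    HTable table = new HTable(conf, "mytable"); // hypothetical table name

    // Read side today: prefetch is a row count, per HTable or per Scan.
    table.setScannerCaching(100);  // rows fetched per scanner RPC
    Scan scan = new Scan();
    scan.setCaching(100);          // same knob, for this scan only

    // Write side today: the buffer is already sized in bytes, so a few
    // large Puts or many small Puts both fit in a single flush.
    table.setAutoFlush(false);
    table.setWriteBufferSize(2 * 1024 * 1024); // 2 MB
  }
}
{code}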

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1996) Configure scanner buffer in bytes instead of number of rows

Posted by "Erik Rozendaal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784724#action_12784724 ] 

Erik Rozendaal commented on HBASE-1996:
---------------------------------------

This feature would be really useful to us. We have a table where the number of columns per row varies greatly. Most rows have fewer than 100 columns, but some have more than a million. With the current scanner we're pretty much limited to specifying 1-3 rows of caching, which hurts performance in many cases. So being able to specify the size in bytes (with a minimum of one row returned) would be very useful.

I'll see whether this patch can be applied to our version of HBase (0.20.2) and how it works in practice.

PS We're also working to reduce the variance of row sizes... >1 million columns is a pain w.r.t. memory usage.




[jira] Commented: (HBASE-1996) Configure scanner buffer in bytes instead of number of rows

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795163#action_12795163 ] 

stack commented on HBASE-1996:
------------------------------

@Erik I took a look at the patch.  Could you make it so the 1MB upper bound is not hard-coded, but instead read from HBaseConfiguration? (You don't have to add the value to hbase-default.xml.)  I'd default to 10MB rather than 1MB.  Also, I'm not sure what the changes in HTable do.  Does the client stop the scan when it hits the hard-coded upper bound?  Thanks.
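For illustration, a rough sketch of reading the bound from configuration instead of hard-coding it; the property name is the one used later in this thread, and the 10MB default follows the suggestion above.

{code}
import org.apache.hadoop.conf.Configuration;

// Sketch: look the cap up in configuration instead of hard-coding it.  No
// hbase-default.xml entry is needed for getLong() to apply the default.
static long maxResultSize(Configuration conf) {
  return conf.getLong("hbase.client.scanner.max.result.size",
      10L * 1024 * 1024); // 10MB default, per the comment above
}
{code}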



[jira] Commented: (HBASE-1996) Configure scanner buffer in bytes instead of number of rows

Posted by "Erik Rozendaal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795026#action_12795026 ] 

Erik Rozendaal commented on HBASE-1996:
---------------------------------------

For 0.20.3 we could have a configuration parameter on HRegionServer for the scan buffer size. The client would not be able to override this value. It is not optimal, but it will work for us and will prevent the current worst-case scan behavior when caching is set to 1 and rows are very small. The scan buffer size could default to 0 (100% backwards compatible) or a more useful value like 128-512 kB.

W.r.t. "at least N rows and M bytes" 

I don't think this is very useful (yet). I think it is more useful to be able to specify:

- scan buffer size (for performance)
- allow/disallow partial rows (to prevent OOMs with large rows)

Adding minimum number of rows and/or batch size in number of key values (0.21?) will make the API harder to use and test.
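To make the proposal concrete, a hypothetical sketch of the two knobs; neither setter exists in the API at this point, they just give the proposal a shape.

{code}
import org.apache.hadoop.hbase.client.Scan;

// Hypothetical only: neither setter below exists; they illustrate the two
// proposed knobs.
Scan scan = new Scan();
// 1. scan buffer size in bytes (for performance):
// scan.setMaxResultSize(256 * 1024);
// 2. allow/disallow partial rows (to prevent OOMs with large rows):
// scan.setAllowPartialRows(false);
{code}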



[jira] Updated: (HBASE-1996) Configure scanner buffer in bytes instead of number of rows

Posted by "Dave Latham (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Latham updated HBASE-1996:
-------------------------------

    Attachment: 1966.patch

Here's a first shot at making this change.



[jira] Commented: (HBASE-1996) Configure scanner buffer in bytes instead of number of rows

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795086#action_12795086 ] 

Andrew Purtell commented on HBASE-1996:
---------------------------------------

bq. I think it is more useful to be able to specify [...] allow/disallow partial rows (to prevent OOMs with large rows)

Does HBASE-1537 cover this? 



[jira] Updated: (HBASE-1996) Configure scanner buffer in bytes instead of number of rows

Posted by "Erik Rozendaal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Rozendaal updated HBASE-1996:
----------------------------------

    Attachment: 1996-0.20.3-v3.patch

The 1996-0.20.3-v3.patch sets the default limit to "unlimited", so 0.20.3 should have the same scanning behavior as 0.20.2 unless the configuration parameter is set explicitly.

PS The fix-version of this issue is 0.21 and it is marked as an incompatible change. Maybe the 0.20.3 patch should be moved to a new issue, or this issue changed?




[jira] Commented: (HBASE-1996) Configure scanner buffer in bytes instead of number of rows

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832653#action_12832653 ] 

stack commented on HBASE-1996:
------------------------------

I made hbase-2214 to do this better in 0.21.



[jira] Resolved: (HBASE-1996) Configure scanner buffer in bytes instead of number of rows

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-1996.
--------------------------

    Resolution: Fixed

Committed to branch and trunk.  Thanks for the patch, lads (Dave and Erik... in particular Erik, thanks for persisting).



[jira] Updated: (HBASE-1996) Configure scanner buffer in bytes instead of number of rows

Posted by "Erik Rozendaal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Rozendaal updated HBASE-1996:
----------------------------------

    Attachment: 1996-0.20.3.patch

This patch limits the result from a single call to a scanner's next method to 1 MB. I couldn't get "minimum N rows, minimum M bytes" to work without changes to the protocol. So now it is "maximum N rows, maximum M bytes", where M is hardcoded to 1 MB.

This allows me to set a scanner's caching to Integer.MAX_VALUE and not get any OOMs on the region server. Obviously only ~1 MB of data is returned.

Scanning performance is very high (I get 20+ MB/second on my Core2Duo 2.4 GHz laptop going to HBase through a web server... so more like 40+ MB/second on the HBase side).
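A minimal sketch (not the actual patch) of the "maximum N rows, maximum M bytes" loop described above; RowSource and nextRow() are stand-ins for the real region scanner internals.

{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;

// Sketch only: RowSource stands in for the real region scanner internals.
interface RowSource { Result nextRow(); }

static List<Result> nextBatch(RowSource src, int maxRows, long maxBytes) {
  List<Result> batch = new ArrayList<Result>();
  long bytes = 0;
  while (batch.size() < maxRows && bytes < maxBytes) {
    Result row = src.nextRow();
    if (row == null) break;          // end of region or end of scan
    batch.add(row);                  // whole rows only, at least one per call
    for (KeyValue kv : row.raw()) {
      bytes += kv.getLength();       // approximate size accounting
    }
  }
  return batch;
}
{code}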





[jira] Commented: (HBASE-1996) Configure scanner buffer in bytes instead of number of rows

Posted by "Erik Rozendaal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795579#action_12795579 ] 

Erik Rozendaal commented on HBASE-1996:
---------------------------------------

@Andrew: HBASE-1537 will work pretty well when KeyValues are of similar/predictable size. However, I prefer to be able to set a limit in bytes. This should give more predictable performance, especially when you have widely varying row/KeyValue sizes.




[jira] Commented: (HBASE-1996) Configure scanner buffer in bytes instead of number of rows

Posted by "Erik Rozendaal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794822#action_12794822 ] 

Erik Rozendaal commented on HBASE-1996:
---------------------------------------

I've tested this against hbase 0.20.2 (some manual patching was required). As expected, performance with this patch is much more stable when having rows of widely varying sizes. Also, no more out-of-memory errors in the hbase region server or on the client side.

Being able to set caching parameters based on memory usage (bytes) is much nicer than trying to predict row or KeyValue sizes (as used by the new 0.21 scanning API).

It would be really nice if this patch (maybe adjusted to preserve backwards compatibility with the setCaching API) could be included for 0.20.3 and 0.21.



[jira] Updated: (HBASE-1996) Configure scanner buffer in bytes instead of number of rows

Posted by "Dave Latham (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Latham updated HBASE-1996:
-------------------------------

    Status: Open  (was: Patch Available)

Ryan made a good point that the scanner caching is set to 1 not just for memory concerns, but because if a client takes a long time processing rows, then the scanner might time out before getting the next batch of rows.



[jira] Commented: (HBASE-1996) Configure scanner buffer in bytes instead of number of rows

Posted by "Dave Latham (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780856#action_12780856 ] 

Dave Latham commented on HBASE-1996:
------------------------------------

One thing to note in this patch: the buffer behaves similarly to the write case, in that it always includes at least one row and doesn't return until it is full.  So the value is more a minimum to attempt to fill (as long as more rows are available) than a maximum.  Hence, setting it to 0 is equivalent to setting the number of rows to 1.
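A sketch of that minimum-fill behavior, reusing the hypothetical RowSource stand-in from the earlier sketch; note the contrast with the later 0.20.3 patch, where the byte value is a ceiling rather than a floor.

{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;

// Sketch of the minimum-fill semantics described above: always take at
// least one row, keep filling until the byte threshold is met, so the
// setting is a floor, not a ceiling (0 degenerates to one row per fetch).
static List<Result> fillBuffer(RowSource src, long minBytes) {
  List<Result> buffer = new ArrayList<Result>();
  long bytes = 0;
  do {
    Result row = src.nextRow();
    if (row == null) break;          // no more rows available
    buffer.add(row);
    for (KeyValue kv : row.raw()) {
      bytes += kv.getLength();
    }
  } while (bytes < minBytes);        // minBytes == 0 -> exactly one row
  return buffer;
}
{code}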



[jira] Updated: (HBASE-1996) Configure scanner buffer in bytes instead of number of rows

Posted by "Erik Rozendaal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Rozendaal updated HBASE-1996:
----------------------------------

    Attachment: 1996-0.20.3-v2.patch

Second version of patch for 0.20.3 branch. This makes the maximum result size configurable.

However: the client and the server *must* use the same maximum result size, otherwise rows in regions may be skipped. This is because of the way the results of a region scan are reported to the client:

- null: scanning filter stopped processing
- fewer rows returned than requested: end-of-region reached, move on.

The second point is why the HTable modifications are necessary. It is now normal that a region scan will return fewer rows than requested even when the end of the region has not been reached yet. So the client needs to duplicate the region server logic to keep in sync.

I think for 0.21 the result communication to the client should be made more explicit, e.g. a ScannerCallableResult class that contains a status field (MORE_AVAILABLE, SKIP_TO_NEXT_REGION, FILTER_SAID_STOP) as well as the actual rows returned.
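As a sketch only (this class does not exist in HBase), the suggestion made concrete:

{code}
import org.apache.hadoop.hbase.client.Result;

// Sketch of the suggested explicit result envelope; this class does not
// exist in HBase, it just gives the proposal above a concrete shape.
class ScannerCallableResult {
  enum Status { MORE_AVAILABLE, SKIP_TO_NEXT_REGION, FILTER_SAID_STOP }
  final Status status;  // replaces today's implicit null / short-batch signals
  final Result[] rows;  // the rows actually returned by this next() call
  ScannerCallableResult(Status status, Result[] rows) {
    this.status = status;
    this.rows = rows;
  }
}
{code}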

I also left the default max result size at 1 megabyte. In my (admittedly limited) testing, using just my laptop without a real network, a size of 256-1024 kB seems to be optimal.

Here are my test results:

||max scanner result size (bytes)||MB/s scanned with rows avg 750 bytes||MB/s scanned with rows avg 175 bytes||
|1024|3.23|1.99|
|2048|5.14|3.10|
|4096|7.34|4.67|
|8192|10.95|6.50|
|16384|16.15|8.30|
|32768|18.96|8.50|
|65536|20.42|9.16|
|131072|20.93|9.06|
|262144|21.48|9.49|
|524288|22.34|9.37|
|1048576|22.50|8.91|
|2097152|20.91|8.03|
|4194304|19.86|7.35|
|8388608|17.89|6.83|
|16777216|17.63|6.98|

Scanner caching was set to Integer.MAX_VALUE (unlimited number of rows). MB/s are measured going through a web server, so raw HBase speed is probably double or higher. Obviously a real cluster test should be done to measure real performance and to tune the max result size.




[jira] Updated: (HBASE-1996) Configure scanner buffer in bytes instead of number of rows

Posted by "Dave Latham (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Latham updated HBASE-1996:
-------------------------------

    Status: Patch Available  (was: Open)



[jira] Commented: (HBASE-1996) Configure scanner buffer in bytes instead of number of rows

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794961#action_12794961 ] 

stack commented on HBASE-1996:
------------------------------

It looks like this patch as-is can't go into 0.20.3 because it changes the interfaces, meaning we won't be able to do a rolling upgrade?  (The HRI#next parameter changed from int to long, for size of buffer instead of number of rows.)  I think adding a new method, one that takes a size, may also break rolling upgrades (I'm not sure about this one).

bq. ...at least X rows and at least Y bytes

This could work if it was done cleanly, IMO.

If I had to choose, looking at the patch, size of result rather than number of rows seems the better idea.

How should we proceed?  Seems like a clean addition can be made for 0.21, but what to do in the 0.20.3 timeframe?



[jira] Commented: (HBASE-1996) Configure scanner buffer in bytes instead of number of rows

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796376#action_12796376 ] 

stack commented on HBASE-1996:
------------------------------

Patch looks good.  Testing now...



[jira] Commented: (HBASE-1996) Configure scanner buffer in bytes instead of number of rows

Posted by "Dave Latham (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794863#action_12794863 ] 

Dave Latham commented on HBASE-1996:
------------------------------------

Perhaps we could support both settings, i.e. buffer at least X rows and at least Y bytes.  Too much complexity?  We could still default to a minimum of 1 row and 0 bytes to minimize the chance of a scanner timeout if needed.



[jira] Commented: (HBASE-1996) Configure scanner buffer in bytes instead of number of rows

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796054#action_12796054 ] 

stack commented on HBASE-1996:
------------------------------

Erik: Thanks for making the size configurable.  Thanks for explaining why the client needs to match the server.

Can you make it so that unless an explicit size has been set, things work as they do now?  That is, this sizing of data only cuts in if you change hbase.client.scanner.max.result.size from its default of -1; otherwise the behavior is exactly as it is now, pre-patch.

Once committed, we should file the issue you suggest for 0.21 where we do better communication of state between client and server during scans.
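A minimal sketch of the requested gating, under the same assumptions as the earlier sketches:

{code}
import org.apache.hadoop.conf.Configuration;

// Sketch of the requested gating: with the default of -1 the byte cap never
// triggers, so scanning behaves exactly as it does pre-patch.
static boolean batchFull(Configuration conf, long bytesSoFar) {
  long max = conf.getLong("hbase.client.scanner.max.result.size", -1L);
  return max > 0 && bytesSoFar >= max; // max <= 0 -> cap disabled (old behavior)
}
{code}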
