Posted to dev@hbase.apache.org by "Jonathan Gray (JIRA)" <ji...@apache.org> on 2009/09/10 20:38:57 UTC

[jira] Created: (HBASE-1823) Ability for Scanners to bypass the block cache

Ability for Scanners to bypass the block cache
----------------------------------------------

                 Key: HBASE-1823
                 URL: https://issues.apache.org/jira/browse/HBASE-1823
             Project: Hadoop HBase
          Issue Type: New Feature
          Components: client, regionserver
    Affects Versions: 0.20.0
            Reporter: Jonathan Gray
            Assignee: Jonathan Gray
             Fix For: 0.20.1, 0.21.0


There are a number of use cases where exposing the ability to not cache blocks during a scan would be valuable.  For example, running row counts.

The LRU is scan-resistant, so it does provide some protection already, but even in that case all you prevent is the eviction of hot blocks.  The LRU still runs many evictions and the blocks are referenced for much longer periods of time, so this adds enormous stress to the GC.

Compactions already do this.  This issue is about exposing that as a switch to the client-side Scan object (will also enable it for MR jobs then).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1823) Ability for Scanners to bypass the block cache

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754291#action_12754291 ] 

Jonathan Gray commented on HBASE-1823:
--------------------------------------

Yes, somehow that didn't make it into the posted patch, but it's in the code running on my cluster.  Thanks for catching that, stack.  Will post a new patch.



[jira] Updated: (HBASE-1823) Ability for Scanners to bypass the block cache

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray updated HBASE-1823:
---------------------------------

    Attachment: HBASE-1823-v2.patch

The first patch had a partial comment.  This one cleans up that line in the Scan class comment and marks the option as Expert.



[jira] Commented: (HBASE-1823) Ability for Scanners to bypass the block cache

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754279#action_12754279 ] 

stack commented on HBASE-1823:
------------------------------

Does the new flag cacheBlocks need to be added to the Writable serialization?



[jira] Commented: (HBASE-1823) Ability for Scanners to bypass the block cache

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754364#action_12754364 ] 

Jonathan Gray commented on HBASE-1823:
--------------------------------------

Hmm... now when running with the addition to the Writable, I get this exception:

{noformat}
2009-09-11 13:14:03,903 WARN org.apache.hadoop.hbase.master.BaseScanner: Scan ROOT region
java.io.IOException: Call to /192.168.249.102:61020 failed on local exception: java.io.EOFException
        at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:757)
        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:727)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:328)
        at $Proxy1.openScanner(Unknown Source)
        at org.apache.hadoop.hbase.master.BaseScanner.scanRegion(BaseScanner.java:160)
        at org.apache.hadoop.hbase.master.RootScanner.scanRoot(RootScanner.java:54)
        at org.apache.hadoop.hbase.master.RootScanner.initialScan(RootScanner.java:73)
        at org.apache.hadoop.hbase.master.BaseScanner.initialChore(BaseScanner.java:131)
        at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:375)
        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:505)
        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:449)
{noformat}

The code seems right... what am I missing here?

{noformat}
@@ -518,6 +545,7 @@
     this.stopRow = Bytes.readByteArray(in);
     this.maxVersions = in.readInt();
     this.caching = in.readInt();
+    this.cacheBlocks = in.readBoolean();
     if(in.readBoolean()) {
       this.filter = (Filter)createForName(Bytes.toString(Bytes.readByteArray(in)));
       this.filter.readFields(in);
@@ -550,6 +578,7 @@
     Bytes.writeByteArray(out, this.stopRow);
     out.writeInt(this.maxVersions);
     out.writeInt(this.caching);
+    out.writeBoolean(this.cacheBlocks);
     if(this.filter == null) {
       out.writeBoolean(false);
     } else {
{noformat}



[jira] Updated: (HBASE-1823) Ability for Scanners to bypass the block cache

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray updated HBASE-1823:
---------------------------------

    Status: Patch Available  (was: Open)

Going to test more, but please review.



[jira] Updated: (HBASE-1823) Ability for Scanners to bypass the block cache

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray updated HBASE-1823:
---------------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

Tested on my cluster, works as expected.  Committed to branch and trunk.



[jira] Commented: (HBASE-1823) Ability for Scanners to bypass the block cache

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753969#action_12753969 ] 

Jonathan Gray commented on HBASE-1823:
--------------------------------------

Thanks for review, Andrew.

Opened HBASE-1827 to add option to shell.  Don't want to hold up commit of this.

Seems to work in preliminary testing.  Going to do more testing tomorrow and then will commit.




[jira] Commented: (HBASE-1823) Ability for Scanners to bypass the block cache

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754376#action_12754376 ] 

stack commented on HBASE-1823:
------------------------------

After some back and forth with jgray on IRC, it seems he had the wrong jar in the way.

I reviewed the patch. +1 on commit.
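
For readers wondering how a stale jar can surface as an EOFException: the thread does not say which side had the old jar or where exactly the stream broke, so the following is only a minimal sketch of the general failure mode. It assumes a writer that predates the new boolean and a reader that expects it. WireMismatchSketch and its methods are hypothetical names, not HBase code, and only java.io is used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical illustration of a writer and reader built from different code versions. */
final class WireMismatchSketch {

    /** Bytes as an older writer would emit them: two ints, no cacheBlocks flag. */
    static byte[] writeOldFormat(int maxVersions, int caching) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(maxVersions);
            out.writeInt(caching);
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Bytes as a newer writer would emit them: the same ints plus the boolean. */
    static byte[] writeNewFormat(int maxVersions, int caching, boolean cacheBlocks) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(maxVersions);
            out.writeInt(caching);
            out.writeBoolean(cacheBlocks);
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Newer reader: expects the trailing boolean. Returns true if it ran out of bytes. */
    static boolean newReaderHitsEof(byte[] payload) {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload));
        try {
            in.readInt();     // maxVersions
            in.readInt();     // caching
            in.readBoolean(); // cacheBlocks: absent in the old format
            return false;
        } catch (EOFException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With these helpers, newReaderHitsEof(writeOldFormat(1, 10)) reports an EOF, while the same reader consumes writeNewFormat(1, 10, false) cleanly. A real Scan has more fields after the flag, so a mismatch there could just as well misread later bytes instead of failing at once.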



[jira] Commented: (HBASE-1823) Ability for Scanners to bypass the block cache

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753936#action_12753936 ] 

Andrew Purtell commented on HBASE-1823:
---------------------------------------

+1

Shell support? An option to 'scan'? 

Let's get this into 0.20.1. I have a test scenario where region servers under high write stress will OOME if scanned.



[jira] Updated: (HBASE-1823) Ability for Scanners to bypass the block cache

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray updated HBASE-1823:
---------------------------------

    Attachment: HBASE-1823-v1.patch

Pretty simple change.  Adds a boolean flag, cacheBlocks, to Scan with a getter and setter.  That boolean is passed into the existing HFile.Reader.getScanner(boolean cacheBlocks) that was created to allow this functionality during compactions.
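
To make the effect of the flag concrete, here is a toy model, not the HBase implementation. ToyBlockCache and ToyFileReader are hypothetical classes, and the cache is a plain LRU built on LinkedHashMap, without the scan resistance the issue description attributes to the real one. The sketch only shows that a scan run with cacheBlocks=false still reads every block but neither fills the cache nor triggers evictions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy LRU block cache; a hypothetical stand-in, not HBase's block cache. */
final class ToyBlockCache {
    private final LinkedHashMap<Long, byte[]> blocks;
    private long evictions = 0;

    ToyBlockCache(final int capacity) {
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.blocks = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                boolean evict = size() > capacity;
                if (evict) {
                    evictions++;
                }
                return evict;
            }
        };
    }

    byte[] get(long offset) { return blocks.get(offset); }
    void put(long offset, byte[] block) { blocks.put(offset, block); }
    boolean contains(long offset) { return blocks.containsKey(offset); }
    int size() { return blocks.size(); }
    long evictions() { return evictions; }
}

/** Toy file reader whose scan takes the cacheBlocks flag. */
final class ToyFileReader {
    private final ToyBlockCache cache;
    private final int blockCount;
    private long diskReads = 0;

    ToyFileReader(ToyBlockCache cache, int blockCount) {
        this.cache = cache;
        this.blockCount = blockCount;
    }

    /** Reads every block in order and returns how many blocks were scanned. */
    int scanAll(boolean cacheBlocks) {
        int scanned = 0;
        for (long offset = 0; offset < blockCount; offset++) {
            byte[] block = cache.get(offset);
            if (block == null) {
                block = loadFromDisk(offset);
                if (cacheBlocks) {
                    cache.put(offset, block); // skipped entirely when bypassing
                }
            }
            scanned++;
        }
        return scanned;
    }

    private byte[] loadFromDisk(long offset) {
        diskReads++;
        return new byte[] {(byte) offset};
    }

    long diskReads() { return diskReads; }
}
```

In this model, scanning 10 blocks through a 4-entry cache that already holds two hot blocks leaves the cache untouched when cacheBlocks is false. The same scan with cacheBlocks set to true pushes the hot blocks out and causes 8 evictions.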



[jira] Updated: (HBASE-1823) Ability for Scanners to bypass the block cache

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray updated HBASE-1823:
---------------------------------

    Attachment: HBASE-1823-v4.patch

Was just bad syncing on my cluster, apparently.

New patch just adds serialization verification to the writable unit test.  Will do a couple more cluster tests then commit.
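
The test added in the patch is not reproduced in this thread. As a rough sketch of what a serialization round-trip check for the new flag could look like, the example below uses only java.io. ScanSketch is a hypothetical, trimmed-down stand-in for Scan that carries three of its fields, and its default values are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for Scan with only three fields; defaults are assumed. */
final class ScanSketch {
    int maxVersions = 1;
    int caching = -1;
    boolean cacheBlocks = true;

    /** Writes the fields in a fixed order, flag last. */
    void write(DataOutput out) throws IOException {
        out.writeInt(maxVersions);
        out.writeInt(caching);
        out.writeBoolean(cacheBlocks);
    }

    /** Reads the fields back in exactly the order write() emitted them. */
    void readFields(DataInput in) throws IOException {
        maxVersions = in.readInt();
        caching = in.readInt();
        cacheBlocks = in.readBoolean();
    }

    /** Serializes the given instance and deserializes it into a fresh one. */
    static ScanSketch roundTrip(ScanSketch original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            ScanSketch copy = new ScanSketch();
            copy.readFields(new DataInputStream(
                new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A check along these lines sets cacheBlocks to a non-default value before the round trip, so that a field missing from either write or readFields shows up as a mismatch instead of passing by accident.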



[jira] Updated: (HBASE-1823) Ability for Scanners to bypass the block cache

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray updated HBASE-1823:
---------------------------------

    Attachment: HBASE-1823-v3.patch

Adds the cacheBlocks flag to the Scan Writable serialization.
