Posted to dev@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2009/01/15 07:05:59 UTC

[jira] Created: (HBASE-1127) OOME running randomRead PE

OOME running randomRead PE
--------------------------

                 Key: HBASE-1127
                 URL: https://issues.apache.org/jira/browse/HBASE-1127
             Project: Hadoop HBase
          Issue Type: Bug
            Reporter: stack
            Priority: Blocker
             Fix For: 0.19.0


Blockcache is misbehaving on TRUNK.  Something is broken.  We OOME about 20% into the randomRead test.  Looking at the heap, it's all soft references.  Instrumenting the reference queue shows we're never clearing anything, even while the GC is doing full collections.  Something is off.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1127) OOME running randomRead PE

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664581#action_12664581 ] 

stack commented on HBASE-1127:
------------------------------

Thanks for the votes, lads.  I'm with you now after my testing.  Either the GC is a laggard when it comes to adding released references to the reference queue -- probably not -- or the window is narrow and we're just not servicing the queue promptly enough.  Looking at the code, I'm not sure how we can be reactive enough, not without paying a high cost in monitoring code.

Time to start up a smart cache effort.  Erik Holstad has made a start already, I believe.  Erik Holstad and Jon Gray's experiments with soft references found that eviction wasn't LRU anyway (the javadoc gives no guarantees, only a suggested eviction practice).

In the meantime, I'm undoing blockcache-on-by-default in all but catalog tables.  Will be back after a bit of testing.
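
For illustration only, here is a minimal Java sketch of what deterministic LRU eviction can look like, in contrast to soft references whose clearing order is up to the JVM.  It is not the design Erik or Jon worked on; the class name BoundedLruCache and the entry-count limit are assumptions made for the example.  It relies on the standard LinkedHashMap access-order mode and its removeEldestEntry hook.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical bounded cache with strict least-recently-used eviction.
 * Not thread-safe; a real block cache would also bound by bytes, not entries.
 */
final class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
  private static final long serialVersionUID = 1L;

  private final int maxEntries;

  BoundedLruCache(int maxEntries) {
    // Third argument true = iterate in access order, so the eldest
    // entry is always the least recently used one.
    super(16, 0.75f, true);
    this.maxEntries = maxEntries;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    // Called by LinkedHashMap after each insertion; returning true
    // evicts the least recently used entry.
    return size() > maxEntries;
  }
}
```

The trade-off is that such a cache holds strong references, so its size limit has to be chosen to fit the heap instead of leaving that to the GC.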




[jira] Commented: (HBASE-1127) OOME running randomRead PE

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664357#action_12664357 ] 

Jean-Daniel Cryans commented on HBASE-1127:
-------------------------------------------

> Given that we're up against the RC, I am currently thinking that I'll revert having blockcache on by default and instead let users enable it explicitly (with the checker running every second). I'll leave it on in catalog tables so meta content has block cache on.

+1




[jira] Commented: (HBASE-1127) OOME running randomRead PE

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664343#action_12664343 ] 

stack commented on HBASE-1127:
------------------------------

I've been looking at this more.  I can watch the GC doing Full, Full, Full, but our executor thread that checks the SoftValueMap reference queues is not clearing anything.  Then we OOME.  I tried various things, including a thread per instance of BlockFSInputStream that just blocks on the reference queue waiting for the GC to add stuff.  Oddly, even in this case we OOME, though we get a bit further.  Changing the interval at which our executor thread runs from 10 seconds to 1 second means the executor now does clear the reference queues, but again it's not enough.  We OOME at about the same place as we do with a thread per BlockFSInputStream instance.  (A thread per instance won't fly, so this is good.)

I'm going to look at this a little more.  In times of high memory pressure, it's as though the GC gives up adding items to reference queues, which wouldn't seem to make sense.  Given that we're up against the RC, I am currently thinking that I'll revert having blockcache on by default and instead let users enable it explicitly (with the checker running every second).  I'll leave it on in catalog tables so meta content has block cache on.
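
For readers unfamiliar with the pattern under discussion, here is a minimal Java sketch of a soft-value map that services its reference queue.  It is not the HBase SoftValueMap source; the class name SoftValueCache and the method drainQueue are hypothetical.  As an assumed variation, it drains the queue on every put and get instead of from a timed executor thread, and it makes no claim to fix the OOME described here.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical soft-value cache that purges cleared entries on access. */
final class SoftValueCache<K, V> {

  /** Soft reference that remembers its key so the stale map entry can be found. */
  private static final class Entry<K, V> extends SoftReference<V> {
    final K key;

    Entry(K key, V value, ReferenceQueue<V> queue) {
      super(value, queue);
      this.key = key;
    }
  }

  private final Map<K, Entry<K, V>> map = new HashMap<>();
  private final ReferenceQueue<V> queue = new ReferenceQueue<>();

  public synchronized void put(K key, V value) {
    drainQueue();
    map.put(key, new Entry<>(key, value, queue));
  }

  /** Returns the cached value, or null if absent or already cleared by the GC. */
  public synchronized V get(K key) {
    drainQueue();
    Entry<K, V> e = map.get(key);
    return e == null ? null : e.get();
  }

  /** Removes map entries whose referents the GC has cleared and enqueued. */
  @SuppressWarnings("unchecked")
  public synchronized int drainQueue() {
    int removed = 0;
    Entry<K, V> e;
    // poll() never blocks; it returns null once the queue is empty.
    while ((e = (Entry<K, V>) queue.poll()) != null) {
      // Only remove if the map still points at this exact reference,
      // so a newer value stored under the same key is left alone.
      if (map.get(e.key) == e) {
        map.remove(e.key);
        removed++;
      }
    }
    return removed;
  }

  public synchronized int size() {
    return map.size();
  }
}
```

Draining on access keeps the bookkeeping proportional to cache traffic, but it still depends on the GC having enqueued the cleared references, which is exactly the timing in question in this issue.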






[jira] Commented: (HBASE-1127) OOME running randomRead PE

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664429#action_12664429 ] 

Andrew Purtell commented on HBASE-1127:
---------------------------------------

+1

IMHO blockcache should be off by default, except for meta, where HBase can clearly establish a benefit. Otherwise, it's up to the application to decide if sufficient benefit is derived to offset the risk of the added heap pressure. Many applications of the archival type will not benefit from blockcache on their tables, for example.





[jira] Updated: (HBASE-1127) OOME running randomRead PE (Disable blanket enabling of blockcache)

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1127:
-------------------------

    Summary: OOME running randomRead PE (Disable blanket enabling of blockcache)  (was: OOME running randomRead PE)




[jira] Updated: (HBASE-1127) OOME running randomRead PE

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1127:
-------------------------

    Attachment: 1127-v2.patch

Patch to stop turning block cache on by default.




[jira] Assigned: (HBASE-1127) OOME running randomRead PE (Disable blanket enabling of blockcache)

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack reassigned HBASE-1127:
----------------------------

    Assignee: stack




[jira] Resolved: (HBASE-1127) OOME running randomRead PE (Disable blanket enabling of blockcache)

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-1127.
--------------------------

    Resolution: Fixed

