Posted to common-dev@hadoop.apache.org by "stack (JIRA)" <ji...@apache.org> on 2007/07/20 23:53:06 UTC

[jira] Created: (HADOOP-1646) [hbase] RegionServer OOME's under sustained, substantial loading by 10 concurrent clients

[hbase] RegionServer OOME's under sustained, substantial loading by 10 concurrent clients
-----------------------------------------------------------------------------------------

                 Key: HADOOP-1646
                 URL: https://issues.apache.org/jira/browse/HADOOP-1646
             Project: Hadoop
          Issue Type: Bug
          Components: contrib/hbase
            Reporter: stack
            Assignee: stack
            Priority: Minor


Have been running ten concurrent clients uploading Wikipedia to HBase.  Each update includes some metadata (URL, mimetype) and the page content.  While caching updates across compactions and splits, the region server OOMEs (default heap size of 1G).  The 10 concurrent clients are doing over 10k rows a minute.  HBase should be able to carry this common loading scenario.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-1646) [hbase] RegionServer OOME's under sustained, substantial loading by 10 concurrent clients

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515874 ] 

Hadoop QA commented on HADOOP-1646:
-----------------------------------

+1

http://issues.apache.org/jira/secure/attachment/12362639/oome.patch applied and successfully tested against trunk revision r559886.

Test results:   http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/475/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/475/console


[jira] Updated: (HADOOP-1646) [hbase] RegionServer OOME's under sustained, substantial loading by 10 concurrent clients

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-1646:
--------------------------

    Attachment: oome.patch

Here's a patch

HADOOP-1646 RegionServer OOME's under sustained, substantial loading by 10
concurrent clients

Added a gate that closes when overwhelmed by load. Tuned default configuration
to better suit sustained loading. Compactions and splits are taking too long;
so long that it's not hard to put a region server into a state where it mostly
has clients on hold while it splits and compacts (to be addressed next).
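
For readers who have not opened the patch, here is a minimal sketch of what a gate like this can look like. It is not the code in oome.patch: the class and method names (MemcacheGate, update, flushed) are hypothetical, and it assumes the blocking bound is the flush size times the block multiplier, which the description does not spell out.

```java
/**
 * Hypothetical sketch of a size-based write gate. Not the actual HRegion
 * code from the patch; names and the bound calculation are assumptions.
 */
class MemcacheGate {
    // Assumed upper bound: flush size multiplied by the block multiplier.
    private final long blockingSize;
    private long memcacheSize = 0;

    MemcacheGate(long flushSize, int blockMultiplier) {
        this.blockingSize = flushSize * blockMultiplier;
    }

    /** True if an update arriving now would have to wait. */
    synchronized boolean isClosed() {
        return memcacheSize >= blockingSize;
    }

    /** Holds the calling client while the gate is closed, then counts the update. */
    synchronized void update(long bytes) {
        while (memcacheSize >= blockingSize) {
            try {
                wait(); // released by flushed()
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while gate was closed", e);
            }
        }
        memcacheSize += bytes;
    }

    /** Called once a flush completes: resets the accounting and reopens the gate. */
    synchronized void flushed() {
        memcacheSize = 0;
        notifyAll();
    }

    synchronized long size() {
        return memcacheSize;
    }
}
```

The point of the design is that clients are held rather than rejected, so heap use stays bounded while a flush catches up.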

M src/contrib/hbase/conf/hbase-default.xml
  Edit of property descriptions. HMemCache thresholds are now done in
  byte sizes rather than number of commits.
  (hbase.regionserver.msginterval) changed from 15 to 10 seconds.
  (hbase.hregion.maxunflushed) Removed.  Replaced by
  hbase.hregion.memcache.flush.size.
  (hbase.hregion.compactionThreshold,
    hbase.hregion.memcache.block.multiplier,
    hbase.regionserver.thread.splitcompactcheckfrequency): Added.
  (hbase.hregion.max.filesize): Changed from 128M to 64M.
M src/contrib/hbase/src/test/org/apache/hadoop/hbase/TestHMemcache.java
  Removed setting of fs.file.impl.  No longer needed.
  Added assertion that history is being cleaned up.
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HStoreFile.java
  (LOG): Added for debug level logging.
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HStore.java
  LOGging edit adding size, count and names of store files.
  (storeName): Added.
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HStoreKey.java
  Made all constructors go via the constructor that takes all args.
  (getSize): Added.
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HRegionServer.java
  LOGging edits adding sizes, time-to-complete, etc. Made it so we can
  run a split even without a compaction if files on disk are big enough.
  We were running adding/deleting of regions from META numRetries times
  every time.  Halved default for split/compact checker thread run time.
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HConstants.java
  (DEFAULT_MAX_FILE_SIZE): Changed from 128M to 64M.
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HMemcache.java
  Removed logging (Moved to hosting HRegionServer).
  (getSize): Added.
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HRegion.java
  Removed 'closed' from WriteState and moved it out to HRegion.
  Added waiting on outstanding row locks before splitting.
  Added logging of how long splits and compactions take as well as sizes
  of store files and region.  Added a forced flush if more than 10
  optional flushes pass w/o our flushing, to write out ROOT and META data,
  which is usually too small to earn a size-triggered flush. Added a
  checkResources that will block clients updating if we've exceeded the
  memcache upper-size bound.
  (closed, noFlushCount, blockingMemcacheSize): Added.
  (maxUnflushedEntries): Removed.  Replaced by memcacheFlushSize.
  (splitStoreFile): Added (Refactored duplicated code here).
  (getAllStoreFiles): Added.
  (startUpdate): Added read lock around get of row lock.  Added
  check to see if we should block.
  (checkResources): Added.
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HLocking.java
  Formatting.
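
The forced-flush rule noted under HRegion.java can be sketched roughly as below. This is an illustration only, not the patch itself: FlushPolicy and shouldFlush are hypothetical names, noFlushCount is borrowed from the change list, and the reading that every optional flush check which does not flush bumps the counter is an assumption.

```java
/**
 * Hypothetical sketch of a flush decision that forces a flush after too
 * many skipped optional flushes. Not the actual HRegion code.
 */
class FlushPolicy {
    private final long flushSize;   // size-triggered flush threshold, in bytes
    private final int maxSkipped;   // optional flushes allowed to pass without flushing
    private int noFlushCount = 0;   // optional flushes skipped since the last flush

    FlushPolicy(long flushSize, int maxSkipped) {
        this.flushSize = flushSize;
        this.maxSkipped = maxSkipped;
    }

    /** Called on each optional flush check; returns true if a flush should run now. */
    boolean shouldFlush(long memcacheSize) {
        if (memcacheSize >= flushSize) {
            noFlushCount = 0;       // ordinary size-triggered flush
            return true;
        }
        noFlushCount++;
        if (noFlushCount > maxSkipped) {
            noFlushCount = 0;       // forced flush for small but stale data
            return true;
        }
        return false;
    }
}
```

With maxSkipped set to 10, a region holding only a little ROOT or META data still gets written out on the eleventh check instead of waiting indefinitely for the size threshold.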


[jira] Updated: (HADOOP-1646) [hbase] RegionServer OOME's under sustained, substantial loading by 10 concurrent clients

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-1646:
--------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Committed. Resolving.


[jira] Updated: (HADOOP-1646) [hbase] RegionServer OOME's under sustained, substantial loading by 10 concurrent clients

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-1646:
--------------------------

    Status: Patch Available  (was: Open)

Builds and passes all tests locally.
