Posted to common-dev@hadoop.apache.org by "stack (JIRA)" <ji...@apache.org> on 2007/07/20 23:53:06 UTC
[jira] Created: (HADOOP-1646) [hbase] RegionServer OOME's under
sustained, substantial loading by 10 concurrent clients
[hbase] RegionServer OOME's under sustained, substantial loading by 10 concurrent clients
-----------------------------------------------------------------------------------------
Key: HADOOP-1646
URL: https://issues.apache.org/jira/browse/HADOOP-1646
Project: Hadoop
Issue Type: Bug
Components: contrib/hbase
Reporter: stack
Assignee: stack
Priority: Minor
Have been running ten concurrent clients uploading Wikipedia to HBase. Each update includes some metadata -- URL, mimetype -- and the page content. Because updates are cached in memory across compactions and splits, the region server OOMEs (default heap size of 1G). The 10 concurrent clients are doing over 10k rows a minute. HBase should be able to carry this common loading scenario.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1646) [hbase] RegionServer OOME's under
sustained, substantial loading by 10 concurrent clients
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515874 ]
Hadoop QA commented on HADOOP-1646:
-----------------------------------
+1
http://issues.apache.org/jira/secure/attachment/12362639/oome.patch applied and successfully tested against trunk revision r559886.
Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/475/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/475/console
[jira] Updated: (HADOOP-1646) [hbase] RegionServer OOME's under
sustained, substantial loading by 10 concurrent clients
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HADOOP-1646:
--------------------------
Attachment: oome.patch
Here's a patch
HADOOP-1646 RegionServer OOME's under sustained, substantial loading by 10
concurrent clients
Added a gate that closes when the server is overwhelmed by load. Tuned the
default configuration to better suit sustained loading. Compactions and splits
are taking so long that it's not hard to put a region server into a state
where it mostly has clients on hold while it splits and compacts (to be
addressed next).
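A gate of this kind can be sketched as a guarded wait: writers check the buffered size before each update and sleep while it is above a blocking bound, and the flusher wakes them once memory is freed. This is a hypothetical sketch, not the code in oome.patch. The class and method names (UpdateGate, added, flushed) are illustrative, and the blocking bound is assumed to be the memcache flush size multiplied by a block multiplier, which is how the new hbase.hregion.memcache.flush.size and hbase.hregion.memcache.block.multiplier properties read, but that is an assumption here.

```java
/**
 * Hypothetical update gate (not the HBase API): writers block while the
 * in-memory cache holds more than a blocking bound of bytes, and are
 * released when a flush frees memory.
 */
class UpdateGate {
    // Assumed bound: flush size times a block multiplier, in bytes.
    private final long blockingSize;
    // Bytes currently buffered; guarded by this object's monitor.
    private long memcacheSize = 0;

    UpdateGate(long flushSizeBytes, int blockMultiplier) {
        this.blockingSize = flushSizeBytes * blockMultiplier;
    }

    /** Called before an update; waits while the cache is over the bound. */
    synchronized void checkResources() throws InterruptedException {
        // Loop rather than if: wait() may return spuriously.
        while (memcacheSize > blockingSize) {
            wait();
        }
    }

    /** Records bytes added to the cache by an update. */
    synchronized void added(long bytes) {
        memcacheSize += bytes;
    }

    /** Called by the flusher after writing bytes out; reopens the gate. */
    synchronized void flushed(long bytes) {
        memcacheSize -= bytes;
        notifyAll();
    }

    synchronized long size() {
        return memcacheSize;
    }
}
```

Blocking writers trades latency for bounded memory: clients stall instead of the heap growing until an OOME while a compaction or split holds up flushing.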
M src/contrib/hbase/conf/hbase-default.xml
Edit of property descriptions. HMemcache thresholds are now expressed in
byte sizes rather than number of commits.
(hbase.regionserver.msginterval): Changed from 15 to 10 seconds.
(hbase.hregion.maxunflushed) Removed. Replaced by
hbase.hregion.memcache.flush.size.
(hbase.hregion.compactionThreshold,
hbase.hregion.memcache.block.multiplier,
hbase.regionserver.thread.splitcompactcheckfrequency): Added.
(hbase.hregion.max.filesize): Changed from 128M to 64M.
M src/contrib/hbase/src/test/org/apache/hadoop/hbase/TestHMemcache.java
Removed setting of fs.file.impl. No longer needed.
Added assertion that history is being cleaned up.
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HStoreFile.java
(LOG): Added for debug level logging.
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HStore.java
LOGging edit adding size, count and names of store files.
(storeName): Added.
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HStoreKey.java
Made all constructors go via the constructor that takes all args.
(getSize): Added.
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HRegionServer.java
LOGging edits adding sizes, time-to-complete, etc. Made it possible to
run a split even when there was no compaction, if the files on disk are
big enough. We were running the adding/deleting of regions from META
numRetries times every time. Halved the default run interval for the
split/compact checker thread.
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HConstants.java
(DEFAULT_MAX_FILE_SIZE): Changed from 128M to 64M.
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HMemcache.java
Removed logging (Moved to hosting HRegionServer).
(getSize): Added.
M src/hbase/src/java/org/apache/hadoop/hbase/HRegion.java
Removed 'closed' from WriteState and moved it out to HRegion.
Added waiting on outstanding row locks before splitting.
Added logging of how long splits and compactions take, as well as sizes
of store files and region. Added a forced flush after more than 10
optional flush calls without an actual flush, so that ROOT and META data,
usually too small to earn a size-triggered flush, get written out. Added
a checkResources that blocks updating clients if we've exceeded the
memcache upper-size bound.
(closed, noFlushCount, blockingMemcacheSize): Added.
(maxUnflushedEntries): Removed. Replaced by memcacheFlushSize.
(splitStoreFile): Added (Refactored duplicated code here).
(getAllStoreFiles): Added.
(startUpdate): Added read lock around get of row lock. Added
check to see if we should block.
(checkResources): Added.
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HLocking.java
Formatting.
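The flush rules in the HRegion notes (a byte-size threshold replacing the old commit count, plus a forced flush once more than 10 optional flush calls have passed with data present but no flush) can be sketched as a small policy object. This is a hypothetical illustration, not the patch's code: the names FlushPolicy and optionallyFlush are invented here, and the exact counting rules (only counting calls when the memcache is non-empty, resetting the counter on every flush) are assumptions.

```java
/**
 * Hypothetical flush policy (not the HBase API): flush when the memcache
 * reaches a byte threshold, or force a flush after a fixed number of
 * skipped optional flushes so small regions such as ROOT and META still
 * get written out.
 */
class FlushPolicy {
    private final long flushSizeBytes;          // size-triggered threshold
    private final int maxSkippedOptionalFlushes; // e.g. 10, per the notes
    private int noFlushCount = 0;               // optional calls skipped so far

    FlushPolicy(long flushSizeBytes, int maxSkippedOptionalFlushes) {
        this.flushSizeBytes = flushSizeBytes;
        this.maxSkippedOptionalFlushes = maxSkippedOptionalFlushes;
    }

    /** Returns true if the caller should flush a memcache of this size now. */
    boolean optionallyFlush(long memcacheBytes) {
        if (memcacheBytes >= flushSizeBytes) {
            noFlushCount = 0;           // size-triggered flush
            return true;
        }
        if (memcacheBytes == 0) {
            return false;               // nothing to write; do not count
        }
        if (noFlushCount >= maxSkippedOptionalFlushes) {
            noFlushCount = 0;           // forced flush of a small memcache
            return true;
        }
        noFlushCount++;                 // skip this optional flush
        return false;
    }
}
```

With a limit of 10, a region that never reaches the size threshold is flushed on the eleventh optional call that finds data in its memcache.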
[jira] Updated: (HADOOP-1646) [hbase] RegionServer OOME's under
sustained, substantial loading by 10 concurrent clients
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HADOOP-1646:
--------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
Committed. Resolving.
[jira] Updated: (HADOOP-1646) [hbase] RegionServer OOME's under
sustained, substantial loading by 10 concurrent clients
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HADOOP-1646:
--------------------------
Status: Patch Available (was: Open)
Builds and passes all tests locally.