Posted to common-dev@hadoop.apache.org by "stack (JIRA)" <ji...@apache.org> on 2007/07/20 19:59:06 UTC
[jira] Created: (HADOOP-1644) [hbase] Compactions should take no longer than period between memcache flushes
[hbase] Compactions should take no longer than period between memcache flushes
------------------------------------------------------------------------------
Key: HADOOP-1644
URL: https://issues.apache.org/jira/browse/HADOOP-1644
Project: Hadoop
Issue Type: Wish
Components: contrib/hbase
Reporter: stack
Assignee: stack
Priority: Minor
Currently, compactions take a long time. During compaction, updates are carried by the HRegion's memcache (+ backing HLog). The memcache is unable to flush to disk until the compaction completes.
Under sustained, substantial updates -- rows that contain multiple columns, one of which is a web page -- by multiple concurrent clients (10 in this case), a common hbase usage scenario, the memcache grows fast, often to orders of magnitude in excess of the configured 'flush-to-disk' threshold.
This throws the whole system out of kilter. When the memcache does get to run after the compaction completes -- assuming you have sufficient RAM and the region server doesn't OOME -- the resulting on-disk file will be far larger than any other on-disk HStoreFile, bringing on a region split... but the resulting split will produce regions that themselves need to be split immediately, because each half is beyond the configured limit, and so on.
In another issue yet to be posted, tuning and some pointed memcache flushes make the above condition less extreme, but until compaction durations come close to the memcache flush threshold, compactions will remain disruptive.
It's allowed that compactions may never be fast enough, as per the bigtable paper (this is a 'wish' issue).
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1644) [hbase] Compactions should not block updates
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12517588 ]
Jim Kellerman commented on HADOOP-1644:
---------------------------------------
Compactions should never block updates.
What follows is a proposed solution:
1. A compaction is needed. This is the hard part: what conditions should trigger a compaction? Right after a split? At any other time?
2. A new thread is started to do the compaction. Since all the MapFiles (SSTables) are immutable, the HStore can continue to service reads from the existing MapFiles while the compaction thread is creating the new compacted MapFile. The thread gets the list of MapFiles that exist at the time it is started and will only act on those. Cache flushes will create new MapFiles that aren't a part of the compaction.
3. When the compaction is complete, the thread grabs a write lock on the HStore. (it might have to wait a bit if there are some scans going on, but that's ok)
4. When the lock is acquired, the newly created compacted MapFile is put into place and the MapFiles it read from are removed. (very short time period).
5. The lock is then released and the HStore services requests from the newly compacted MapFile and any new MapFiles that may have been created by cache flushes that occurred since the compaction started.
6. The MapFiles that were the input to the compaction can now be deleted.
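Steps 2-6 above amount to "merge outside the lock, swap under a brief write lock." A minimal Java sketch of that pattern follows; the class and method names are invented for illustration and are not the actual HStore API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch of the proposed non-blocking compaction:
// reads and flushes proceed while the merge runs; only the final
// file swap takes the write lock.
class CompactionSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> storeFiles = new ArrayList<>();

    CompactionSketch(List<String> initial) { storeFiles.addAll(initial); }

    // Reads take the read lock, so they proceed during compaction.
    List<String> currentFiles() {
        lock.readLock().lock();
        try { return new ArrayList<>(storeFiles); }
        finally { lock.readLock().unlock(); }
    }

    // Cache flushes add new files; they are not part of any
    // in-progress compaction's snapshot.
    void addFlushedFile(String f) {
        lock.writeLock().lock();
        try { storeFiles.add(f); }
        finally { lock.writeLock().unlock(); }
    }

    void compact() {
        // Step 2: fix the input set; merge work happens with no lock held.
        List<String> snapshot = currentFiles();
        String compacted = "compacted(" + String.join("+", snapshot) + ")";
        // Steps 3-5: brief write lock to retire inputs and install the result.
        lock.writeLock().lock();
        try {
            storeFiles.removeAll(snapshot);
            storeFiles.add(compacted);
        } finally {
            lock.writeLock().unlock();
        }
        // Step 6: the snapshot files may now be deleted from disk.
    }
}
```

The key property is that the write lock is held only for the list swap, not for the (long) merge itself.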
> [hbase] Compactions should not block updates
> --------------------------------------------
>
> Key: HADOOP-1644
> URL: https://issues.apache.org/jira/browse/HADOOP-1644
> Project: Hadoop
> Issue Type: Improvement
> Components: contrib/hbase
> Affects Versions: 0.15.0
> Reporter: stack
> Assignee: stack
> Fix For: 0.15.0
>
[jira] Updated: (HADOOP-1644) [hbase] Compactions should not block updates
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HADOOP-1644:
--------------------------
Attachment: interlacing.patch
Patch to pick up an existing HStoreFile from disk, interlacing the memcache flush with compaction as we go. The notion was that interlacing would take as long as a pure flush, with the added benefit of having one less file in the store when the flush completes. Experiments show it takes about 1.5 to 3 times as long as flushing, depending on the number of column families -- more column families bring the two timings closer together -- but at the cost of complicating flush (since we are compounding flushing and compaction). Not going to pursue this experiment further.
> [hbase] Compactions should not block updates
> --------------------------------------------
>
> Key: HADOOP-1644
> URL: https://issues.apache.org/jira/browse/HADOOP-1644
> Project: Hadoop
> Issue Type: Improvement
> Components: contrib/hbase
> Affects Versions: 0.15.0
> Reporter: stack
> Assignee: stack
> Fix For: 0.15.0
>
> Attachments: interlacing.patch
>
[jira] Updated: (HADOOP-1644) [hbase] Compactions should not block updates
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HADOOP-1644:
--------------------------
Attachment: non-blocking-compaction-v2.patch
Version 2 fixes bug where master was prematurely cleaning up parent regions.
> [hbase] Compactions should not block updates
> --------------------------------------------
>
> Key: HADOOP-1644
> URL: https://issues.apache.org/jira/browse/HADOOP-1644
> Project: Hadoop
> Issue Type: Improvement
> Components: contrib/hbase
> Affects Versions: 0.15.0
> Reporter: stack
> Assignee: stack
> Fix For: 0.15.0
>
> Attachments: interlacing.patch, non-blocking-compaction-v2.patch, non-blocking-compaction.patch
>
[jira] Updated: (HADOOP-1644) [hbase] Compactions should not block updates
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HADOOP-1644:
--------------------------
Attachment: non-blocking-compaction.patch
Here is a patch that reworks compactions.
HADOOP-1644 [hbase] Compactions should not block updates
Disentangles flushes and compactions; flushes can proceed while a
compaction is happening. Also, don't compact unless we hit the
compaction threshold: i.e., don't automatically compact on HRegion
startup, so regions can come online faster.
M src/contrib/hbase/conf/hbase-default.xml
(hbase.hregion.compactionThreashold): Moved to be an hstore property
as part of encapsulating the compaction decision inside HStore.
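For illustration, the moved property might look roughly like the following in hbase-default.xml. The property name, default value, and description text below are assumptions based on the notes above, not the committed form:

```xml
<!-- Illustrative only: exact name and default after the move may differ. -->
<property>
  <name>hbase.hstore.compactionThreshold</name>
  <value>3</value>
  <description>Number of HStoreFiles in a store that, once reached,
  makes the store eligible for compaction. Formerly an hregion-scoped
  property.</description>
</property>
```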
M src/contrib/hbase/src/test/org/apache/hadoop/hbase/HBaseTestCase.java
Refactored. Moved here generalized content-loading code that can
be shared by tests. Added to setUp and tearDown the creation and removal
of the local test dir (if it exists).
M src/contrib/hbase/src/test/org/apache/hadoop/hbase/TestCompare.java
Added a test of HStoreKey comparison (it works other than one would at
first expect).
M src/contrib/hbase/src/test/org/apache/hadoop/hbase/TestSplit.java
Bulk of content loading code has been moved up into the parent class.
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HConnectionManager.java
(tableExists): Restored the check of whether the asked-for table is in the
list of tables. As it was, a tableExists check would just wait for all
timeouts and retries to expire and then report that the table does not
exist. Fixed up the debug message listing regions of a table. Added
protection against the meta table not having a COL_REGINFO (seen in
cluster testing -- probably a bug in row removal).
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HStoreFile.java
When loading store files, a file was still counted as valid even when it
was noticed that there was no corresponding map file. Also fixed the
merger -- it was constructing a MapFile.Reader directly rather than asking
HStoreFile for the reader (HStoreFile knows how to handle MapFile
references).
(rename): Added a check that the move succeeded, plus logging. In cluster
testing, the hdfs move of the compacted file into place has failed on
occasion (need more info).
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HStore.java
Encapsulated the ruling on whether a compaction should take place inside
HStore. Added reading of the compactionThreshold here. The compaction
threshold is currently just the number of store files; later it may
include other factors, such as the count of reference files. Cleaned up
debug messages around the reconstruction log. Removed the compact-if-more-
than-one-file step from the constructor; let compaction happen after we've
been deployed (compactions that happen while we are online can continue to
take updates, whereas compaction in the constructor puts off our being
able to take updates).
(close): Changed so it now returns the set of store files. This used to be
done by calls to flush, but since flush and compaction have been
disentangled, a compaction can come in after a flush and the list of files
could be off. With close doing it, we can be sure the list of files is
complete.
(flushCache): No longer returns the set of store files. Added a 'merging
compaction' where we pick an arbitrary store file from disk and merge the
content of memcache into it (needs work).
(getAllMapFiles): Renamed getAllStoreFiles.
(needsCompaction): Added.
(compactHelper): Added passing of the maximum sequence number if already
calculated. If compacting one file only, we used to skip without rewriting
the info file. Fixed.
Refactored. Moved guts to new compact(outFile, listOfStores) method.
(compact, CompactionReader): Added overrides and an interface to support
the 'merging compaction' that takes files and memcache. Previously, if the
move of the compacted file failed, all of the input data had already been
deleted. Changed so deletion happens only after a confirmed move of the
compacted file.
(getFull): Fixed a bug where we got an NPE when a read of the maps came
back null. Revealed by our NOT compacting stores on startup, which meant
there could be two backing stores, one of which had no data regarding the
queried key.
(getNMaps): Renamed countOfStoreFiles.
(toString): Added.
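The needsCompaction rule described above -- a threshold on the simple count of store files -- can be sketched as follows. The class name and constructor are hypothetical; only the file-count rule comes from the notes above:

```java
// Illustrative sketch of the compaction-threshold rule: compact once
// the number of store files reaches a configured threshold.
class CompactionPolicy {
    private final int compactionThreshold;

    CompactionPolicy(int threshold) {
        this.compactionThreshold = threshold;
    }

    // Currently the rule looks only at file count; the notes anticipate
    // that later versions might also weigh reference-file counts.
    boolean needsCompaction(int storeFileCount) {
        return storeFileCount >= compactionThreshold;
    }
}
```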
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HStoreKey.java
Added comment on 'odd'-looking comparison.
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HRegionServer.java
Javadoc edit.
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HLogEdit.java
Only return the first 128 bytes of the value when toString-ing (on
cluster, complete web pages were being returned in the log).
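That truncation can be sketched like this; the class and method names are illustrative, and only the 128-byte cap comes from the note above:

```java
// Illustrative sketch of capping a logged value at 128 bytes so large
// values (e.g. whole web pages) don't flood the log output.
class ValueTruncation {
    static final int MAX_LOGGED_BYTES = 128;

    static String abbreviate(byte[] value) {
        int len = Math.min(value.length, MAX_LOGGED_BYTES);
        String s = new String(value, 0, len);
        // Mark truncation so readers know the value continues.
        return value.length > len ? s + "..." : s;
    }
}
```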
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HMaster.java
Removed a confusing debug message (it made sense once -- but not now).
Test rootRegionLocation for null before using it (it can be null).
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HMemcache.java
Added comment that delete behavior needs study.
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HRegion.java
Fixed merge so it doesn't do the incremental merge based off files
returned by flush. Instead, all is done in one go after the region
closes (using files returned by close).
Moved duplicated code to new filesByFamily method.
(WriteState): Removed writesOngoing in favor of compacting and
flushing flags.
(flushCache): No longer returns list of files.
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/util/Writables.java
Fix javadoc.
> [hbase] Compactions should not block updates
> --------------------------------------------
>
> Key: HADOOP-1644
> URL: https://issues.apache.org/jira/browse/HADOOP-1644
> Project: Hadoop
> Issue Type: Improvement
> Components: contrib/hbase
> Affects Versions: 0.15.0
> Reporter: stack
> Assignee: stack
> Fix For: 0.15.0
>
> Attachments: interlacing.patch, non-blocking-compaction.patch
>
[jira] Resolved: (HADOOP-1644) [hbase] Compactions should not block updates
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack resolved HADOOP-1644.
---------------------------
Resolution: Fixed
Committed after testing on three different machines -- macosx and two linux machines, one of which was an old single-processor K6 -- and after checking that there were no javadoc errors. Resolving.
> [hbase] Compactions should not block updates
> --------------------------------------------
>
> Key: HADOOP-1644
> URL: https://issues.apache.org/jira/browse/HADOOP-1644
> Project: Hadoop
> Issue Type: Improvement
> Components: contrib/hbase
> Affects Versions: 0.15.0
> Reporter: stack
> Assignee: stack
> Fix For: 0.15.0
>
> Attachments: interlacing.patch, non-blocking-compaction-v2.patch, non-blocking-compaction.patch
>
[jira] Commented: (HADOOP-1644) [hbase] Compactions should not block updates
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518867 ]
stack commented on HADOOP-1644:
-------------------------------
Let me try your suggestion, Jim, of not having compactions disable flushes.
Another thing I'd like to try: rather than flushing memory to a new file, flush by merging with an existing file. I'm thinking it will take the same amount of elapsed time, but we'll have put off a full compaction by not producing an added file.
Another element to consider is that compactions are the means by which HStoreFile references are cleaned up in a region (if there are references, a region cannot be split), so compaction should do its best to clean up instances of reference files.
> [hbase] Compactions should not block updates
> --------------------------------------------
>
> Key: HADOOP-1644
> URL: https://issues.apache.org/jira/browse/HADOOP-1644
> Project: Hadoop
> Issue Type: Improvement
> Components: contrib/hbase
> Affects Versions: 0.15.0
> Reporter: stack
> Assignee: stack
> Fix For: 0.15.0
>
[jira] Updated: (HADOOP-1644) [hbase] Compactions should not block updates
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jim Kellerman updated HADOOP-1644:
----------------------------------
Fix Version/s: 0.15.0
Priority: Major (was: Minor)
Issue Type: Improvement (was: Wish)
Affects Version/s: 0.15.0
Summary: [hbase] Compactions should not block updates (was: [hbase] Compactions should take no longer than period between memcache flushes)
> [hbase] Compactions should not block updates
> --------------------------------------------
>
> Key: HADOOP-1644
> URL: https://issues.apache.org/jira/browse/HADOOP-1644
> Project: Hadoop
> Issue Type: Improvement
> Components: contrib/hbase
> Affects Versions: 0.15.0
> Reporter: stack
> Assignee: stack
> Fix For: 0.15.0
>