Posted to dev@hbase.apache.org by Vladimir Rodionov <vl...@gmail.com> on 2015/06/10 05:54:28 UTC

Potential issue with ExploringCompactionPolicy?

Hi, folks

This is from the HBase book:
hbase.hstore.compaction.min.size
Description

A StoreFile smaller than this size will always be eligible for minor
compaction. HFiles this size or larger are evaluated by
hbase.hstore.compaction.ratio to determine if they are eligible. Because
this limit represents the "automatic include" limit for all StoreFiles
smaller than this value, this value may need to be reduced in write-heavy
environments where many StoreFiles in the 1-2 MB range are being flushed,
because every StoreFile will be targeted for compaction and the resulting
StoreFiles may still be under the minimum size and require further
compaction. If this parameter is lowered, the ratio check is triggered more
quickly. This addressed some issues seen in earlier versions of HBase but
changing this parameter is no longer necessary in most situations. Default:
128 MB expressed in bytes.
 Default

134217728
  hbase.hstore.compaction.max.size
Description

A StoreFile larger than this size will be excluded from compaction. The
effect of raising hbase.hstore.compaction.max.size is fewer, larger
StoreFiles that do not get compacted often. If you feel that compaction is
happening too often without much benefit, you can try raising this value.
Default: the value of LONG.MAX_VALUE, expressed in bytes.
 Default

9223372036854775807
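For reference, both settings are tuned per cluster in hbase-site.xml. The fragment below just restates the documented defaults quoted above, purely for illustration, not as a recommendation:

```xml
<!-- hbase-site.xml: documented defaults, shown only for illustration -->
<property>
  <name>hbase.hstore.compaction.min.size</name>
  <value>134217728</value> <!-- 128 MB -->
</property>
<property>
  <name>hbase.hstore.compaction.max.size</name>
  <value>9223372036854775807</value> <!-- Long.MAX_VALUE -->
</property>
```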
 This is applyCompactionPolicy (master branch):

  public List<StoreFile> applyCompactionPolicy(final List<StoreFile> candidates,
      boolean mightBeStuck, boolean mayUseOffPeak, int minFiles, int maxFiles) {

    final double currentRatio = mayUseOffPeak
        ? comConf.getCompactionRatioOffPeak() : comConf.getCompactionRatio();

    // Start off choosing nothing.
    List<StoreFile> bestSelection = new ArrayList<StoreFile>(0);
    List<StoreFile> smallest = mightBeStuck ? new ArrayList<StoreFile>(0) : null;
    long bestSize = 0;
    long smallestSize = Long.MAX_VALUE;

    int opts = 0, optsInRatio = 0, bestStart = -1; // for debug logging
    // Consider every starting place.
    for (int start = 0; start < candidates.size(); start++) {
      // Consider every different sub list permutation in between start and end with min files.
      for (int currentEnd = start + minFiles - 1;
          currentEnd < candidates.size(); currentEnd++) {
        List<StoreFile> potentialMatchFiles = candidates.subList(start, currentEnd + 1);

        // Sanity checks
        if (potentialMatchFiles.size() < minFiles) {
          continue;
        }
        if (potentialMatchFiles.size() > maxFiles) {
          continue;
        }

        // Compute the total size of files that will
        // have to be read if this set of files is compacted.
        long size = getTotalStoreSize(potentialMatchFiles);

        // Store the smallest set of files.  This stored set of files will be used
        // if it looks like the algorithm is stuck.
        if (mightBeStuck && size < smallestSize) {
          smallest = potentialMatchFiles;
          smallestSize = size;
        }

// BEGIN

        if (size > comConf.getMaxCompactSize()) {
          continue;
        }

        ++opts;
        if (size >= comConf.getMinCompactSize()
            && !filesInRatio(potentialMatchFiles, currentRatio)) {
          continue;
        }
// END
        ++optsInRatio;
        if (isBetterSelection(bestSelection, bestSize, potentialMatchFiles,
            size, mightBeStuck)) {
          bestSelection = potentialMatchFiles;
          bestSize = size;
          bestStart = start;
        }
      }
    }
    if (bestSelection.size() == 0 && mightBeStuck) {
      LOG.debug("Exploring compaction algorithm has selected " + smallest.size()
          + " files of size " + smallestSize + " because the store might be stuck");
      return new ArrayList<StoreFile>(smallest);
    }
    LOG.debug("Exploring compaction algorithm has selected " + bestSelection.size()
        + " files of size " + bestSize + " starting at candidate #" + bestStart +
        " after considering " + opts + " permutations with " + optsInRatio + " in ratio");
    return new ArrayList<StoreFile>(bestSelection);
  }


The code in question is between // BEGIN and // END.
Why do we compare the total size of a LIST of store files against
configuration values that are supposed to apply to a single store file?
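To make the concern concrete, here is a minimal standalone sketch (plain Java, hypothetical file sizes; totalSize() just mirrors what getTotalStoreSize() appears to do) showing how the per-file reading of hbase.hstore.compaction.min.size and the per-selection check in the code can disagree:

```java
import java.util.Arrays;
import java.util.List;

public class MinSizeSemantics {
    // Default hbase.hstore.compaction.min.size: 128 MB in bytes.
    static final long MIN_COMPACT_SIZE = 134_217_728L;

    // Sum of the candidate sub-list's file sizes (stand-in for getTotalStoreSize()).
    static long totalSize(List<Long> fileSizes) {
        return fileSizes.stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        // Hypothetical candidate selection: three 100 MB store files.
        List<Long> sizes = Arrays.asList(104_857_600L, 104_857_600L, 104_857_600L);

        // Per the book's wording, each file is below min.size, so each
        // should be "automatically included" without the ratio check.
        boolean everyFileBelowMin = sizes.stream().allMatch(s -> s < MIN_COMPACT_SIZE);

        // The guard between // BEGIN and // END instead tests the SUM,
        // so this selection would still be routed through filesInRatio().
        boolean sumBelowMin = totalSize(sizes) < MIN_COMPACT_SIZE;

        System.out.println(everyFileBelowMin + " " + sumBelowMin); // prints "true false"
    }
}
```

With these numbers every individual file is under the 128 MB threshold, yet the sum (300 MB) is not, so the two interpretations diverge.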