Posted to commits@accumulo.apache.org by kt...@apache.org on 2011/11/04 17:32:58 UTC

svn commit: r1197634 - /incubator/accumulo/trunk/docs/src/user_manual/chapters/table_configuration.tex

Author: kturner
Date: Fri Nov  4 16:32:57 2011
New Revision: 1197634

URL: http://svn.apache.org/viewvc?rev=1197634&view=rev
Log:
ACCUMULO-68 Documented compactions, compaction ratio, and merging minor compactions in user manual.

Modified:
    incubator/accumulo/trunk/docs/src/user_manual/chapters/table_configuration.tex

Modified: incubator/accumulo/trunk/docs/src/user_manual/chapters/table_configuration.tex
URL: http://svn.apache.org/viewvc/incubator/accumulo/trunk/docs/src/user_manual/chapters/table_configuration.tex?rev=1197634&r1=1197633&r2=1197634&view=diff
==============================================================================
--- incubator/accumulo/trunk/docs/src/user_manual/chapters/table_configuration.tex (original)
+++ incubator/accumulo/trunk/docs/src/user_manual/chapters/table_configuration.tex Fri Nov  4 16:32:57 2011
@@ -378,6 +378,93 @@ table.cache.index.enable: Determines whe
 The block cache can have a significant effect on alleviating hot spots, as well as reducing query latency.
 It is enabled by default for the !METADATA table.
 
+\section{Compaction}
+
+As data is written to Accumulo, it is buffered in memory.  The buffered data is
+eventually written to HDFS on a per-tablet basis by a minor compaction.  Files
+can also be added to tablets directly by bulk import.  In the background, tablet
+servers run major compactions to merge multiple files into one.  The tablet
+server has to decide which tablets to compact and which files within a tablet
+to compact.  This decision is made using the compaction ratio, which is
+configurable per table.  To configure this ratio, modify the following property:
+
+\begin{verbatim}
+table.compaction.major.ratio
+\end{verbatim}  
+
+Increasing this ratio will result in more files per tablet and less compaction
+work.  More files per tablet means higher query latency, so adjusting this
+ratio is a trade-off between ingest and query performance.  The ratio defaults
+to 3.
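+
+As a sketch, the ratio could be changed for a single table from the Accumulo
+shell.  The table name \texttt{mytable} and the value 5 are placeholders chosen
+for illustration, not recommendations.  The second command lists the
+properties of the table that match the given string, to confirm the change.
+
+\begin{verbatim}
+config -t mytable -s table.compaction.major.ratio=5
+config -t mytable -f table.compaction.major.ratio
+\end{verbatim}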
+
+A set of files is compacted into one file if the sum of the file sizes in the
+set is larger than the ratio multiplied by the size of the largest file in the
+set.  If this is not true for the set of all files in a tablet, the largest
+file is removed from consideration and the remaining files are considered.
+This repeats until a compaction is triggered or no files are left to consider.
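+
+A small worked example may help.  It reads the rule as: compact a set of files
+when the sum of their sizes exceeds the ratio times the size of the largest
+file in the set.  The file sizes below are invented for illustration, and the
+ratio is the default of 3.
+
+\begin{verbatim}
+files considered         sum     ratio * largest   compact?
+100M 10M 10M 10M 10M     140M    3 * 100M = 300M   no
+10M 10M 10M 10M           40M    3 * 10M  =  30M   yes
+\end{verbatim}
+
+In this sketch the 100M file is dropped from consideration after the first
+check, and the four 10M files are then merged into one file of about 40M,
+leaving the tablet with two files.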
+
+The number of background threads tablet servers use to run major compactions is
+configurable.  To configure this, modify the following property:
+
+\begin{verbatim}
+tserver.compaction.major.concurrent.max
+\end{verbatim}
+
+Also, the number of threads tablet servers use for minor compactions is
+configurable.  To configure this, modify the following property:
+
+\begin{verbatim}
+tserver.compaction.minor.concurrent.max
+\end{verbatim}
+
+The numbers of minor and major compactions running and queued are visible on
+the Accumulo monitor page.  This allows you to see whether compactions are
+backing up and whether the above settings need adjusting.  When adjusting the
+number of threads available for compactions, consider the number of cores and
+the other tasks running on the nodes, such as MapReduce jobs.
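+
+As an illustration, and assuming these properties are set in the site
+configuration file \texttt{accumulo-site.xml}, the thread counts might be
+configured as follows.  The values 4 and 6 are placeholders, not tuned
+recommendations.
+
+\begin{verbatim}
+<property>
+  <name>tserver.compaction.major.concurrent.max</name>
+  <value>4</value>
+</property>
+<property>
+  <name>tserver.compaction.minor.concurrent.max</name>
+  <value>6</value>
+</property>
+\end{verbatim}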
+
+If major compactions are not keeping up, then the number of files per tablet
+will grow to a point where query performance starts to suffer.  One way to
+handle this situation is to increase the compaction ratio.  For example, if the
+compaction ratio were set to 1, then every new file that was added to a tablet
+by minor compaction would immediately queue the tablet for major compaction.
+So if a tablet had a 200M file and minor compaction wrote a 1M file, then the
+major compaction would attempt to merge the 200M and 1M files.  If the tablet
+server has many tablets trying to do this sort of thing, then major
+compactions will back up and the number of files per tablet will start to grow.
+This assumes data is being written continuously.  Increasing the
+compaction ratio will lower the amount of major compaction work that needs to
+be done and alleviate backups.
+
+Another option for dealing with too many files per tablet is to adjust the
+following property:
+
+\begin{verbatim}
+table.file.max  
+\end{verbatim}
+
+When a tablet reaches this number of files, it will choose to do a merging
+minor compaction.  A merging minor compaction merges the tablet's smallest file
+with the data in memory at minor compaction time.  Therefore the number of
+files will not grow beyond this limit.  This makes minor compactions take
+longer, which decreases ingest performance.  Ingest can slow down until major
+compactions have had enough time to catch up.  When adjusting this property,
+also consider adjusting the compaction ratio.
+Ideally, merging minor compactions never need to occur and major compactions
+can keep up.  It is possible to configure the file max and compaction ratio such
+that only merging minor compactions occur and major compactions never occur.
+This is bad because merging minor compactions do $O(N^2)$ work, whereas the
+work done by major compactions is $O(N \log_R N)$, where $R$ is the ratio.
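+
+A rough sketch of where these bounds come from, under simplifying assumptions
+that are not part of Accumulo itself: every minor compaction flushes one unit
+of data, $N$ units are written in total, and the cost of a compaction is the
+amount of data it writes.  If each merging minor compaction rewrites the same
+file together with the newly flushed unit, the $k$-th one writes about $k$
+units, so the total is
+
+\[
+\sum_{k=1}^{N} k = \frac{N(N+1)}{2} = O(N^2).
+\]
+
+With ratio-driven major compactions, a file is typically merged with on the
+order of $R$ files of similar size, so each unit of data is rewritten roughly
+once per factor-of-$R$ growth in file size, that is about $\log_R N$ times,
+giving a total of
+
+\[
+N \cdot \log_R N = O(N \log_R N).
+\]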
+ 
+
+Compactions can be manually initiated for a table.  To initiate a minor
+compaction, use the flush command in the shell.  To initiate a major compaction,
+use the compact command in the shell.  The compact command will compact each
+tablet in a table down to one file.  Even tablets with one file are compacted,
+which is useful when a filter has been added to a table.  In 1.4 the ability
+to compact a range of a table was added.  To use this feature, specify start and
+stop rows for the compact command.  This will compact only tablets that overlap
+the given row range.
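+
+The following shell session is a sketch of these commands.  The table name
+\texttt{mytable} and the rows \texttt{row\_a} and \texttt{row\_m} are
+placeholders.  The option names \texttt{-t}, \texttt{-b}, and \texttt{-e} are
+given here as an assumption about the 1.4 shell and should be confirmed with
+\texttt{help flush} and \texttt{help compact}.
+
+\begin{verbatim}
+flush -t mytable
+compact -t mytable
+compact -t mytable -b row_a -e row_m
+\end{verbatim}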
 
 \section{Pre-splitting tables}