Posted to commits@accumulo.apache.org by kt...@apache.org on 2011/11/04 17:32:58 UTC
svn commit: r1197634 -
/incubator/accumulo/trunk/docs/src/user_manual/chapters/table_configuration.tex
Author: kturner
Date: Fri Nov 4 16:32:57 2011
New Revision: 1197634
URL: http://svn.apache.org/viewvc?rev=1197634&view=rev
Log:
ACCUMULO-68 Documented compactions, compaction ratio, and merging minor compactions in user manual.
Modified:
incubator/accumulo/trunk/docs/src/user_manual/chapters/table_configuration.tex
Modified: incubator/accumulo/trunk/docs/src/user_manual/chapters/table_configuration.tex
URL: http://svn.apache.org/viewvc/incubator/accumulo/trunk/docs/src/user_manual/chapters/table_configuration.tex?rev=1197634&r1=1197633&r2=1197634&view=diff
==============================================================================
--- incubator/accumulo/trunk/docs/src/user_manual/chapters/table_configuration.tex (original)
+++ incubator/accumulo/trunk/docs/src/user_manual/chapters/table_configuration.tex Fri Nov 4 16:32:57 2011
@@ -378,6 +378,93 @@ table.cache.index.enable: Determines whe
The block cache can have a significant effect on alleviating hot spots, as well as reducing query latency.
It is enabled by default for the !METADATA table.
+\section{Compaction}
+
+As data is written to Accumulo it is buffered in memory. The data buffered in
+memory is eventually written to HDFS on a per tablet basis. Files can also be
+added to tablets directly by bulk import. In the background tablet servers run
+major compactions to merge multiple files into one. The tablet server has to
+decide which tablets to compact and which files within a tablet to compact.
+This decision is made using the compaction ratio, which is configurable on a
+per table basis. To configure this ratio modify the following property.
+
+\begin{verbatim}
+table.compaction.major.ratio
+\end{verbatim}
+
+Increasing this ratio will result in more files per tablet and less compaction
+work. More files per tablet means higher query latency, so adjusting this
+ratio is a trade-off between ingest and query performance. The ratio defaults
+to 3.
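+
+As a sketch, the ratio can be changed from the Accumulo shell with the config
+command. The table name mytable and the value 5 below are placeholders chosen
+for illustration, not recommendations.
+
+\begin{verbatim}
+config -t mytable -s table.compaction.major.ratio=5
+\end{verbatim}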
+
+The way the ratio works is that a set of files is compacted into one file if
+the sum of the sizes of the files in the set is larger than the ratio
+multiplied by the size of the largest file in the set. If this is not true for
+the set of all files in a tablet, the largest file is removed from
+consideration, and the remaining files are considered for compaction. This is
+repeated until a compaction is triggered or there are no files left to consider.
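+
+As an illustrative example with made-up file sizes, assume the rule compares
+the sum of the file sizes in a set against the ratio multiplied by the size of
+the largest file in the set. Suppose a tablet has files of 1000M, 100M, 90M,
+80M, and 70M, and the ratio is 3. Considering all five files, the sum is 1340M,
+which is not larger than $3 \times 1000\mathrm{M} = 3000\mathrm{M}$, so the
+1000M file is removed from consideration. The remaining four files sum to 340M,
+which is larger than $3 \times 100\mathrm{M} = 300\mathrm{M}$, so those four
+files are compacted into one file and the 1000M file is left untouched.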
+
+The number of background threads tablet servers use to run major compactions is
+configurable. To configure this, modify the following property.
+
+\begin{verbatim}
+tserver.compaction.major.concurrent.max
+\end{verbatim}
+
+Also, the number of threads tablet servers use for minor compactions is
+configurable. To configure this, modify the following property.
+
+\begin{verbatim}
+tserver.compaction.minor.concurrent.max
+\end{verbatim}
+
+The numbers of minor and major compactions running and queued are visible on the
+Accumulo monitor page. This allows you to see if compactions are backing up
+and whether adjustments to the above settings are needed. When adjusting the
+number of threads available for compactions, consider the number of cores and
+the other tasks running on the nodes, such as MapReduce.
+
+If major compactions are not keeping up, then the number of files per tablet
+will grow to a point where query performance starts to suffer. One way to
+handle this situation is to increase the compaction ratio. For example, if the
+compaction ratio were set to 1, then every new file added to a tablet
+by minor compaction would immediately queue the tablet for major compaction.
+So if a tablet had a 200M file and minor compaction wrote a 1M file, then the
+major compaction would attempt to merge the 200M and 1M files. If the tablet
+server has lots of tablets trying to do this sort of thing, then major
+compactions will back up and the number of files per tablet will start to grow,
+assuming data is being continuously written. Increasing the
+compaction ratio will lower the amount of major compaction work that needs to
+be done and alleviate backups.
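+
+To make this example concrete, assume the rule compares the sum of the file
+sizes in a set against the ratio multiplied by the size of the largest file in
+the set. With a ratio of 1, the 200M and 1M files sum to 201M, which is larger
+than $1 \times 200\mathrm{M}$, so about 201M is rewritten for every 1M flushed.
+With a ratio of 3, 201M is not larger than
+$3 \times 200\mathrm{M} = 600\mathrm{M}$, so the 200M file is left alone. The
+small files are then merged only with each other, for instance once four 1M
+files exist, since $4\mathrm{M} > 3 \times 1\mathrm{M}$.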
+
+Another option for dealing with too many files per tablet is to adjust
+the following property.
+
+\begin{verbatim}
+table.file.max
+\end{verbatim}
+
+When a tablet reaches this number of files it will choose to do a merging minor
+compaction. A merging minor compaction will merge the tablet's smallest file
+with the data in memory at minor compaction time. Therefore the number of
+files will not grow beyond this limit. This makes minor compactions take
+longer, which slows ingest until major compactions have had enough time to
+catch up. When adjusting this property, also consider adjusting the compaction
+ratio. Ideally, merging minor compactions never need to occur and major
+compactions can keep up. It is possible to configure the file max and
+compaction ratio such that only merging minor compactions occur and major
+compactions never occur. This is bad because merging minor compactions do
+$O(N^2)$ work, whereas the amount of work done by major compactions is
+$O(N \log_R(N))$, where $R$ is the ratio.
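+
+A rough sketch of where these bounds come from, under simplifying assumptions
+that are not taken from the Accumulo source: every minor compaction flushes $m$
+bytes, a total of $N$ bytes is ingested, no data is deleted, and in the
+merging-only case the tablet holds a single accumulated file that is rewritten
+on every flush. The $k$th flush then writes about $k \cdot m$ bytes, so the
+total written is
+\[
+\sum_{k=1}^{N/m} k \cdot m = \frac{N^2}{2m} + \frac{N}{2} = O(N^2)
+\]
+for fixed $m$. With ratio-based major compactions, assuming a set is compacted
+only when its total size exceeds $R$ times its largest file, each byte lands in
+a file more than $R$ times larger every time it is rewritten. A byte can
+therefore be rewritten at most about $\log_R(N/m)$ times, giving
+$O(N \log_R(N))$ total work.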
+
+
+Compactions can be manually initiated for a table. To initiate a minor
+compaction, use the flush command in the shell. To initiate a major compaction,
+use the compact command in the shell. The compact command will compact each
+tablet in a table down to one file. Even tablets with one file are compacted,
+which is useful when a filter has been added to a table. In 1.4 the ability
+to compact a range of a table was added. To use this feature, specify start and
+stop rows for the compact command. This will only compact tablets that overlap
+the given row range.
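+
+For illustration, the shell commands might be used as follows. The table name
+mytable and the row values are placeholders, and the -b and -e options for the
+begin and end rows are assumed to match the 1.4 shell.
+
+\begin{verbatim}
+flush -t mytable
+compact -t mytable
+compact -t mytable -b row_a -e row_m
+\end{verbatim}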
\section{Pre-splitting tables}