Posted to commits@accumulo.apache.org by ec...@apache.org on 2014/03/08 01:10:06 UTC

git commit: ACCUMULO-2441 outline the file prefix conventions

Repository: accumulo
Updated Branches:
  refs/heads/1.6.0-SNAPSHOT 4fabfbaa1 -> 0297276e6


ACCUMULO-2441 outline the file prefix conventions


Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/0297276e
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/0297276e
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/0297276e

Branch: refs/heads/1.6.0-SNAPSHOT
Commit: 0297276e692d117cd515ec31d1ca1412570e4785
Parents: 4fabfba
Author: Eric Newton <er...@gmail.com>
Authored: Fri Mar 7 19:05:56 2014 -0500
Committer: Eric Newton <er...@gmail.com>
Committed: Fri Mar 7 19:05:56 2014 -0500

----------------------------------------------------------------------
 .../chapters/troubleshooting.tex                | 29 ++++++++++++++++++++
 1 file changed, 29 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/accumulo/blob/0297276e/docs/src/main/latex/accumulo_user_manual/chapters/troubleshooting.tex
----------------------------------------------------------------------
diff --git a/docs/src/main/latex/accumulo_user_manual/chapters/troubleshooting.tex b/docs/src/main/latex/accumulo_user_manual/chapters/troubleshooting.tex
index 8ba7176..18d472f 100644
--- a/docs/src/main/latex/accumulo_user_manual/chapters/troubleshooting.tex
+++ b/docs/src/main/latex/accumulo_user_manual/chapters/troubleshooting.tex
@@ -599,3 +599,32 @@ but the basic approach is:
  \item Recreate tables, users and permissions
  \item Import the directories under \texttt{/corrupt/tables/<id>} into the new instance
 \end{itemize}
+
+\section{File Naming Conventions}
+
+Q. Why are files named like they are? Why do some start with ``C'' and others with ``F''?
+
+A. The file names give you a basic idea of the source of the file.
+
+The base of the filename is a base-36 unique number. All filenames in Accumulo are coordinated
+with a counter in ZooKeeper, so they are always unique, which is useful for debugging.
+
+The leading letter gives you an idea of how the file was created:
+
+\begin{itemize}
+ \item F - Flush: entries in memory were written to a file (Minor Compaction)
+ \item M - Merging compaction: entries in memory were combined with the smallest file to create one new file
+ \item C - Several files, but not all files, were combined to produce this file (Major Compaction)
+ \item A - All files were compacted; delete entries were dropped
+ \item I - Bulk import: complete, sorted index files. Always in a directory starting with ``b-''
+\end{itemize}
+
+This simple file naming convention lets you see the basic structure of the files from
+their filenames alone, and reason about what should happen to them next,
+simply by scanning their entries in the metadata tables.
+
+For example, if you see multiple files with ``M'' prefixes, the tablet is, or was, up against its
+maximum file limit, so it began merging memory updates with files to keep the file count reasonable.  This
+slows down ingest performance, so seeing many files like this tells you that the system
+is struggling to balance ingest against the compaction strategy that reduces the number of files.
+