Posted to commits@accumulo.apache.org by kt...@apache.org on 2013/06/20 15:31:28 UTC

svn commit: r1494983 - /accumulo/site/trunk/content/notable_features.mdtext

Author: kturner
Date: Thu Jun 20 13:31:28 2013
New Revision: 1494983

URL: http://svn.apache.org/r1494983
Log:
updated w/ some info about 1.5 improvements

Modified:
    accumulo/site/trunk/content/notable_features.mdtext

Modified: accumulo/site/trunk/content/notable_features.mdtext
URL: http://svn.apache.org/viewvc/accumulo/site/trunk/content/notable_features.mdtext?rev=1494983&r1=1494982&r2=1494983&view=diff
==============================================================================
--- accumulo/site/trunk/content/notable_features.mdtext (original)
+++ accumulo/site/trunk/content/notable_features.mdtext Thu Jun 20 13:31:28 2013
@@ -136,7 +136,8 @@ Scans will not see data inserted into a 
 If consecutive keys have identical portions (row, colf, colq, or colvis), there
 is a flag to indicate that a portion is the same as that of the previous key.
 This is applied when keys are stored on disk and when transferred over the
-network.
+network.  Starting with 1.5, prefix erasure is supported.  When it is cost
+effective, prefixes repeated in subsequent key fields are not stored again.
 
 ### Native In-Memory Map
 
@@ -170,6 +171,16 @@ written. When an index block exceeds the
 written out between data blocks. The size of index blocks is configurable on a
 per table basis.
 
+### Binary search in RFile blocks (1.5)
+
+RFile uses its index to locate a block of key values.  Once it reaches a block 
+it performs a linear scan to find a key of interest.  Starting with 1.5, Accumulo
+will generate indexes of cached blocks in an adaptive manner.  Accumulo indexes 
+the blocks that are read most frequently.  When a block is read a few times, a 
+small index is generated.  As a block is read more, larger indexes are generated 
+making future seeks faster. This strategy allows Accumulo to dynamically respond 
+to read patterns without precomputing block indexes when RFiles are written.
+
 ## Testing <a id="testing"></a>
 
 ### Mock
@@ -177,6 +188,13 @@ per table basis.
 The Accumulo client API has a mock implementation that is useful writing unit
 test against Accumulo. Mock Accumulo is in memory and in process.
 
+### Mini Accumulo Cluster (1.5 & 1.4.4)
+
+Mini Accumulo cluster is a set of utility code that makes it easy to spin up 
+a local Accumulo instance running against the local filesystem.  Mini Accumulo
+is slower than Mock Accumulo, but its behavior mirrors a real Accumulo
+instance more closely.  
+
 ### Functional Test
 
 Small, system-level tests of basic Accumulo features run in a test harness,
@@ -236,6 +254,13 @@ could be different from the Accumulo nod
 
 Accumulo can be a source and/or sink for map reduce jobs.
 
+### Thrift Proxy (1.5 & 1.4.4)
+
+The Accumulo client code contains a lot of complexity.  For example, the 
+client code locates tablets, retries in the case of failures, and supports 
+concurrent reading and writing.  All of this is written in Java.  The Thrift
+proxy wraps the Accumulo client API with Thrift, making this API easily
+available to other languages like Python, Ruby, C++, etc.
 
 ## Extensible Behaviors <a id="behaviors"></a>
 
@@ -327,6 +352,12 @@ was growing.  Without this feature, inge
 constant rate, even as scan performance decreases because tablets have too many
 files.
 
+### Loading jars using VFS (1.5)
+
+User-written iterators are a useful way to manipulate data in Accumulo.
+Before 1.5, users had to copy their iterators to each tablet server.  Starting
+with 1.5, Accumulo can load iterators from HDFS using Apache Commons VFS.
+
 ## On-demand Data Management <a id="ondemand_dm"></a>
 
 ### Compactions
@@ -335,7 +366,8 @@ Ability to force tablets to compact to o
 compacted.  This is useful for improving query performance, permanently
 applying iterators, or using a new locality group configuration.  One example
 of using iterators is applying a filtering iterator to remove data from a
-table. 
+table. As of 1.5, users can initiate a compaction with iterators that apply
+only to that compaction.
 
 ### Split points
 
@@ -356,6 +388,11 @@ mutated independently. Testing was the m
 feature. For example to test a new filtering iterator, clone the table, add the
 filter to the clone, and force a major compaction.
 
+### Import/Export Table (1.5)
+
+An offline table's metadata and files can easily be copied to another cluster and
+imported.
+
 ### Compact Range (1.4)
 
 Compact each tablet that falls within a row range down to a single file.