Posted to commits@fluo.apache.org by kt...@apache.org on 2017/06/29 17:30:05 UTC
[incubator-fluo-website] branch gh-pages updated: Recipes 110 docs (#68)
This is an automated email from the ASF dual-hosted git repository.
kturner pushed a commit to branch gh-pages
in repository https://gitbox.apache.org/repos/asf/incubator-fluo-website.git
The following commit(s) were added to refs/heads/gh-pages by this push:
new 7b7a197 Recipes 110 docs (#68)
7b7a197 is described below
commit 7b7a197a95bc4c64777d98eb13c17a2edde7b614
Author: Keith Turner <ke...@deenlo.com>
AuthorDate: Thu Jun 29 13:30:03 2017 -0400
Recipes 110 docs (#68)
---
_config.yml | 2 +-
.../1.1.0-incubating/accumulo-export-queue.md | 104 +++++++
.../fluo-recipes/1.1.0-incubating/combine-queue.md | 211 ++++++++++++++
docs/fluo-recipes/1.1.0-incubating/export-queue.md | 303 +++++++++++++++++++++
docs/fluo-recipes/1.1.0-incubating/index.md | 103 +++++++
docs/fluo-recipes/1.1.0-incubating/recording-tx.md | 72 +++++
docs/fluo-recipes/1.1.0-incubating/row-hasher.md | 122 +++++++++
.../fluo-recipes/1.1.0-incubating/serialization.md | 76 ++++++
docs/fluo-recipes/1.1.0-incubating/spark.md | 19 ++
.../1.1.0-incubating/table-optimization.md | 65 +++++
docs/fluo-recipes/1.1.0-incubating/testing.md | 14 +
docs/fluo-recipes/1.1.0-incubating/transient.md | 83 ++++++
docs/index.md | 2 +
13 files changed, 1175 insertions(+), 1 deletion(-)
diff --git a/_config.yml b/_config.yml
index ff2dfc3..135393a 100644
--- a/_config.yml
+++ b/_config.yml
@@ -43,7 +43,7 @@ color: default
# Fluo specific settings
latest_fluo_release: "1.1.0-incubating"
-latest_recipes_release: "1.0.0-incubating"
+latest_recipes_release: "1.1.0-incubating"
# Sets links to external API
api_base: "https://javadoc.io/doc/org.apache.fluo"
diff --git a/docs/fluo-recipes/1.1.0-incubating/accumulo-export-queue.md b/docs/fluo-recipes/1.1.0-incubating/accumulo-export-queue.md
new file mode 100644
index 0000000..380daa5
--- /dev/null
+++ b/docs/fluo-recipes/1.1.0-incubating/accumulo-export-queue.md
@@ -0,0 +1,104 @@
+---
+layout: recipes-doc
+title: Accumulo Export Queue Specialization
+version: 1.1.0-incubating
+---
+## Background
+
+The [Export Queue Recipe][1] provides a generic foundation for building an export mechanism to any
+external data store. The [AccumuloExporter] provides an [Exporter] for writing to
+Accumulo. [AccumuloExporter] is located in the `fluo-recipes-accumulo` module and provides the
+following functionality:
+
+ * Safely batches writes to Accumulo made by multiple transactions exporting data.
+ * Stores Accumulo connection information in Fluo configuration, making it accessible by Export
+ Observers running on other nodes.
+ * Provides utility code that makes it easier and shorter to code common Accumulo export patterns.
+
+## Example Use
+
+Exporting to Accumulo is easy. Follow the steps below:
+
+1. First, implement [AccumuloTranslator]. Your implementation translates exported
+   objects to Accumulo Mutations. For example, the `SimpleTranslator` class below translates String
+   key/values into mutations for Accumulo. This step is optional; a lambda could
+   be used in step 3 instead of creating a class.
+
+ ```java
+ public class SimpleTranslator implements AccumuloTranslator<String,String> {
+
+ @Override
+ public void translate(SequencedExport<String, String> export, Consumer<Mutation> consumer) {
+ Mutation m = new Mutation(export.getKey());
+ m.put("cf", "cq", export.getSequence(), export.getValue());
+ consumer.accept(m);
+ }
+ }
+
+ ```
+
+2. Configure an `ExportQueue` and the export table prior to initializing Fluo.
+
+ ```java
+
+ FluoConfiguration fluoConfig = ...;
+
+ String instance = // Name of accumulo instance exporting to
+ String zookeepers = // Zookeepers used by Accumulo instance exporting to
+ String user = // Accumulo username, user that can write to exportTable
+ String password = // Accumulo user password
+ String exportTable = // Name of table to export to
+
+ // Set properties for table to export to in Fluo app configuration.
+ AccumuloExporter.configure(EXPORT_QID).instance(instance, zookeepers)
+ .credentials(user, password).table(exportTable).save(fluoConfig);
+
+ // Set properties for export queue in Fluo app configuration
+ ExportQueue.configure(EXPORT_QID).keyType(String.class).valueType(String.class)
+ .buckets(119).save(fluoConfig);
+
+ // Initialize Fluo using fluoConfig
+ ```
+
+3. In the application's `ObserverProvider`, register an observer that will process exports and write
+ them to Accumulo using [AccumuloExporter]. Also, register observers that add to the export
+ queue.
+
+ ```java
+ public class MyObserverProvider implements ObserverProvider {
+
+ @Override
+ public void provide(Registry obsRegistry, Context ctx) {
+ SimpleConfiguration appCfg = ctx.getAppConfiguration();
+
+ ExportQueue<String, String> expQ = ExportQueue.getInstance(EXPORT_QID, appCfg);
+
+      // Register observer that will process entries on the export queue and write them to the
+      // Accumulo table configured earlier. SimpleTranslator from step 1 is passed here; a lambda
+      // could have been used instead.
+ expQ.registerObserver(obsRegistry,
+ new AccumuloExporter<>(EXPORT_QID, appCfg, new SimpleTranslator()));
+
+ // An example observer created using a lambda that adds to the export queue.
+ obsRegistry.forColumn(OBS_COL, WEAK).useObserver((tx,row,col) -> {
+ // Read some data and do some work
+
+ // Add results to export queue
+ String key = // key that identifies export
+ String value = // object to export
+ expQ.add(tx, key, value);
+ });
+ }
+ }
+ ```
+
+## Other use cases
+
+The `getTranslator()` method in [AccumuloReplicator] creates a specialized [AccumuloTranslator] for replicating a Fluo table to Accumulo.
+
+[1]: /docs/fluo-recipes/1.1.0-incubating/export-queue/
+[Exporter]: {{ site.api_static }}/fluo-recipes-core/1.1.0-incubating/org/apache/fluo/recipes/core/export/function/Exporter.html
+[AccumuloExporter]: {{ site.api_static }}/fluo-recipes-accumulo/1.1.0-incubating/org/apache/fluo/recipes/accumulo/export/function/AccumuloExporter.html
+[AccumuloTranslator]: {{ site.api_static }}/fluo-recipes-accumulo/1.1.0-incubating/org/apache/fluo/recipes/accumulo/export/function/AccumuloTranslator.html
+[AccumuloReplicator]: {{ site.api_static }}/fluo-recipes-accumulo/1.1.0-incubating/org/apache/fluo/recipes/accumulo/export/AccumuloReplicator.html
+
diff --git a/docs/fluo-recipes/1.1.0-incubating/combine-queue.md b/docs/fluo-recipes/1.1.0-incubating/combine-queue.md
new file mode 100644
index 0000000..08d4e58
--- /dev/null
+++ b/docs/fluo-recipes/1.1.0-incubating/combine-queue.md
@@ -0,0 +1,211 @@
+---
+layout: recipes-doc
+title: Combine Queue Recipe
+version: 1.1.0-incubating
+---
+## Background
+
+When many transactions try to modify the same keys, collisions will occur. Too many collisions
+cause transactions to fail and throughput to nosedive. For example, consider [phrasecount],
+which has many transactions processing documents. Each transaction counts the phrases in a document
+and then updates global phrase counts. Since each transaction attempts to update many phrases,
+the probability of collisions is high.
+
+## Solution
+
+The [combine queue recipe][CombineQueue] provides a reusable solution for updating many keys while
+avoiding collisions. The recipe also organizes updates into batches in order to improve throughput.
+
+This recipe queues updates to keys for other transactions to process. In the phrase count example,
+transactions processing documents queue updates but do not actually update the counts. Below is an
+example of computing phrase counts using this recipe.
+
+ * TX1 queues `+1` update for phrase `we want lambdas now`
+ * TX2 queues `+1` update for phrase `we want lambdas now`
+ * TX3 reads the updates and current value for the phrase `we want lambdas now`. There is no current value and the updates sum to 2, so a new value of 2 is written.
+ * TX4 queues `+2` update for phrase `we want lambdas now`
+ * TX5 queues `-1` update for phrase `we want lambdas now`
+ * TX6 reads the updates and current value for the phrase `we want lambdas now`. The current value is 2 and the updates sum to 1, so a new value of 3 is written.
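The arithmetic in this walkthrough can be simulated with plain Java (a toy model of the combining step, not the recipe's implementation; `apply` is a hypothetical helper):

```java
import java.util.List;
import java.util.Optional;

public class CombineWalkthrough {

  // Combine queued deltas with the current value, as TX3 and TX6 do above.
  static long apply(Optional<Long> current, List<Long> queuedUpdates) {
    long sum = queuedUpdates.stream().mapToLong(Long::longValue).sum();
    return current.orElse(0L) + sum;
  }

  public static void main(String[] args) {
    // TX3: no current value; queued updates +1 and +1 sum to 2
    long afterTx3 = apply(Optional.empty(), List.of(1L, 1L));
    // TX6: current value is 2; queued updates +2 and -1 sum to 1, giving 3
    long afterTx6 = apply(Optional.of(afterTx3), List.of(2L, -1L));
    System.out.println(afterTx3 + " " + afterTx6); // 2 3
  }
}
```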
+
+Transactions processing updates have the ability to make additional updates.
+For example, in addition to updating the current value for a phrase, the new
+value could also be placed on an export queue to update an external database.
+
+### Buckets
+
+A simple implementation of this recipe would have an update queue for each key. However, the
+implementation is slightly more complex. Each update queue is in a bucket and transactions process
+all of the updates in a bucket. This allows more efficient processing of updates for the following
+reasons:
+
+ * When updates are queued, notifications are made per bucket (instead of per key).
+ * The transaction processing updates can scan the entire bucket reading updates, which avoids a seek for each key being updated.
+ * The transaction can also request a batch lookup to get the current values of all the keys being updated.
+ * Any additional actions taken on update (like adding something to an export queue) can also be batched.
+ * Data is organized to make reading existing values for keys in a bucket more efficient.
+
+Which bucket a key goes to is determined by hashing the key modulo the number of buckets, so
+that multiple updates for the same key go to the same bucket.
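That mapping can be sketched in a few lines (an illustration only; `Math.floorMod` of the key's hash is an assumption, not necessarily how the recipe computes buckets):

```java
public class BucketSketch {

  // Map a key to one of numBuckets buckets. floorMod guards against negative
  // hashCode() values; every update for the same key lands in the same bucket.
  static int bucketOf(String key, int numBuckets) {
    return Math.floorMod(key.hashCode(), numBuckets);
  }

  public static void main(String[] args) {
    int b1 = bucketOf("we want lambdas now", 119);
    int b2 = bucketOf("we want lambdas now", 119);
    // the same key always maps to the same bucket
    System.out.println(b1 == b2); // true
  }
}
```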
+
+The initial number of tablets to create when applying table optimizations can be controlled by
+setting the buckets per tablet option when configuring a Combine Queue. For example, if you
+have 20 tablet servers and 1000 buckets and want 2 tablets per tserver initially, then set buckets
+per tablet to 1000/(2*20)=25.
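The arithmetic above can be written out directly (a hypothetical helper, not part of the recipe's API):

```java
public class BucketsPerTabletCalc {

  // buckets / (tabletsPerServer * tabletServers) yields the bucketsPerTablet
  // setting that produces the desired number of initial tablets.
  static int bucketsPerTablet(int buckets, int tabletServers, int tabletsPerServer) {
    return buckets / (tabletsPerServer * tabletServers);
  }

  public static void main(String[] args) {
    // 1000 buckets spread over 20 tservers with 2 tablets each
    System.out.println(bucketsPerTablet(1000, 20, 2)); // 25
  }
}
```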
+
+## Example Use
+
+The following code snippets show how to use this recipe for wordcount. The first step is to
+configure it before initializing Fluo. An ID is needed when configuring. This ID is used in two
+ways. First, the ID is used as a row prefix in the table; therefore, nothing else should use that
+row range in the table. Second, the ID is used in generating configuration keys.
+
+The following snippet shows how to configure a combine queue.
+
+```java
+ FluoConfiguration fluoConfig = ...;
+
+ // Set application properties for the combine queue. These properties are read later by
+ // the observers running on each worker.
+ CombineQueue.configure(WcObserverProvider.ID)
+ .keyType(String.class).valueType(Long.class).buckets(119).save(fluoConfig);
+
+ fluoConfig.setObserverProvider(WcObserverProvider.class);
+
+ // initialize Fluo using fluoConfig
+```
+
+Assume the following observer is triggered when a document is updated. It examines new
+and old document content and determines changes in word counts. These changes are pushed to a
+combine queue.
+
+```java
+public class DocumentObserver implements StringObserver {
+ // word count combine queue
+ private CombineQueue<String, Long> wccq;
+
+ public static final Column NEW_COL = new Column("content", "new");
+ public static final Column CUR_COL = new Column("content", "current");
+
+ public DocumentObserver(CombineQueue<String, Long> wccq) {
+ this.wccq = wccq;
+ }
+
+ @Override
+ public void process(TransactionBase tx, String row, Column col) {
+
+ Preconditions.checkArgument(col.equals(NEW_COL));
+
+ String newContent = tx.gets(row, NEW_COL);
+ String currentContent = tx.gets(row, CUR_COL, "");
+
+ Map<String, Long> newWordCounts = getWordCounts(newContent);
+ Map<String, Long> currentWordCounts = getWordCounts(currentContent);
+
+ // determine changes in word counts between old and new document content
+ Map<String, Long> changes = calculateChanges(newWordCounts, currentWordCounts);
+
+ // queue updates to word counts for processing by other transactions
+ wccq.add(tx, changes);
+
+ // update the current content and delete the new content
+ tx.set(row, CUR_COL, newContent);
+ tx.delete(row, NEW_COL);
+ }
+
+ private static Map<String, Long> getWordCounts(String doc) {
+ // TODO extract words from doc
+ }
+
+ private static Map<String, Long> calculateChanges(Map<String, Long> newCounts,
+ Map<String, Long> currCounts) {
+ Map<String, Long> changes = new HashMap<>();
+
+ // guava Maps class
+ MapDifference<String, Long> diffs = Maps.difference(currCounts, newCounts);
+
+ // compute the diffs for words that changed
+ changes.putAll(Maps.transformValues(diffs.entriesDiffering(),
+ vDiff -> vDiff.rightValue() - vDiff.leftValue()));
+
+ // add all new words
+ changes.putAll(diffs.entriesOnlyOnRight());
+
+ // subtract all words no longer present
+ changes.putAll(Maps.transformValues(diffs.entriesOnlyOnLeft(), l -> l * -1));
+
+ return changes;
+ }
+}
+
+
+```
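`Maps.difference` above comes from Guava. If pulling in Guava is undesirable, the same change computation can be sketched with only the JDK (a hypothetical equivalent of `calculateChanges`):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class WordCountDiff {

  // Compute newCounts minus currCounts per word, dropping zero deltas.
  static Map<String, Long> changes(Map<String, Long> currCounts, Map<String, Long> newCounts) {
    Map<String, Long> changes = new HashMap<>();
    Set<String> words = new HashSet<>(currCounts.keySet());
    words.addAll(newCounts.keySet());
    for (String word : words) {
      long delta = newCounts.getOrDefault(word, 0L) - currCounts.getOrDefault(word, 0L);
      if (delta != 0) {
        changes.put(word, delta);
      }
    }
    return changes;
  }

  public static void main(String[] args) {
    Map<String, Long> curr = Map.of("fluo", 2L, "old", 1L);
    Map<String, Long> next = Map.of("fluo", 3L, "fresh", 1L);
    System.out.println(changes(curr, next).get("old")); // -1
  }
}
```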
+
+Each combine queue has two extension points, a [combiner][Combiner] and a [change
+observer][ChangeObserver]. The combine queue configures a Fluo observer to process queued
+updates. When processing updates the two extension points are called. The code below shows
+how to use these extension points.
+
+A change observer can do additional processing when a batch of key/values is updated. Below,
+updates are queued for export to an external database. The export is given the new and old values,
+allowing it to delete the old value if needed.
+
+```java
+public class WcObserverProvider implements ObserverProvider {
+
+ public static final String ID = "wc";
+
+ @Override
+ public void provide(Registry obsRegistry, Context ctx) {
+
+ ExportQueue<String, MyDatabaseExport> exportQ = createExportQueue(ctx);
+
+ // Create a combine queue for computing word counts.
+ CombineQueue<String, Long> wcMap = CombineQueue.getInstance(ID, ctx.getAppConfiguration());
+
+ // Register observer that updates the Combine Queue
+ obsRegistry.forColumn(DocumentObserver.NEW_COL, STRONG).useObserver(new DocumentObserver(wcMap));
+
+ // Used to join new and existing values for a key. The lambda sums all values and returns
+ // Optional.empty() when the sum is zero. Returning Optional.empty() causes the key/value to be
+    // deleted. The built-in SummingCombiner could have been used instead.
+ Combiner<String, Long> combiner = input -> input.stream().reduce(Long::sum).filter(l -> l != 0);
+
+ // Called when the value of a key changes. The lambda exports these changes to an external
+ // database. Make sure to read ChangeObserver's javadoc.
+ ChangeObserver<String, Long> changeObs = (tx, changes) -> {
+ for (Change<String, Long> update : changes) {
+ String word = update.getKey();
+ Optional<Long> oldVal = update.getOldValue();
+ Optional<Long> newVal = update.getNewValue();
+
+ // Queue an export to let an external database know the word count has changed.
+ exportQ.add(tx, word, new MyDatabaseExport(oldVal, newVal));
+ }
+ };
+
+    // Register observer that handles updates to the CombineQueue. This observer will use the
+    // combiner and change observer created above.
+ wcMap.registerObserver(obsRegistry, combiner, changeObs);
+ }
+}
+```
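The combiner lambda above can be exercised on its own. Its contract: merge all values queued for a key and return `Optional.empty()` to delete the key (a standalone sketch using a plain list rather than the recipe's input type):

```java
import java.util.List;
import java.util.Optional;

public class CombinerSketch {

  // Sum all queued values for a key; an empty result means "delete the key".
  static Optional<Long> combine(List<Long> values) {
    return values.stream().reduce(Long::sum).filter(l -> l != 0);
  }

  public static void main(String[] args) {
    System.out.println(combine(List.of(2L, 3L)));  // Optional[5]
    System.out.println(combine(List.of(2L, -2L))); // Optional.empty (key deleted)
  }
}
```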
+
+## Guarantees
+
+This recipe makes two important guarantees about updates for a key when it
+calls `process()` on a [ChangeObserver].
+
+ * The new value reported for an update will be derived from combining all
+   updates that were committed before the transaction that is processing updates
+ started. The implementation may have to make multiple passes over queued
+ updates to achieve this. In the situation where TX1 queues a `+1` and later
+ TX2 queues a `-1` for the same key, there is no need to worry about only seeing
+ the `-1` processed. A transaction that started processing updates after TX2
+ committed would process both.
+ * The old value will always be what was reported as the new value in the
+ previous transaction that called `ChangeObserver.process()`.
+
+[phrasecount]: https://github.com/fluo-io/phrasecount
+[CombineQueue]: {{ site.api_static }}/fluo-recipes-core/1.1.0-incubating/org/apache/fluo/recipes/core/combine/CombineQueue.html
+[ChangeObserver]: {{ site.api_static }}/fluo-recipes-core/1.1.0-incubating/org/apache/fluo/recipes/core/combine/ChangeObserver.html
+[Combiner]: {{ site.api_static }}/fluo-recipes-core/1.1.0-incubating/org/apache/fluo/recipes/core/combine/Combiner.html
diff --git a/docs/fluo-recipes/1.1.0-incubating/export-queue.md b/docs/fluo-recipes/1.1.0-incubating/export-queue.md
new file mode 100644
index 0000000..3ee379f
--- /dev/null
+++ b/docs/fluo-recipes/1.1.0-incubating/export-queue.md
@@ -0,0 +1,303 @@
+---
+layout: recipes-doc
+title: Export Queue Recipe
+version: 1.1.0-incubating
+---
+## Background
+
+Fluo is not suited for servicing low latency queries for two reasons. First, the implementation of
+transactions is designed for throughput. To get throughput, transactions recover lazily from
+failures and may wait on other transactions. Both of these design decisions can
+lead to delays of individual transactions, but do not negatively impact throughput. The second
+reason is that Fluo observers executing transactions will likely cause a large number of random
+accesses. This could lead to high response time variability for an individual random access. This
+variability would not impede throughput but would impede the goal of low latency.
+
+One way to make data transformed by Fluo available for low latency queries is
+to export that data to another system. For example Fluo could run on
+cluster A, continually transforming a large data set, and exporting data to
+Accumulo tables on cluster B. The tables on cluster B would service user
+queries. Fluo Recipes has built in support for [exporting to Accumulo][3],
+however this recipe can be used to export to systems other than Accumulo, like
+Redis, Elasticsearch, MySQL, etc.
+
+Exporting data from Fluo is easy to get wrong, which is why this recipe exists.
+To understand what can go wrong consider the following example observer
+transaction.
+
+```java
+public class MyObserver implements StringObserver {
+
+ static final Column UPDATE_COL = new Column("meta", "numUpdates");
+ static final Column COUNTER_COL = new Column("meta", "counter1");
+
+  // represents a query system external to Fluo that is updated by Fluo
+ QuerySystem querySystem;
+
+ @Override
+ public void process(TransactionBase tx, String row, Column col) {
+
+ int oldCount = Integer.parseInt(tx.gets(row, COUNTER_COL, "0"));
+ int numUpdates = Integer.parseInt(tx.gets(row, UPDATE_COL, "0"));
+ int newCount = oldCount + numUpdates;
+
+ tx.set(row, COUNTER_COL, "" + newCount);
+ tx.delete(row, UPDATE_COL);
+
+ //Build an inverted index in the query system, based on count from the
+ //meta:counter1 column in fluo. Do this by creating rows for the
+ //external query system based on the count.
+ String oldCountRow = String.format("%06d", oldCount);
+ String newCountRow = String.format("%06d", newCount);
+
+ //add a new entry to the inverted index
+ querySystem.insertRow(newCountRow, row);
+ //remove the old entry from the inverted index
+ querySystem.deleteRow(oldCountRow, row);
+ }
+}
+```
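The `%06d` format above zero-pads counts so that lexicographic row order matches numeric order, which is what the inverted index relies on:

```java
public class PaddedRows {

  // Zero-pad counts so lexicographic order matches numeric order.
  static String countRow(int count) {
    return String.format("%06d", count);
  }

  public static void main(String[] args) {
    System.out.println(countRow(42)); // 000042
    // "000042" sorts before "001000", matching 42 < 1000
    System.out.println(countRow(42).compareTo(countRow(1000)) < 0); // true
  }
}
```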
+
+The above example would keep the external index up to date beautifully as long
+as the following conditions are met.
+
+ * Threads executing transactions always complete successfully.
+ * Only a single thread ever responds to a notification.
+
+However these conditions are not guaranteed by Fluo. Multiple threads may
+attempt to process a notification concurrently (only one may succeed). Also at
+any point in time a transaction may fail (for example the computer executing it
+may reboot). Both of these problems will occur and will lead to corruption of
+the external index in the example. The inverted index and Fluo will become
+inconsistent. The inverted index will end up with multiple entries (that are
+never cleaned up) for a single entity, even though the intent is to only have one.
+
+The root of the problem in the example above is that it is exporting uncommitted
+data. There is no guarantee that setting the column `<row>:meta:counter1` to
+`newCount` will succeed until the transaction is successfully committed.
+However, `newCountRow` is derived from `newCount` and written to the external query
+system before the transaction is committed (Note: for observers, the
+transaction is committed by the framework after `process(...)` is called). So
+if the transaction fails, the next time it runs it could compute a completely
+different value for `newCountRow` (and it would not delete what was written by the
+failed transaction).
+
+## Solution
+
+The simple solution to the problem of exporting uncommitted data is to only
+export committed data. There are multiple ways to accomplish this. This
+recipe offers a reusable implementation of one method. This recipe has the
+following elements:
+
+ * An export queue that transactions can add key/values to. Only if the transaction commits successfully will the key/value end up in the queue. A Fluo application can have multiple export queues, each one must have a unique id.
+ * When a key/value is added to the export queue, it is given a sequence number. This sequence number is based on the transaction's start timestamp.
+ * Each export queue is configured with an observer that processes key/values that were successfully committed to the queue.
+ * When key/values in an export queue are processed, they are deleted so the export queue does not keep any long term data.
+ * Key/values in an export queue are placed in buckets. This is done so that all of the updates in a bucket can be processed in a single transaction. This allows an efficient implementation of this recipe in Fluo. It can also lead to efficiency in a system being exported to, if the system can benefit from batching updates. The number of buckets in an export queue is configurable.
+
+There are three requirements for using this recipe:
+
+ * Must configure export queues before initializing a Fluo application.
+ * Transactions adding to an export queue must get an instance of the queue using its unique QID.
+ * Must create a class or lambda that implements [Exporter][1] in order to process exports.
+
+## Example Use
+
+This example shows how to incrementally build an inverted index in an external query system using an
+export queue. The class below is a simple POJO used as the value for the export queue.
+
+```java
+class CountUpdate {
+ public int oldCount;
+ public int newCount;
+
+ public CountUpdate(int oc, int nc) {
+ this.oldCount = oc;
+ this.newCount = nc;
+ }
+}
+```
+
+The following code shows how to configure an export queue. This code
+modifies the FluoConfiguration object with options needed for the export queue.
+This FluoConfiguration object should be used to initialize the Fluo
+application.
+
+```java
+public class FluoApp {
+
+ // export queue id "ici" means inverted count index
+ public static final String EQ_ID = "ici";
+
+ static final Column UPDATE_COL = new Column("meta", "numUpdates");
+ static final Column COUNTER_COL = new Column("meta", "counter1");
+
+ public static class AppObserverProvider implements ObserverProvider {
+ @Override
+ public void provide(Registry obsRegistry, Context ctx) {
+ ExportQueue<String, CountUpdate> expQ =
+ ExportQueue.getInstance(EQ_ID, ctx.getAppConfiguration());
+
+ // register observer that will queue data to export
+ obsRegistry.forColumn(UPDATE_COL, STRONG).useObserver(new MyObserver(expQ));
+
+ // register observer that will export queued data
+ expQ.registerObserver(obsRegistry, new CountExporter());
+ }
+ }
+
+ /**
+ * Call this method before initializing Fluo.
+ *
+ * @param fluoConfig the configuration object that will be used to initialize Fluo
+ */
+ public static void preInit(FluoConfiguration fluoConfig) {
+
+ // Set properties for export queue in Fluo app configuration
+    ExportQueue.configure(EQ_ID)
+ .keyType(String.class)
+ .valueType(CountUpdate.class)
+ .buckets(1009)
+ .bucketsPerTablet(10)
+        .save(fluoConfig);
+
+ fluoConfig.setObserverProvider(AppObserverProvider.class);
+ }
+}
+```
+
+Below is an updated version of the observer from above that now uses an export
+queue.
+
+```java
+public class MyObserver implements StringObserver {
+
+ private ExportQueue<String, CountUpdate> exportQueue;
+
+ public MyObserver(ExportQueue<String, CountUpdate> exportQueue) {
+ this.exportQueue = exportQueue;
+ }
+
+ @Override
+ public void process(TransactionBase tx, String row, Column col) {
+
+ int oldCount = Integer.parseInt(tx.gets(row, FluoApp.COUNTER_COL, "0"));
+ int numUpdates = Integer.parseInt(tx.gets(row, FluoApp.UPDATE_COL, "0"));
+ int newCount = oldCount + numUpdates;
+
+ tx.set(row, FluoApp.COUNTER_COL, "" + newCount);
+ tx.delete(row, FluoApp.UPDATE_COL);
+
+ // Because the update to the export queue is part of the transaction,
+ // either the update to meta:counter1 is made and an entry is added to
+ // the export queue or neither happens.
+ exportQueue.add(tx, row, new CountUpdate(oldCount, newCount));
+ }
+}
+```
+
+The export queue will call the `export()` method on the class below to process entries queued for
+export. It is possible the call to `export()` can fail partway through and/or be called multiple
+times. In the case of failures, the export consumer will be called again with the same data.
+It is possible for the same export entry to be processed on multiple computers at different times.
+This can cause exports to arrive out of order. The purpose of the sequence number is to help
+receiving systems handle out-of-order and redundant data.
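One way a receiving system can use the sequence number is to keep only the highest-sequence write seen per key, which makes replays and out-of-order deliveries harmless (a sketch of that idea; `SeqNumSink` is hypothetical, not part of Fluo Recipes):

```java
import java.util.HashMap;
import java.util.Map;

public class SeqNumSink {

  private final Map<String, Long> lastSeq = new HashMap<>();
  private final Map<String, String> data = new HashMap<>();

  // Apply an export only if it is newer than anything already seen for the
  // key; duplicate and out-of-order arrivals are ignored.
  boolean apply(String key, String value, long seq) {
    Long prev = lastSeq.get(key);
    if (prev != null && prev >= seq) {
      return false; // stale or duplicate export
    }
    lastSeq.put(key, seq);
    data.put(key, value);
    return true;
  }

  String get(String key) {
    return data.get(key);
  }

  public static void main(String[] args) {
    SeqNumSink sink = new SeqNumSink();
    sink.apply("row1", "v1", 10);
    sink.apply("row1", "v3", 30);
    sink.apply("row1", "v2", 20); // arrives late, ignored
    System.out.println(sink.get("row1")); // v3
  }
}
```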
+
+```java
+public class CountExporter implements Exporter<String, CountUpdate> {
+ // represents the external query system we want to update from Fluo
+ QuerySystem querySystem;
+
+ @Override
+ public void export(Iterator<SequencedExport<String, CountUpdate>> exports) {
+ BatchUpdater batchUpdater = querySystem.getBatchUpdater();
+
+ while (exports.hasNext()) {
+ SequencedExport<String, CountUpdate> export = exports.next();
+ String row = export.getKey();
+ CountUpdate uc = export.getValue();
+ long seqNum = export.getSequence();
+
+ String oldCountRow = String.format("%06d", uc.oldCount);
+ String newCountRow = String.format("%06d", uc.newCount);
+
+ // add a new entry to the inverted index
+ batchUpdater.insertRow(newCountRow, row, seqNum);
+ // remove the old entry from the inverted index
+ batchUpdater.deleteRow(oldCountRow, row, seqNum);
+ }
+
+ // flush all of the updates to the external query system
+ batchUpdater.close();
+ }
+}
+```
+
+## Schema
+
+Each export queue stores its data in the Fluo table in a contiguous row range.
+This row range is defined by using the export queue id as a row prefix for all
+data in the export queue. So the row range defined by the export queue id
+should not be used by anything else.
+
+All data stored in an export queue is [transient](transient.md). When an export
+queue is configured, it will recommend split points using the [table
+optimization process](table-optimization.md). The number of splits generated
+by this process can be controlled by setting the number of buckets per tablet
+when configuring an export queue.
+
+## Concurrency
+
+Additions to the export queue will never collide. If two transactions add the
+same key at around the same time and successfully commit, then two entries with
+different sequence numbers will always be added to the queue. The sequence
+number is based on the start timestamp of the transactions.
+
+If the key used to add items to the export queue is deterministically derived
+from something the transaction is writing to, then that will cause a collision.
+For example, consider the following interleaving of two transactions adding to
+the same export queue in a manner that will collide. Note, TH1 is shorthand for
+thread 1, ek() is a function that creates the export key, and ev() is a function
+that creates the export value.
+
+ 1. TH1 : key1 = ek(`row1`,`fam1:qual1`)
+ 1. TH1 : val1 = ev(tx1.get(`row1`,`fam1:qual1`), tx1.get(`rowA`,`fam1:qual2`))
+ 1. TH1 : exportQueueA.add(tx1, key1, val1)
+ 1. TH2 : key2 = ek(`row1`,`fam1:qual1`)
+ 1. TH2 : val2 = ev(tx2.get(`row1`,`fam1:qual1`), tx2.get(`rowB`,`fam1:qual2`))
+ 1. TH2 : exportQueueA.add(tx2, key2, val2)
+ 1. TH1 : tx1.set(`row1`,`fam1:qual1`, val1)
+ 1. TH2 : tx2.set(`row1`,`fam1:qual1`, val2)
+
+In the example above only one transaction will succeed because both are setting
+`row1 fam1:qual1`. Since adding to the export queue is part of the
+transaction, only the transaction that succeeds will add something to the
+queue. If the function ek() in the example is deterministic, then both
+transactions would have been trying to add the same key to the export queue.
+
+With the above method, we know that transactions adding entries to the queue for
+the same key must have executed [serially][2]. Knowing that transactions which
+added the same key did not overlap in time makes reasoning about those export
+entries very simple.
+
+The example below is a slight modification of the example above. In this
+example both transactions will successfully add entries to the queue using the
+same key. Both transactions succeed because they are writing to different
+cells (`rowB fam1:qual2` and `rowA fam1:qual2`). This approach makes it more
+difficult to reason about export entries with the same key, because the
+transactions adding those entries could have overlapped in time. This is an
+example of the write skew problem mentioned in the Percolator paper.
+
+ 1. TH1 : key1 = ek(`row1`,`fam1:qual1`)
+ 1. TH1 : val1 = ev(tx1.get(`row1`,`fam1:qual1`), tx1.get(`rowA`,`fam1:qual2`))
+ 1. TH1 : exportQueueA.add(tx1, key1, val1)
+ 1. TH2 : key2 = ek(`row1`,`fam1:qual1`)
+ 1. TH2 : val2 = ev(tx2.get(`row1`,`fam1:qual1`), tx2.get(`rowB`,`fam1:qual2`))
+ 1. TH2 : exportQueueA.add(tx2, key2, val2)
+ 1. TH1 : tx1.set(`rowA`,`fam1:qual2`, val1)
+ 1. TH2 : tx2.set(`rowB`,`fam1:qual2`, val2)
+
+[1]: {{ site.api_static }}/fluo-recipes-core/1.1.0-incubating/org/apache/fluo/recipes/core/export/function/Exporter.html
+[2]: https://en.wikipedia.org/wiki/Serializability
+[3]: /docs/fluo-recipes/1.1.0-incubating/accumulo-export-queue/
+
diff --git a/docs/fluo-recipes/1.1.0-incubating/index.md b/docs/fluo-recipes/1.1.0-incubating/index.md
new file mode 100644
index 0000000..bec214c
--- /dev/null
+++ b/docs/fluo-recipes/1.1.0-incubating/index.md
@@ -0,0 +1,103 @@
+---
+layout: recipes-doc
+title: Fluo Recipes 1.1.0-incubating Documentation
+version: 1.1.0-incubating
+---
+**Fluo Recipes are common code for [Apache Fluo][fluo] application developers.**
+
+Fluo Recipes build on the [Fluo API][fluo-api] to offer additional functionality to
+developers. They are published separately from Fluo on their own release schedule.
+This allows Fluo Recipes to iterate and innovate faster than Fluo (which will maintain
+a more minimal API on a slower release cycle). Fluo Recipes offers code to implement
+common patterns on top of Fluo's API. It also offers glue code to external libraries
+like Spark and Kryo.
+
+### Documentation
+
+Recipes are documented below and in the [Recipes API docs][recipes-api].
+
+* [Combine Queue][combine-q] - A recipe for concurrently updating many keys while avoiding
+ collisions.
+* [Export Queue][export-q] - A recipe for exporting data from Fluo to external systems.
+* [Row Hash Prefix][row-hasher] - A recipe for spreading data evenly in a row prefix.
+* [RecordingTransaction][recording-tx] - A wrapper for a Fluo transaction that records all transaction
+operations to a log which can be used to export data from Fluo.
+* [Testing][testing] - Code to help write Fluo integration tests.
+
+Recipes have common needs that are broken down into the following reusable components.
+
+* [Serialization][serialization] - Common code for serializing POJOs.
+* [Transient Ranges][transient] - Standardized process for dealing with transient data.
+* [Table optimization][optimization] - Standardized process for optimizing the Fluo table.
+* [Spark integration][spark] - Spark+Fluo integration code.
+
+### Usage
+
+The Fluo Recipes project publishes multiple jars to Maven Central for each release.
+The `fluo-recipes-core` jar is the primary jar. It is where most recipes live and where
+they are placed by default if they have minimal dependencies beyond the Fluo API.
+
+Fluo Recipes with dependencies that bring in many transitive dependencies publish
+their own jar. For example, recipes that depend on Apache Spark are published in the
+`fluo-recipes-spark` jar. If you don't plan on using code in the `fluo-recipes-spark`
+jar, you should avoid including it in your pom.xml to avoid a transitive dependency on
+Spark.
+
+Below is a sample Maven POM containing all possible Fluo Recipes dependencies:
+
+```xml
+ <properties>
+ <fluo-recipes.version>1.1.0-incubating</fluo-recipes.version>
+ </properties>
+
+ <dependencies>
+ <!-- Required. Contains recipes that only depend on the Fluo API -->
+ <dependency>
+ <groupId>org.apache.fluo</groupId>
+ <artifactId>fluo-recipes-core</artifactId>
+ <version>${fluo-recipes.version}</version>
+ </dependency>
+ <!-- Optional. Serialization code that depends on Kryo -->
+ <dependency>
+ <groupId>org.apache.fluo</groupId>
+ <artifactId>fluo-recipes-kryo</artifactId>
+ <version>${fluo-recipes.version}</version>
+ </dependency>
+ <!-- Optional. Common code for using Fluo with Accumulo -->
+ <dependency>
+ <groupId>org.apache.fluo</groupId>
+ <artifactId>fluo-recipes-accumulo</artifactId>
+ <version>${fluo-recipes.version}</version>
+ </dependency>
+ <!-- Optional. Common code for using Fluo with Spark -->
+ <dependency>
+ <groupId>org.apache.fluo</groupId>
+ <artifactId>fluo-recipes-spark</artifactId>
+ <version>${fluo-recipes.version}</version>
+ </dependency>
+ <!-- Optional. Common code for writing Fluo integration tests -->
+ <dependency>
+ <groupId>org.apache.fluo</groupId>
+ <artifactId>fluo-recipes-test</artifactId>
+ <version>${fluo-recipes.version}</version>
+ <scope>test</scope>
+ </dependency>
+ </dependencies>
+```
+
+[fluo]: https://fluo.apache.org/
+[fluo-api]: /api
+[recipes-api]: /api
+[combine-q]: /docs/fluo-recipes/1.1.0-incubating/combine-queue/
+[export-q]: /docs/fluo-recipes/1.1.0-incubating/export-queue/
+[recording-tx]: /docs/fluo-recipes/1.1.0-incubating/recording-tx/
+[serialization]: /docs/fluo-recipes/1.1.0-incubating/serialization/
+[transient]: /docs/fluo-recipes/1.1.0-incubating/transient/
+[optimization]: /docs/fluo-recipes/1.1.0-incubating/table-optimization/
+[row-hasher]: /docs/fluo-recipes/1.1.0-incubating/row-hasher/
+[spark]: /docs/fluo-recipes/1.1.0-incubating/spark/
+[testing]: /docs/fluo-recipes/1.1.0-incubating/testing/
diff --git a/docs/fluo-recipes/1.1.0-incubating/recording-tx.md b/docs/fluo-recipes/1.1.0-incubating/recording-tx.md
new file mode 100644
index 0000000..83814bd
--- /dev/null
+++ b/docs/fluo-recipes/1.1.0-incubating/recording-tx.md
@@ -0,0 +1,72 @@
+---
+layout: recipes-doc
+title: RecordingTransaction recipe
+version: 1.1.0-incubating
+---
+A `RecordingTransaction` is an implementation of `Transaction` that logs all transaction operations
+(i.e., GET, SET, or DELETE) to a `TxLog` object for later uses such as exporting data. The code below
+shows how a RecordingTransaction is created by wrapping a Transaction object:
+
+```java
+RecordingTransactionBase rtx = RecordingTransactionBase.wrap(tx);
+```
+
+A predicate function can be passed to the wrap method to select which log entries to record. The code
+below only records log entries whose column family is `meta`:
+
+```java
+RecordingTransactionBase rtx = RecordingTransactionBase.wrap(tx,
+ le -> le.getColumn().getFamily().toString().equals("meta"));
+```
+
+After creating a RecordingTransaction, users can use it as they would use a Transaction object.
+
+```java
+Bytes value = rtx.get(Bytes.of("r1"), new Column("cf1", "cq1"));
+```
+
+While SET and DELETE operations are always recorded to the log, GET operations are only recorded if a
+value was found at the requested row/column. Also, if a GET method returns an iterator, only the
+entries actually retrieved from the iterator are logged. GET operations are logged because they are
+necessary to determine the changes made by the transaction.
+
+When you are done operating on the transaction, you can retrieve the TxLog using the following code:
+
+```java
+TxLog myTxLog = rtx.getTxLog();
+```
+
+Below is example code of how a RecordingTransaction can be used in an observer to record all operations
+performed by the transaction in a TxLog. In this example, a GET (if data exists) and SET operation
+will be logged. This TxLog can be added to an export queue and later used to export updates from
+Fluo.
+
+```java
+public class MyObserver extends AbstractObserver {
+
+ private static final TypeLayer TYPEL = new TypeLayer(new StringEncoder());
+
+ private ExportQueue<Bytes, TxLog> exportQueue;
+
+ @Override
+ public void process(TransactionBase tx, Bytes row, Column col) {
+
+ // create recording transaction (rtx)
+ RecordingTransactionBase rtx = RecordingTransactionBase.wrap(tx);
+
+ // use rtx to create a typed transaction & perform operations
+ TypedTransactionBase ttx = TYPEL.wrap(rtx);
+ int count = ttx.get().row(row).fam("meta").qual("counter1").toInteger(0);
+ ttx.mutate().row(row).fam("meta").qual("counter1").set(count+1);
+
+ // when finished performing operations, retrieve transaction log
+ TxLog txLog = rtx.getTxLog();
+
+ // add txLog to exportQueue if not empty
+ if (!txLog.isEmpty()) {
+ // do not pass rtx to exportQueue.add()
+ exportQueue.add(tx, row, txLog);
+ }
+ }
+}
+```
diff --git a/docs/fluo-recipes/1.1.0-incubating/row-hasher.md b/docs/fluo-recipes/1.1.0-incubating/row-hasher.md
new file mode 100644
index 0000000..e38dc85
--- /dev/null
+++ b/docs/fluo-recipes/1.1.0-incubating/row-hasher.md
@@ -0,0 +1,122 @@
+---
+layout: recipes-doc
+title: Row hash prefix recipe
+version: 1.1.0-incubating
+---
+## Background
+
+Transactions are implemented in Fluo using conditional mutations. Conditional
+mutations require server side processing on tservers. If data is not spread
+evenly, it can cause some tservers to execute more conditional mutations than
+others. These tservers doing more work can become a bottleneck. Most real
+world data is not uniform and can cause this problem.
+
+Before the Fluo [Webindex example][1] started using this recipe it suffered
+from this problem. The example was using reverse dns encoded URLs for row keys
+like `p:com.cnn/story1.html`. This made certain portions of the table more
+popular, which in turn made some tservers do much more work. This uneven
+distribution of work led to lower throughput and uneven performance. Using
+this recipe made those problems go away.
+
+## Solution
+
+This recipe provides code to help add a hash of the row as a row prefix.
+Using this recipe, rows are structured like the following.
+
+```
+<prefix>:<fixed len row hash>:<user row>
+```
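+
+To make the row structure concrete, here is a small self-contained sketch of
+adding a fixed length hash prefix to a row. This is illustrative only: the
+hash function and encoding below are assumptions and will not match the
+prefixes the actual RowHasher produces.
+
+```java
+public class HashPrefixSketch {
+
+  // Illustrative stand-in for RowHasher.addHash(). The real recipe uses its
+  // own hash function and encoding, so outputs will differ.
+  static String addHash(String prefix, String userRow) {
+    int h = userRow.hashCode() & 0x7fffffff;
+    // encode the hash as a fixed length of 4 base-36 characters
+    String enc = Integer.toString(h % (36 * 36 * 36 * 36), 36);
+    while (enc.length() < 4) {
+      enc = "0" + enc;
+    }
+    return prefix + ":" + enc + ":" + userRow;
+  }
+
+  public static void main(String[] args) {
+    // the same user row always gets the same prefix, so lookups still work
+    System.out.println(addHash("p", "org.wikipedia/accumulo"));
+  }
+}
+```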
+
+The recipe also provides code to help generate split points and configure
+balancing of the prefix.
+
+## Example Use
+
+```java
+import org.apache.fluo.api.config.FluoConfiguration;
+import org.apache.fluo.api.data.Bytes;
+import org.apache.fluo.recipes.core.common.TableOptimizations;
+import org.apache.fluo.recipes.core.data.RowHasher;
+
+public class RowHasherExample {
+
+ private static final RowHasher PAGE_ROW_HASHER = new RowHasher("p");
+
+ // Provide one place to obtain row hasher.
+ public static RowHasher getPageRowHasher() {
+ return PAGE_ROW_HASHER;
+ }
+
+ public static void main(String[] args) {
+ RowHasher pageRowHasher = getPageRowHasher();
+
+ String revUrl = "org.wikipedia/accumulo";
+
+ // Add a hash prefix to the row. Use this hashedRow in your transaction
+ Bytes hashedRow = pageRowHasher.addHash(revUrl);
+ System.out.println("hashedRow : " + hashedRow);
+
+ // Remove the prefix. This can be used by transactions dealing with the hashed row.
+ Bytes orig = pageRowHasher.removeHash(hashedRow);
+ System.out.println("orig : " + orig);
+
+
+ // Generate table optimizations for the recipe. This can be called when setting up an
+ // application that uses a hashed row.
+ int numTablets = 20;
+
+ // The following code would normally be called before initializing Fluo. This code
+ // registers table optimizations for your prefix+hash.
+ FluoConfiguration conf = new FluoConfiguration();
+ RowHasher.configure(conf, PAGE_ROW_HASHER.getPrefix(), numTablets);
+
+ // Normally you would not call the following code; it is called automatically for you by
+ // TableOperations.optimizeTable(). It is called here to show what table optimizations
+ // will be generated.
+ TableOptimizations tableOptimizations = new RowHasher.Optimizer()
+ .getTableOptimizations(PAGE_ROW_HASHER.getPrefix(), conf.getAppConfiguration());
+ System.out.println("Balance config : " + tableOptimizations.getTabletGroupingRegex());
+ System.out.println("Splits : ");
+ tableOptimizations.getSplits().forEach(System.out::println);
+ System.out.println();
+ }
+}
+```
+
+The example program above prints the following.
+
+```
+hashedRow : p:1yl0:org.wikipedia/accumulo
+orig : org.wikipedia/accumulo
+Balance config : (\Qp:\E).*
+Splits :
+p:1sst
+p:3llm
+p:5eef
+p:7778
+p:9001
+p:assu
+p:clln
+p:eeeg
+p:g779
+p:i002
+p:jssv
+p:lllo
+p:neeh
+p:p77a
+p:r003
+p:sssw
+p:ullp
+p:weei
+p:y77b
+p:~
+```
+
+The split points are used to create tablets in the Accumulo table used by Fluo.
+Data and computation will spread very evenly across these tablets. The
+Balancing config will spread the tablets evenly across the tablet servers,
+which will spread the computation evenly. See the [table optimizations][2]
+documentation for information on how to apply the optimizations.
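+
+The grouping regex in the output above is a plain Java regex, so it is easy to
+check which rows it covers. The snippet below is just a sanity check of that
+regex, not part of the recipe:
+
+```java
+public class GroupRegexCheck {
+  public static void main(String[] args) {
+    // (\Qp:\E).* quotes the literal prefix "p:" and matches anything after it
+    String regex = "(\\Qp:\\E).*";
+    System.out.println("p:1sst".matches(regex)); // true
+    System.out.println("q:1sst".matches(regex)); // false
+  }
+}
+```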
+
+[1]: https://github.com/fluo-io/webindex
+[2]: /docs/fluo-recipes/1.1.0-incubating/table-optimization/
diff --git a/docs/fluo-recipes/1.1.0-incubating/serialization.md b/docs/fluo-recipes/1.1.0-incubating/serialization.md
new file mode 100644
index 0000000..01c53fd
--- /dev/null
+++ b/docs/fluo-recipes/1.1.0-incubating/serialization.md
@@ -0,0 +1,76 @@
+---
+layout: recipes-doc
+title: Serializing Data
+version: 1.1.0-incubating
+---
+Various Fluo Recipes deal with POJOs and need to serialize them. The
+serialization mechanism is configurable and defaults to using [Kryo][1].
+
+## Custom Serialization
+
+In order to use a custom serialization method, two steps need to be taken. The
+first step is to implement [SimpleSerializer][2]. The second step is to
+configure Fluo Recipes to use the custom implementation. This needs to be done
+before initializing Fluo. Below is an example of how to do this.
+
+```java
+ FluoConfiguration fluoConfig = ...;
+ //assume MySerializer implements SimpleSerializer
+ SimpleSerializer.setSerializer(fluoConfig, MySerializer.class);
+ //initialize Fluo using fluoConfig
+```
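+
+The core of a custom implementation is just a pair of serialize/deserialize
+methods. The sketch below shows that core using plain Java serialization; it
+is a hedged example only and does not implement the actual SimpleSerializer
+interface (a real implementation would, and would also handle configuration).
+
+```java
+import java.io.*;
+
+public class JavaSerdeSketch {
+
+  // What a SimpleSerializer's serialize method might do (sketch only)
+  static byte[] serialize(Object obj) {
+    try (ByteArrayOutputStream baos = new ByteArrayOutputStream();
+        ObjectOutputStream oos = new ObjectOutputStream(baos)) {
+      oos.writeObject(obj);
+      oos.flush();
+      return baos.toByteArray();
+    } catch (IOException e) {
+      throw new UncheckedIOException(e);
+    }
+  }
+
+  // What a SimpleSerializer's deserialize method might do (sketch only)
+  static <T> T deserialize(byte[] data, Class<T> clazz) {
+    try (ObjectInputStream ois =
+        new ObjectInputStream(new ByteArrayInputStream(data))) {
+      return clazz.cast(ois.readObject());
+    } catch (IOException e) {
+      throw new UncheckedIOException(e);
+    } catch (ClassNotFoundException e) {
+      throw new IllegalStateException(e);
+    }
+  }
+
+  public static void main(String[] args) {
+    byte[] bytes = serialize("hello");
+    System.out.println(deserialize(bytes, String.class)); // prints hello
+  }
+}
+```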
+
+## Kryo Factory
+
+If using the default Kryo serializer implementation, then creating a
+KryoFactory implementation can lead to smaller serialization size. When Kryo
+serializes an object graph, it will by default include the fully qualified
+names of the classes in the serialized data. This can be avoided by
+[registering classes][3] that will be serialized. Registration is done by
+creating a KryoFactory and then configuring Fluo Recipes to use it.
+
+For example, assume the POJOs named `Node` and `Edge` will be serialized and
+need to be registered with Kryo. This could be done by creating a KryoFactory
+like the following.
+
+```java
+
+package com.foo;
+
+import com.esotericsoftware.kryo.Kryo;
+import com.esotericsoftware.kryo.pool.KryoFactory;
+
+import com.foo.data.Edge;
+import com.foo.data.Node;
+
+public class MyKryoFactory implements KryoFactory {
+ @Override
+ public Kryo create() {
+ Kryo kryo = new Kryo();
+
+ //Explicitly assign each class a unique id here to ensure it's stable over
+ //time and in different environments with different dependencies.
+ kryo.register(Node.class, 9);
+ kryo.register(Edge.class, 10);
+
+ //instruct kryo that these are the only classes we expect to be serialized
+ kryo.setRegistrationRequired(true);
+
+ return kryo;
+ }
+}
+```
+
+Fluo Recipes must be configured to use this factory. The following code shows
+how to do this.
+
+```java
+ FluoConfiguration fluoConfig = ...;
+ KryoSimplerSerializer.setKryoFactory(fluoConfig, MyKryoFactory.class);
+ //initialize Fluo using fluoConfig
+```
+
+[1]: https://github.com/EsotericSoftware/kryo
+[2]: {{ site.api_static }}/fluo-recipes-core/1.1.0-incubating/org/apache/fluo/recipes/core/serialization/SimpleSerializer.html
+[3]: https://github.com/EsotericSoftware/kryo#registration
diff --git a/docs/fluo-recipes/1.1.0-incubating/spark.md b/docs/fluo-recipes/1.1.0-incubating/spark.md
new file mode 100644
index 0000000..9f93a3a
--- /dev/null
+++ b/docs/fluo-recipes/1.1.0-incubating/spark.md
@@ -0,0 +1,19 @@
+---
+layout: recipes-doc
+title: Apache Spark helper code
+version: 1.1.0-incubating
+---
+Fluo Recipes has some helper code for [Apache Spark][spark]. Most of the helper code is currently
+related to bulk importing data into Accumulo. This is useful for initializing a new Fluo table with
+historical data via Spark. The Spark helper code is found at
+[org/apache/fluo/recipes/spark/][sdir].
+
+For information on using Spark to load data into Fluo, check out this [blog post][blog].
+
+If you know of other Spark+Fluo integration code that would be useful, then please consider [opening
+an issue](https://github.com/apache/fluo-recipes/issues/new).
+
+[spark]: https://spark.apache.org
+[sdir]: {{ site.api_base }}/fluo-recipes-spark/1.1.0-incubating/
+[blog]: https://fluo.apache.org/blog/2016/12/22/spark-load/
+
diff --git a/docs/fluo-recipes/1.1.0-incubating/table-optimization.md b/docs/fluo-recipes/1.1.0-incubating/table-optimization.md
new file mode 100644
index 0000000..54f192a
--- /dev/null
+++ b/docs/fluo-recipes/1.1.0-incubating/table-optimization.md
@@ -0,0 +1,65 @@
+---
+layout: recipes-doc
+title: Fluo Table Optimization
+version: 1.1.0-incubating
+---
+## Background
+
+Recipes may need to make Accumulo specific table modifications for optimal
+performance. Configuring the [Accumulo tablet balancer][3] and adding splits are
+two optimizations that are currently done. Offering a standard way to do these
+optimizations makes it easier to use recipes correctly. These optimizations
+are optional. You could skip them for integration testing, but would probably
+want to use them in production.
+
+## Java Example
+
+```java
+FluoConfiguration fluoConf = ...
+
+//ExportQueue.configure() registers table optimizations it would like to be made
+ExportQueue.configure(fluoConf, ...);
+
+//CollisionFreeMap.configure() registers table optimizations it would like to be made
+CollisionFreeMap.configure(fluoConf, ...);
+
+//configure optimizations for a prefixed hash range of a table
+RowHasher.configure(fluoConf, ...);
+
+//initialize Fluo
+FluoFactory.newAdmin(fluoConf).initialize(...)
+
+//Automatically optimize the Fluo table for all configured recipes
+TableOperations.optimizeTable(fluoConf);
+```
+
+[TableOperations][2] is provided in the Accumulo module of Fluo Recipes.
+
+## Command Example
+
+Fluo Recipes provides an easy way to optimize a Fluo table for configured
+recipes from the command line. This should be done after configuring recipes
+and initializing Fluo. Below are example commands for initializing in this way.
+
+```bash
+
+#create application
+fluo new app1
+
+#configure application
+
+#initialize Fluo
+fluo init app1
+
+#optimize table for all configured recipes
+fluo exec app1 org.apache.fluo.recipes.accumulo.cmds.OptimizeTable
+```
+
+## Table optimization registry
+
+Recipes register themselves by calling [TableOptimizations.registerOptimization()][1]. Anyone can use
+this mechanism; it's not limited to use by existing recipes.
+
+[1]: {{ site.api_static }}/fluo-recipes-core/1.1.0-incubating/org/apache/fluo/recipes/core/common/TableOptimizations.html
+[2]: {{ site.api_static }}/fluo-recipes-accumulo/1.1.0-incubating/org/apache/fluo/recipes/accumulo/ops/TableOperations.html
+[3]: http://accumulo.apache.org/blog/2015/03/20/balancing-groups-of-tablets.html
diff --git a/docs/fluo-recipes/1.1.0-incubating/testing.md b/docs/fluo-recipes/1.1.0-incubating/testing.md
new file mode 100644
index 0000000..bc598d6
--- /dev/null
+++ b/docs/fluo-recipes/1.1.0-incubating/testing.md
@@ -0,0 +1,14 @@
+---
+layout: recipes-doc
+title: Testing
+version: 1.1.0-incubating
+---
+Fluo includes MiniFluo, which makes it possible to write an integration test that
+runs against a real Fluo instance. Fluo Recipes provides the following utility
+code for writing integration tests.
+
+ * [FluoITHelper][1] A class with utility methods for comparing expected data with what's in Fluo.
+ * [AccumuloExportITBase][2] A base class for writing an integration test that exports data from Fluo to an external Accumulo table.
+
+[1]: {{ site.api_static }}/fluo-recipes-test/1.1.0-incubating/org/apache/fluo/recipes/test/FluoITHelper.html
+[2]: {{ site.api_static }}/fluo-recipes-test/1.1.0-incubating/org/apache/fluo/recipes/test/AccumuloExportITBase.html
diff --git a/docs/fluo-recipes/1.1.0-incubating/transient.md b/docs/fluo-recipes/1.1.0-incubating/transient.md
new file mode 100644
index 0000000..1930270
--- /dev/null
+++ b/docs/fluo-recipes/1.1.0-incubating/transient.md
@@ -0,0 +1,83 @@
+---
+layout: recipes-doc
+title: Transient data
+version: 1.1.0-incubating
+---
+## Background
+
+Some recipes store transient data in a portion of the Fluo table. Transient
+data is data that's continually being added and deleted; these transient
+ranges contain no long term data. When data is deleted in Fluo, a delete
+marker is inserted but the data is actually still there. If nothing is done,
+these transient ranges will accumulate many more delete markers than actual
+data over time, and processing transient data will get increasingly slower.
+
+These delete markers can be cleaned up by forcing Accumulo to compact the
+Fluo table, which will run Fluo's garbage collection iterator. However,
+compacting the entire table to clean up these ranges within a table is
+overkill. Alternatively, Accumulo supports compacting ranges of a table. So
+a good solution to the delete marker problem is to periodically compact just
+the transient ranges.
+
+Fluo Recipes provides helper code to deal with transient data ranges in a
+standard way.
+
+## Registering Transient Ranges
+
+Recipes like [Export Queue](export-queue.md) will automatically register
+transient ranges when configured. If you would like to register your own
+transient ranges, use [TransientRegistry][1]. Below is a simple example of
+using this.
+
+```java
+FluoConfiguration fluoConfig = ...;
+TransientRegistry transientRegistry = new TransientRegistry(fluoConfig.getAppConfiguration());
+transientRegistry.addTransientRange(new RowRange(startRow, endRow));
+
+//Initialize Fluo using fluoConfig. This will store the registered ranges in
+//zookeeper making them available on any node later.
+```
+
+## Compacting Transient Ranges
+
+Although you may never need to register transient ranges directly, you will
+need to periodically compact transient ranges if using a recipe that registers
+them. Using [TableOperations][2] this can be done with one line of Java code
+like the following.
+
+```java
+FluoConfiguration fluoConfig = ...;
+TableOperations.compactTransient(fluoConfig);
+```
+
+Fluo recipes provides an easy way to compact transient ranges from the command line using the `fluo exec` command as follows:
+
+```
+fluo exec <app name> org.apache.fluo.recipes.accumulo.cmds.CompactTransient [<interval> [<multiplier>]]
+```
+
+If no arguments are specified, the command will call `compactTransient()` once.
+If `<interval>` is specified, the command will run forever, sleeping `<interval>`
+seconds between compactions of each transient range.
+
+In the case where Fluo is backed up processing data, a transient range could
+have a lot of data queued, and compacting it too frequently would be
+counterproductive. To avoid this, the `CompactTransient` command will consider
+the time it took to compact a range when deciding when to compact that range
+next. This is where the `<multiplier>` argument comes in: the time to sleep
+between compactions of a range is determined as follows (if not specified, the
+multiplier defaults to 3).
+
+```java
+ sleepTime = Math.max(compactTime * multiplier, interval);
+```
+
+For example, assume a Fluo application has two transient ranges. Also assume
+CompactTransient is run with an interval of 600 and a multiplier of 10. If the
+first range takes 20 seconds to compact, then it will be compacted again in 600
+seconds. If the second range takes 80 seconds to compact, then it will be
+compacted again in 800 seconds.
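+
+The arithmetic in the example above can be checked with a few lines of plain
+Java. This is an illustrative reimplementation of the formula, not the actual
+CompactTransient code:
+
+```java
+public class CompactSleepCalc {
+
+  // Mirrors the sleep time formula: max(compactTime * multiplier, interval)
+  static long sleepTime(long compactTime, long multiplier, long interval) {
+    return Math.max(compactTime * multiplier, interval);
+  }
+
+  public static void main(String[] args) {
+    // interval of 600 seconds and multiplier of 10, as in the example above
+    System.out.println(sleepTime(20, 10, 600)); // 600 (20 * 10 < interval)
+    System.out.println(sleepTime(80, 10, 600)); // 800 (80 * 10 > interval)
+  }
+}
+```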
+
+[1]: {{ site.api_static }}/fluo-recipes-core/1.1.0-incubating/org/apache/fluo/recipes/core/common/TransientRegistry.html
+[2]: {{ site.api_static }}/fluo-recipes-accumulo/1.1.0-incubating/org/apache/fluo/recipes/accumulo/ops/TableOperations.html
diff --git a/docs/index.md b/docs/index.md
index 63dab6d..b7ace5f 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -17,6 +17,7 @@ For a general overview of Fluo, take the [Fluo tour](/tour/).
### Apache Fluo Recipes documentation
+* [1.1.0-incubating][recipes-1.1] - June 22, 2017
* [1.0.0-incubating][recipes-1.0] - October 28, 2016
Documentation for releases before joining Apache have been [archived](archive).
@@ -25,4 +26,5 @@ Documentation for releases before joining Apache have been [archived](archive).
[Apache Fluo Recipes]: https://github.com/apache/fluo-recipes
[fluo-1.1]: /docs/fluo/1.1.0-incubating/
[fluo-1.0]: /docs/fluo/1.0.0-incubating/
+[recipes-1.1]: /docs/fluo-recipes/1.1.0-incubating/
[recipes-1.0]: /docs/fluo-recipes/1.0.0-incubating/