You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@fluo.apache.org by mw...@apache.org on 2017/10/02 19:53:15 UTC
[fluo-recipes] branch master updated: Removes docs as they were
moved to project website (#144)
This is an automated email from the ASF dual-hosted git repository.
mwalch pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/fluo-recipes.git
The following commit(s) were added to refs/heads/master by this push:
new aa07963 Removes docs as they were moved to project website (#144)
aa07963 is described below
commit aa07963abb0541e3248549024cac07fae215a07c
Author: Mike Walch <mw...@apache.org>
AuthorDate: Mon Oct 2 15:53:13 2017 -0400
Removes docs as they were moved to project website (#144)
---
README.md | 91 +-----------
docs/accumulo-export-queue.md | 117 ----------------
docs/combine-queue.md | 224 ------------------------------
docs/export-queue.md | 316 ------------------------------------------
docs/recording-tx.md | 85 ------------
docs/row-hasher.md | 135 ------------------
docs/serialization.md | 89 ------------
docs/spark.md | 32 -----
docs/table-optimization.md | 78 -----------
docs/testing.md | 27 ----
docs/transient.md | 96 -------------
11 files changed, 7 insertions(+), 1283 deletions(-)
diff --git a/README.md b/README.md
index bd79db0..6a6f121 100644
--- a/README.md
+++ b/README.md
@@ -20,98 +20,21 @@ limitations under the License.
**Fluo Recipes are common code for [Apache Fluo][fluo] application developers.**
-Fluo Recipes build on the [Fluo API][fluo-api] to offer additional functionality to
-developers. They are published separately from Fluo on their own release schedule.
+Fluo Recipes build on the Fluo API to offer additional functionality to developers.
+They are published separately from Fluo on their own release schedule.
This allows Fluo Recipes to iterate and innovate faster than Fluo (which will maintain
a more minimal API on a slower release cycle). Fluo Recipes offers code to implement
common patterns on top of Fluo's API. It also offers glue code to external libraries
like Spark and Kryo.
-### Documentation
+## Getting Started
-Recipes are documented below and in the [Recipes API docs][recipes-api].
-
-* [Combine Queue][combine-q] - A recipe for concurrently updating many keys while avoiding
- collisions.
-* [Export Queue][export-q] - A recipe for exporting data from Fluo to external systems.
-* [Row Hash Prefix][row-hasher] - A recipe for spreading data evenly in a row prefix.
-* [RecordingTransaction][recording-tx] - A wrapper for a Fluo transaction that records all transaction
-operations to a log which can be used to export data from Fluo.
-* [Testing][testing] Some code to help write Fluo Integration test.
-
-Recipes have common needs that are broken down into the following reusable components.
-
-* [Serialization][serialization] - Common code for serializing POJOs.
-* [Transient Ranges][transient] - Standardized process for dealing with transient data.
-* [Table optimization][optimization] - Standardized process for optimizing the Fluo table.
-* [Spark integration][spark] - Spark+Fluo integration code.
-
-### Usage
-
-The Fluo Recipes project publishes multiple jars to Maven Central for each release.
-The `fluo-recipes-core` jar is the primary jar. It is where most recipes live and where
-they are placed by default if they have minimal dependencies beyond the Fluo API.
-
-Fluo Recipes with dependencies that bring in many transitive dependencies publish
-their own jar. For example, recipes that depend on Apache Spark are published in the
-`fluo-recipes-spark` jar. If you don't plan on using code in the `fluo-recipes-spark`
-jar, you should avoid including it in your pom.xml to avoid a transitive dependency on
-Spark.
-
-Below is a sample Maven POM containing all possible Fluo Recipes dependencies:
-
-```xml
- <properties>
- <fluo-recipes.version>1.1.0-incubating</fluo-recipes.version>
- </properties>
-
- <dependencies>
- <!-- Required. Contains recipes that are only depend on the Fluo API -->
- <dependency>
- <groupId>org.apache.fluo</groupId>
- <artifactId>fluo-recipes-core</artifactId>
- <version>${fluo-recipes.version}</version>
- </dependency>
- <!-- Optional. Serialization code that depends on Kryo -->
- <dependency>
- <groupId>org.apache.fluo</groupId>
- <artifactId>fluo-recipes-kryo</artifactId>
- <version>${fluo-recipes.version}</version>
- </dependency>
- <!-- Optional. Common code for using Fluo with Accumulo -->
- <dependency>
- <groupId>org.apache.fluo</groupId>
- <artifactId>fluo-recipes-accumulo</artifactId>
- <version>${fluo-recipes.version}</version>
- </dependency>
- <!-- Optional. Common code for using Fluo with Spark -->
- <dependency>
- <groupId>org.apache.fluo</groupId>
- <artifactId>fluo-recipes-spark</artifactId>
- <version>${fluo-recipes.version}</version>
- </dependency>
- <!-- Optional. Common code for writing Fluo integration tests -->
- <dependency>
- <groupId>org.apache.fluo</groupId>
- <artifactId>fluo-recipes-test</artifactId>
- <version>${fluo-recipes.version}</version>
- <scope>test</scope>
- </dependency>
- </dependencies>
-```
+* Read the [Fluo Recipes documentation][fluo-docs]
+* View the [API][fluo-api] for Fluo Recipes
[fluo]: https://fluo.apache.org/
-[fluo-api]: https://fluo.apache.org/apidocs/fluo/
-[recipes-api]: https://fluo.apache.org/apidocs/fluo-recipes/
-[combine-q]: docs/combine-queue.md
-[export-q]: docs/export-queue.md
-[recording-tx]: docs/recording-tx.md
-[serialization]: docs/serialization.md
-[transient]: docs/transient.md
-[optimization]: docs/table-optimization.md
-[row-hasher]: docs/row-hasher.md
-[spark]: docs/spark.md
-[testing]: docs/testing.md
+[fluo-api]: https://fluo.apache.org/api/
+[fluo-docs]: https://fluo.apache.org/docs/
[ti]: https://travis-ci.org/apache/fluo-recipes.svg?branch=master
[tl]: https://travis-ci.org/apache/fluo-recipes
[li]: http://img.shields.io/badge/license-ASL-blue.svg
diff --git a/docs/accumulo-export-queue.md b/docs/accumulo-export-queue.md
deleted file mode 100644
index f9de6c6..0000000
--- a/docs/accumulo-export-queue.md
+++ /dev/null
@@ -1,117 +0,0 @@
-<!--
-Licensed to the Apache Software Foundation (ASF) under one or more
-contributor license agreements. See the NOTICE file distributed with
-this work for additional information regarding copyright ownership.
-The ASF licenses this file to You under the Apache License, Version 2.0
-(the "License"); you may not use this file except in compliance with
-the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
--->
-# Accumulo Export Queue Specialization
-
-## Background
-
-The [Export Queue Recipe][1] provides a generic foundation for building export mechanism to any
-external data store. The [AccumuloExporter] provides an [Exporter] for writing to
-Accumulo. [AccumuloExporter] is located in the `fluo-recipes-accumulo` module and provides the
-following functionality:
-
- * Safely batches writes to Accumulo made by multiple transactions exporting data.
- * Stores Accumulo connection information in Fluo configuration, making it accessible by Export
- Observers running on other nodes.
- * Provides utility code that make it easier and shorter to code common Accumulo export patterns.
-
-## Example Use
-
-Exporting to Accumulo is easy. Follow the steps below:
-
-1. First, implement [AccumuloTranslator]. Your implementation translates exported
- objects to Accumulo Mutations. For example, the `SimpleTranslator` class below translates String
- key/values and into mutations for Accumulo. This step is optional, a lambda could
- be used in step 3 instead of creating a class.
-
- ```java
- public class SimpleTranslator implements AccumuloTranslator<String,String> {
-
- @Override
- public void translate(SequencedExport<String, String> export, Consumer<Mutation> consumer) {
- Mutation m = new Mutation(export.getKey());
- m.put("cf", "cq", export.getSequence(), export.getValue());
- consumer.accept(m);
- }
- }
-
- ```
-
-2. Configure an `ExportQueue` and the export table prior to initializing Fluo.
-
- ```java
-
- FluoConfiguration fluoConfig = ...;
-
- String instance = // Name of accumulo instance exporting to
- String zookeepers = // Zookeepers used by Accumulo instance exporting to
- String user = // Accumulo username, user that can write to exportTable
- String password = // Accumulo user password
- String exportTable = // Name of table to export to
-
- // Set properties for table to export to in Fluo app configuration.
- AccumuloExporter.configure(EXPORT_QID).instance(instance, zookeepers)
- .credentials(user, password).table(exportTable).save(fluoConfig);
-
- // Set properties for export queue in Fluo app configuration
- ExportQueue.configure(EXPORT_QID).keyType(String.class).valueType(String.class)
- .buckets(119).save(fluoConfig);
-
- // Initialize Fluo using fluoConfig
- ```
-
-3. In the applications `ObserverProvider`, register an observer that will process exports and write
- them to Accumulo using [AccumuloExporter]. Also, register observers that add to the export
- queue.
-
- ```java
- public class MyObserverProvider implements ObserverProvider {
-
- @Override
- public void provide(Registry obsRegistry, Context ctx) {
- SimpleConfiguration appCfg = ctx.getAppConfiguration();
-
- ExportQueue<String, String> expQ = ExportQueue.getInstance(EXPORT_QID, appCfg);
-
- // Register observer that will processes entries on export queue and write them to the Accumulo
- // table configured earlier. SimpleTranslator from step 1 is passed here, could have used a
- // lambda instead.
- expQ.registerObserver(obsRegistry,
- new AccumuloExporter<>(EXPORT_QID, appCfg, new SimpleTranslator()));
-
- // An example observer created using a lambda that adds to the export queue.
- obsRegistry.forColumn(OBS_COL, WEAK).useObserver((tx,row,col) -> {
- // Read some data and do some work
-
- // Add results to export queue
- String key = // key that identifies export
- String value = // object to export
- expQ.add(tx, key, value);
- });
- }
- }
- ```
-
-## Other use cases
-
-The `getTranslator()` method in [AccumuloReplicator] creates a specialized [AccumuloTranslator] for replicating a Fluo table to Accumulo.
-
-[1]: export-queue.md
-[Exporter]: ../modules/core/src/main/java/org/apache/fluo/recipes/core/export/function/Exporter.java
-[AccumuloExporter]: ../modules/accumulo/src/main/java/org/apache/fluo/recipes/accumulo/export/function/AccumuloExporter.java
-[AccumuloTranslator]: ../modules/accumulo/src/main/java/org/apache/fluo/recipes/accumulo/export/function/AccumuloTranslator.java
-[AccumuloReplicator]: ../modules/accumulo/src/main/java/org/apache/fluo/recipes/accumulo/export/AccumuloReplicator.java
-
diff --git a/docs/combine-queue.md b/docs/combine-queue.md
deleted file mode 100644
index 0cce71b..0000000
--- a/docs/combine-queue.md
+++ /dev/null
@@ -1,224 +0,0 @@
-<!--
-Licensed to the Apache Software Foundation (ASF) under one or more
-contributor license agreements. See the NOTICE file distributed with
-this work for additional information regarding copyright ownership.
-The ASF licenses this file to You under the Apache License, Version 2.0
-(the "License"); you may not use this file except in compliance with
-the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
--->
-# Combine Queue Recipe
-
-## Background
-
-When many transactions try to modify the same keys, collisions will occur. Too many collisions
-cause transactions to fail and throughput to nose dive. For example, consider [phrasecount]
-which has many transactions processing documents. Each transaction counts the phrases in a document
-and then updates global phrase counts. Since transaction attempts to update many phrases
-, the probability of collisions is high.
-
-## Solution
-
-The [combine queue recipe][CombineQueue] provides a reusable solution for updating many keys while
-avoiding collisions. The recipe also organizes updates into batches in order to improve throughput.
-
-This recipes queues updates to keys for other transactions to process. In the phrase count example
-transactions processing documents queue updates, but do not actually update the counts. Below is an
-example of computing phrasecounts using this recipe.
-
- * TX1 queues `+1` update for phrase `we want lambdas now`
- * TX2 queues `+1` update for phrase `we want lambdas now`
- * TX3 reads the updates and current value for the phrase `we want lambdas now`. There is no current value and the updates sum to 2, so a new value of 2 is written.
- * TX4 queues `+2` update for phrase `we want lambdas now`
- * TX5 queues `-1` update for phrase `we want lambdas now`
- * TX6 reads the updates and current value for the phrase `we want lambdas now`. The current value is 2 and the updates sum to 1, so a new value of 3 is written.
-
-Transactions processing updates have the ability to make additional updates.
-For example in addition to updating the current value for a phrase, the new
-value could also be placed on an export queue to update an external database.
-
-### Buckets
-
-A simple implementation of this recipe would have an update queue for each key. However the
-implementation is slightly more complex. Each update queue is in a bucket and transactions process
-all of the updates in a bucket. This allows more efficient processing of updates for the following
-reasons :
-
- * When updates are queued, notifications are made per bucket(instead of per a key).
- * The transaction doing the update can scan the entire bucket reading updates, this avoids a seek for each key being updated.
- * Also the transaction can request a batch lookup to get the current value of all the keys being updated.
- * Any additional actions taken on update (like adding something to an export queue) can also be batched.
- * Data is organized to make reading exiting values for keys in a bucket more efficient.
-
-Which bucket a key goes to is decided using hash and modulus so that multiple updates for a key go
-to the same bucket.
-
-The initial number of tablets to create when applying table optimizations can be controlled by
-setting the buckets per tablet option when configuring a Combine Queue. For example if you
-have 20 tablet servers and 1000 buckets and want 2 tablets per tserver initially then set buckets
-per tablet to 1000/(2*20)=25.
-
-## Example Use
-
-The following code snippets show how to use this recipe for wordcount. The first step is to
-configure it before initializing Fluo. When initializing an ID is needed. This ID is used in two
-ways. First, the ID is used as a row prefix in the table. Therefore nothing else should use that
-row range in the table. Second, the ID is used in generating configuration keys.
-
-The following snippet shows how to configure a combine queue.
-
-```java
- FluoConfiguration fluoConfig = ...;
-
- // Set application properties for the combine queue. These properties are read later by
- // the observers running on each worker.
- CombineQueue.configure(WcObserverProvider.ID)
- .keyType(String.class).valueType(Long.class).buckets(119).save(fluoConfig);
-
- fluoConfig.setObserverProvider(WcObserverProvider.class);
-
- // initialize Fluo using fluoConfig
-```
-
-Assume the following observer is triggered when a documents is updated. It examines new
-and old document content and determines changes in word counts. These changes are pushed to a
-combine queue.
-
-```java
-public class DocumentObserver implements StringObserver {
- // word count combine queue
- private CombineQueue<String, Long> wccq;
-
- public static final Column NEW_COL = new Column("content", "new");
- public static final Column CUR_COL = new Column("content", "current");
-
- public DocumentObserver(CombineQueue<String, Long> wccq) {
- this.wccq = wccq;
- }
-
- @Override
- public void process(TransactionBase tx, String row, Column col) {
-
- Preconditions.checkArgument(col.equals(NEW_COL));
-
- String newContent = tx.gets(row, NEW_COL);
- String currentContent = tx.gets(row, CUR_COL, "");
-
- Map<String, Long> newWordCounts = getWordCounts(newContent);
- Map<String, Long> currentWordCounts = getWordCounts(currentContent);
-
- // determine changes in word counts between old and new document content
- Map<String, Long> changes = calculateChanges(newWordCounts, currentWordCounts);
-
- // queue updates to word counts for processing by other transactions
- wccq.add(tx, changes);
-
- // update the current content and delete the new content
- tx.set(row, CUR_COL, newContent);
- tx.delete(row, NEW_COL);
- }
-
- private static Map<String, Long> getWordCounts(String doc) {
- // TODO extract words from doc
- }
-
- private static Map<String, Long> calculateChanges(Map<String, Long> newCounts,
- Map<String, Long> currCounts) {
- Map<String, Long> changes = new HashMap<>();
-
- // guava Maps class
- MapDifference<String, Long> diffs = Maps.difference(currCounts, newCounts);
-
- // compute the diffs for words that changed
- changes.putAll(Maps.transformValues(diffs.entriesDiffering(),
- vDiff -> vDiff.rightValue() - vDiff.leftValue()));
-
- // add all new words
- changes.putAll(diffs.entriesOnlyOnRight());
-
- // subtract all words no longer present
- changes.putAll(Maps.transformValues(diffs.entriesOnlyOnLeft(), l -> l * -1));
-
- return changes;
- }
-}
-
-
-```
-
-Each combine queue has two extension points, a [combiner][Combiner] and a [change
-observer][ChangeObserver]. The combine queue configures a Fluo observer to process queued
-updates. When processing updates the two extension points are called. The code below shows
-how to use these extension points.
-
-A change observer can do additional processing when a batch of key values are updated. Below
-updates are queued for export to an external database. The export is given the new and old value
-allowing it to delete the old value if needed.
-
-```java
-public class WcObserverProvider implements ObserverProvider {
-
- public static final String ID = "wc";
-
- @Override
- public void provide(Registry obsRegistry, Context ctx) {
-
- ExportQueue<String, MyDatabaseExport> exportQ = createExportQueue(ctx);
-
- // Create a combine queue for computing word counts.
- CombineQueue<String, Long> wcMap = CombineQueue.getInstance(ID, ctx.getAppConfiguration());
-
- // Register observer that updates the Combine Queue
- obsRegistry.forColumn(DocumentObserver.NEW_COL, STRONG).useObserver(new DocumentObserver(wcMap));
-
- // Used to join new and existing values for a key. The lambda sums all values and returns
- // Optional.empty() when the sum is zero. Returning Optional.empty() causes the key/value to be
- // deleted. Could have used the built in SummingCombiner.
- Combiner<String, Long> combiner = input -> input.stream().reduce(Long::sum).filter(l -> l != 0);
-
- // Called when the value of a key changes. The lambda exports these changes to an external
- // database. Make sure to read ChangeObserver's javadoc.
- ChangeObserver<String, Long> changeObs = (tx, changes) -> {
- for (Change<String, Long> update : changes) {
- String word = update.getKey();
- Optional<Long> oldVal = update.getOldValue();
- Optional<Long> newVal = update.getNewValue();
-
- // Queue an export to let an external database know the word count has changed.
- exportQ.add(tx, word, new MyDatabaseExport(oldVal, newVal));
- }
- };
-
- // Register observer that handles updates to the CombineQueue. This observer will use the
- // combiner and valueObserver.
- wcMap.registerObserver(obsRegistry, combiner, changeObs);
- }
-}
-```
-
-## Guarantees
-
-This recipe makes two important guarantees about updates for a key when it
-calls `process()` on a [ChangeObserver].
-
- * The new value reported for an update will be derived from combining all
- updates that were committed before the transaction thats processing updates
- started. The implementation may have to make multiple passes over queued
- updates to achieve this. In the situation where TX1 queues a `+1` and later
- TX2 queues a `-1` for the same key, there is no need to worry about only seeing
- the `-1` processed. A transaction that started processing updates after TX2
- committed would process both.
- * The old value will always be what was reported as the new value in the
- previous transaction that called `ChangeObserver.process()`.
-
-[phrasecount]: https://github.com/fluo-io/phrasecount
-[CombineQueue]: /modules/core/src/main/java/org/apache/fluo/recipes/core/combine/CombineQueue.java
-[ChangeObserver]: /modules/core/src/main/java/org/apache/fluo/recipes/core/combine/ChangeObserver.java
-[Combiner]: /modules/core/src/main/java/org/apache/fluo/recipes/core/combine/Combiner.java
diff --git a/docs/export-queue.md b/docs/export-queue.md
deleted file mode 100644
index ec4b890..0000000
--- a/docs/export-queue.md
+++ /dev/null
@@ -1,316 +0,0 @@
-<!--
-Licensed to the Apache Software Foundation (ASF) under one or more
-contributor license agreements. See the NOTICE file distributed with
-this work for additional information regarding copyright ownership.
-The ASF licenses this file to You under the Apache License, Version 2.0
-(the "License"); you may not use this file except in compliance with
-the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
--->
-# Export Queue Recipe
-
-## Background
-
-Fluo is not suited for servicing low latency queries for two reasons. First, the implementation of
-transactions are designed for throughput. To get throughput, transactions recover lazily from
-failures and may wait on other transactions. Both of these design decisions can
-lead to delays of individual transactions, but do not negatively impact throughput. The second
-reason is that Fluo observers executing transactions will likely cause a large number of random
-accesses. This could lead to high response time variability for an individual random access. This
-variability would not impede throughput but would impede the goal of low latency.
-
-One way to make data transformed by Fluo available for low latency queries is
-to export that data to another system. For example Fluo could run on
-cluster A, continually transforming a large data set, and exporting data to
-Accumulo tables on cluster B. The tables on cluster B would service user
-queries. Fluo Recipes has built in support for [exporting to Accumulo][3],
-however this recipe can be used to export to systems other than Accumulo, like
-Redis, Elasticsearch, MySQL, etc.
-
-Exporting data from Fluo is easy to get wrong which is why this recipe exists.
-To understand what can go wrong consider the following example observer
-transaction.
-
-```java
-public class MyObserver implements StringObserver {
-
- static final Column UPDATE_COL = new Column("meta", "numUpdates");
- static final Column COUNTER_COL = new Column("meta", "counter1");
-
- //reperesents a Query system extrnal to Fluo that is updated by Fluo
- QuerySystem querySystem;
-
- @Override
- public void process(TransactionBase tx, String row, Column col) {
-
- int oldCount = Integer.parseInt(tx.gets(row, COUNTER_COL, "0"));
- int numUpdates = Integer.parseInt(tx.gets(row, UPDATE_COL, "0"));
- int newCount = oldCount + numUpdates;
-
- tx.set(row, COUNTER_COL, "" + newCount);
- tx.delete(row, UPDATE_COL);
-
- //Build an inverted index in the query system, based on count from the
- //meta:counter1 column in fluo. Do this by creating rows for the
- //external query system based on the count.
- String oldCountRow = String.format("%06d", oldCount);
- String newCountRow = String.format("%06d", newCount);
-
- //add a new entry to the inverted index
- querySystem.insertRow(newCountRow, row);
- //remove the old entry from the inverted index
- querySystem.deleteRow(oldCountRow, row);
- }
-}
-```
-
-The above example would keep the external index up to date beautifully as long
-as the following conditions are met.
-
- * Threads executing transactions always complete successfully.
- * Only a single thread ever responds to a notification.
-
-However these conditions are not guaranteed by Fluo. Multiple threads may
-attempt to process a notification concurrently (only one may succeed). Also at
-any point in time a transaction may fail (for example the computer executing it
-may reboot). Both of these problems will occur and will lead to corruption of
-the external index in the example. The inverted index and Fluo will become
-inconsistent. The inverted index will end up with multiple entries (that are
-never cleaned up) for single entity even though the intent is to only have one.
-
-The root of the problem in the example above is that its exporting uncommitted
-data. There is no guarantee that setting the column `<row>:meta:counter1` to
-`newCount` will succeed until the transaction is successfully committed.
-However, `newCountRow` is derived from `newCount` and written to the external query
-system before the transaction is committed (Note : for observers, the
-transaction is committed by the framework after `process(...)` is called). So
-if the transaction fails, the next time it runs it could compute a completely
-different value for `newCountRow` (and it would not delete what was written by the
-failed transaction).
-
-## Solution
-
-The simple solution to the problem of exporting uncommitted data is to only
-export committed data. There are multiple ways to accomplish this. This
-recipe offers a reusable implementation of one method. This recipe has the
-following elements:
-
- * An export queue that transactions can add key/values to. Only if the transaction commits successfully will the key/value end up in the queue. A Fluo application can have multiple export queues, each one must have a unique id.
- * When a key/value is added to the export queue, its given a sequence number. This sequence number is based on the transactions start timestamp.
- * Each export queue is configured with an observer that processes key/values that were successfully committed to the queue.
- * When key/values in an export queue are processed, they are deleted so the export queue does not keep any long term data.
- * Key/values in an export queue are placed in buckets. This is done so that all of the updates in a bucket can be processed in a single transaction. This allows an efficient implementation of this recipe in Fluo. It can also lead to efficiency in a system being exported to, if the system can benefit from batching updates. The number of buckets in an export queue is configurable.
-
-There are three requirements for using this recipe :
-
- * Must configure export queues before initializing a Fluo application.
- * Transactions adding to an export queue must get an instance of the queue using its unique QID.
- * Must create a class or lambda that implements [Exporter][1] in order to process exports.
-
-## Example Use
-
-This example shows how to incrementally build an inverted index in an external query system using an
-export queue. The class below is simple POJO used as the value for the export queue.
-
-```java
-class CountUpdate {
- public int oldCount;
- public int newCount;
-
- public CountUpdate(int oc, int nc) {
- this.oldCount = oc;
- this.newCount = nc;
- }
-}
-```
-
-The following code shows how to configure an export queue. This code
-modifies the FluoConfiguration object with options needed for the export queue.
-This FluoConfiguration object should be used to initialize the fluo
-application.
-
-```java
-public class FluoApp {
-
- // export queue id "ici" means inverted count index
- public static final String EQ_ID = "ici";
-
- static final Column UPDATE_COL = new Column("meta", "numUpdates");
- static final Column COUNTER_COL = new Column("meta", "counter1");
-
- public static class AppObserverProvider implements ObserverProvider {
- @Override
- public void provide(Registry obsRegistry, Context ctx) {
- ExportQueue<String, CountUpdate> expQ =
- ExportQueue.getInstance(EQ_ID, ctx.getAppConfiguration());
-
- // register observer that will queue data to export
- obsRegistry.forColumn(UPDATE_COL, STRONG).useObserver(new MyObserver(expQ));
-
- // register observer that will export queued data
- expQ.registerObserver(obsRegistry, new CountExporter());
- }
- }
-
- /**
- * Call this method before initializing Fluo.
- *
- * @param fluoConfig the configuration object that will be used to initialize Fluo
- */
- public static void preInit(FluoConfiguration fluoConfig) {
-
- // Set properties for export queue in Fluo app configuration
- ExportQueue.configure(QUEUE_ID)
- .keyType(String.class)
- .valueType(CountUpdate.class)
- .buckets(1009)
- .bucketsPerTablet(10)
- .save(getFluoConfiguration());
-
- fluoConfig.setObserverProvider(AppObserverProvider.class);
- }
-}
-```
-
-Below is updated version of the observer from above thats now using an export
-queue.
-
-```java
-public class MyObserver implements StringObserver {
-
- private ExportQueue<String, CountUpdate> exportQueue;
-
- public MyObserver(ExportQueue<String, CountUpdate> exportQueue) {
- this.exportQueue = exportQueue;
- }
-
- @Override
- public void process(TransactionBase tx, String row, Column col) {
-
- int oldCount = Integer.parseInt(tx.gets(row, FluoApp.COUNTER_COL, "0"));
- int numUpdates = Integer.parseInt(tx.gets(row, FluoApp.UPDATE_COL, "0"));
- int newCount = oldCount + numUpdates;
-
- tx.set(row, FluoApp.COUNTER_COL, "" + newCount);
- tx.delete(row, FluoApp.UPDATE_COL);
-
- // Because the update to the export queue is part of the transaction,
- // either the update to meta:counter1 is made and an entry is added to
- // the export queue or neither happens.
- exportQueue.add(tx, row, new CountUpdate(oldCount, newCount));
- }
-}
-```
-
-The export queue will call the `accept()` method on the class below to process entries queued for
-export. It is possible the call to `accept()` can fail part way through and/or be called multiple
-times. In the case of failures the export consumer will be called again with the same data.
-Its possible for the same export entry to be processed on multiple computers at different times.
-This can cause exports to arrive out of order. The purpose of the sequence number is to help
-systems receiving out of order and redundant data.
-
-```java
-public class CountExporter implements Exporter<String, CountUpdate> {
- // represents the external query system we want to update from Fluo
- QuerySystem querySystem;
-
- @Override
- public void export(Iterator<SequencedExport<String, CountUpdate>> exports) {
- BatchUpdater batchUpdater = querySystem.getBatchUpdater();
-
- while (exports.hasNext()) {
- SequencedExport<String, CountUpdate> export = exports.next();
- String row = export.getKey();
- CountUpdate uc = export.getValue();
- long seqNum = export.getSequence();
-
- String oldCountRow = String.format("%06d", uc.oldCount);
- String newCountRow = String.format("%06d", uc.newCount);
-
- // add a new entry to the inverted index
- batchUpdater.insertRow(newCountRow, row, seqNum);
- // remove the old entry from the inverted index
- batchUpdater.deleteRow(oldCountRow, row, seqNum);
- }
-
- // flush all of the updates to the external query system
- batchUpdater.close();
- }
-}
-```
-
-## Schema
-
-Each export queue stores its data in the Fluo table in a contiguous row range.
-This row range is defined by using the export queue id as a row prefix for all
-data in the export queue. So the row range defined by the export queue id
-should not be used by anything else.
-
-All data stored in an export queue is [transient](transient.md). When an export
-queue is configured, it will recommend split points using the [table
-optimization process](table-optimization.md). The number of splits generated
-by this process can be controlled by setting the number of buckets per tablet
-when configuring an export queue.
-
-## Concurrency
-
-Additions to the export queue will never collide. If two transactions add the
-same key at around the same time and successfully commit, then two entries with
-different sequence numbers will always be added to the queue. The sequence
-number is based on the start timestamp of the transactions.
-
-If the key used to add items to the export queue is deterministically derived
-from something the transaction is writing to, then that will cause a collision.
-For example consider the following interleaving of two transactions adding to
-the same export queue in a manner that will collide. Note, TH1 is shorthand for
-thread 1, ek() is a function the creates the export key, and ev() is a function
-that creates the export value.
-
- 1. TH1 : key1 = ek(`row1`,`fam1:qual1`)
- 1. TH1 : val1 = ev(tx1.get(`row1`,`fam1:qual1`), tx1.get(`rowA`,`fam1:qual2`))
- 1. TH1 : exportQueueA.add(tx1, key1, val1)
- 1. TH2 : key2 = ek(`row1`,`fam1:qual1`)
- 1. TH2 : val2 = ev(tx2.get(`row1`,`fam1:qual1`), tx2.get(`rowB`,`fam1:qual2`))
- 1. TH2 : exportQueueA.add(tx2, key2, val2)
- 1. TH1 : tx1.set(`row1`,`fam1:qual1`, val1)
- 1. TH2 : tx2.set(`row1`,`fam1:qual1`, val2)
-
-In the example above only one transaction will succeed because both are setting
-`row1 fam1:qual1`. Since adding to the export queue is part of the
-transaction, only the transaction that succeeds will add something to the
-queue. If the funtion ek() in the example is deterministic, then both
-transactions would have been trying to add the same key to the export queue.
-
-With the above method, we know that transactions adding entries to the queue for
-the same key must have executed [serially][2]. Knowing that transactions which
-added the same key did not overlap in time makes reasoning about those export
-entries very simple.
-
-The example below is a slight modification of the example above. In this
-example both transactions will successfully add entries to the queue using the
-same key. Both transactions succeed because they are writing to different
-cells (`rowB fam1:qual2` and `rowA fam1:qual2`). This approach makes it more
-difficult to reason about export entries with the same key, because the
-transactions adding those entries could have overlapped in time. This is an
-example of write skew mentioned in the Percolater paper.
-
- 1. TH1 : key1 = ek(`row1`,`fam1:qual1`)
- 1. TH1 : val1 = ev(tx1.get(`row1`,`fam1:qual1`), tx1.get(`rowA`,`fam1:qual2`))
- 1. TH1 : exportQueueA.add(tx1, key1, val1)
- 1. TH2 : key2 = ek(`row1`,`fam1:qual1`)
- 1. TH2 : val2 = ev(tx2.get(`row1`,`fam1:qual1`), tx2.get(`rowB`,`fam1:qual2`))
- 1. TH2 : exportQueueA.add(tx2, key2, val2)
- 1. TH1 : tx1.set(`rowA`,`fam1:qual2`, val1)
- 1. TH2 : tx2.set(`rowB`,`fam1:qual2`, val2)
-
-[1]: ../modules/core/src/main/java/org/apache/fluo/recipes/core/export/function/Exporter.java
-[2]: https://en.wikipedia.org/wiki/Serializability
-[3]: accumulo-export-queue.md
-
diff --git a/docs/recording-tx.md b/docs/recording-tx.md
deleted file mode 100644
index 7a6fb8e..0000000
--- a/docs/recording-tx.md
+++ /dev/null
@@ -1,85 +0,0 @@
-<!--
-Licensed to the Apache Software Foundation (ASF) under one or more
-contributor license agreements. See the NOTICE file distributed with
-this work for additional information regarding copyright ownership.
-The ASF licenses this file to You under the Apache License, Version 2.0
-(the "License"); you may not use this file except in compliance with
-the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
--->
-# RecordingTransaction recipe
-
-A `RecordingTransaction` is an implementation of `Transaction` that logs all transaction operations
-(i.e GET, SET, or DELETE) to a `TxLog` object for later uses such as exporting data. The code below
-shows how a RecordingTransaction is created by wrapping a Transaction object:
-
-```java
-RecordingTransactionBase rtx = RecordingTransactionBase.wrap(tx);
-```
-
-A predicate function can be passed to wrap method to select which log entries to record. The code
-below only records log entries whose column family is `meta`:
-
-```java
-RecordingTransactionBase rtx = RecordingTransactionBase.wrap(tx,
- le -> le.getColumn().getFamily().toString().equals("meta"));
-```
-
-After creating a RecordingTransaction, users can use it as they would use a Transaction object.
-
-```java
-Bytes value = rtx.get(Bytes.of("r1"), new Column("cf1", "cq1"));
-```
-
-While SET or DELETE operations are always recorded to the log, GET operations are only recorded if a
-value was found at the requested row/column. Also, if a GET method returns an iterator, only the GET
-operations that are retrieved from the iterator are logged. GET operations are logged as they are
-necessary if you want to determine the changes made by the transaction.
-
-When you are done operating on the transaction, you can retrieve the TxLog using the following code:
-
-```java
-TxLog myTxLog = rtx.getTxLog()
-```
-
-Below is example code of how a RecordingTransaction can be used in an observer to record all operations
-performed by the transaction in a TxLog. In this example, a GET (if data exists) and SET operation
-will be logged. This TxLog can be added to an export queue and later used to export updates from
-Fluo.
-
-```java
-public class MyObserver extends AbstractObserver {
-
- private static final TYPEL = new TypeLayer(new StringEncoder());
-
- private ExportQueue<Bytes, TxLog> exportQueue;
-
- @Override
- public void process(TransactionBase tx, Bytes row, Column col) {
-
- // create recording transaction (rtx)
- RecordingTransactionBase rtx = RecordingTransactionBase.wrap(tx);
-
- // use rtx to create a typed transaction & perform operations
- TypedTransactionBase ttx = TYPEL.wrap(rtx);
- int count = ttx.get().row(row).fam("meta").qual("counter1").toInteger(0);
- ttx.mutate().row(row).fam("meta").qual("counter1").set(count+1);
-
- // when finished performing operations, retrieve transaction log
- TxLog txLog = rtx.getTxLog()
-
- // add txLog to exportQueue if not empty
- if (!txLog.isEmpty()) {
- //do not pass rtx to exportQueue.add()
- exportQueue.add(tx, row, txLog)
- }
- }
-}
-```
diff --git a/docs/row-hasher.md b/docs/row-hasher.md
deleted file mode 100644
index d6d603b..0000000
--- a/docs/row-hasher.md
+++ /dev/null
@@ -1,135 +0,0 @@
-<!--
-Licensed to the Apache Software Foundation (ASF) under one or more
-contributor license agreements. See the NOTICE file distributed with
-this work for additional information regarding copyright ownership.
-The ASF licenses this file to You under the Apache License, Version 2.0
-(the "License"); you may not use this file except in compliance with
-the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
--->
-# Row hash prefix recipe
-
-## Background
-
-Transactions are implemented in Fluo using conditional mutations. Conditional
-mutations require server side processing on tservers. If data is not spread
-evenly, it can cause some tservers to execute more conditional mutations than
-others. These tservers doing more work can become a bottleneck. Most real
-world data is not uniform and can cause this problem.
-
-Before the Fluo [Webindex example][1] started using this recipe it suffered
-from this problem. The example was using reverse dns encoded URLs for row keys
-like `p:com.cnn/story1.html`. This made certain portions of the table more
-popular, which in turn made some tservers do much more work. This uneven
-distribution of work lead to lower throughput and uneven performance. Using
-this recipe made those problems go away.
-
-## Solution
-
-This recipe provides code to help add a hash of the row as a prefix of the row.
-Using this recipe rows are structured like the following.
-
-```
-<prefix>:<fixed len row hash>:<user row>
-```
-
-The recipe also provides code to help generate split points and configure
-balancing of the prefix.
-
-## Example Use
-
-```java
-import org.apache.fluo.api.config.FluoConfiguration;
-import org.apache.fluo.api.data.Bytes;
-import org.apache.fluo.recipes.core.data.RowHasher;
-
-public class RowHasherExample {
-
-
- private static final RowHasher PAGE_ROW_HASHER = new RowHasher("p");
-
- // Provide one place to obtain row hasher.
- public static RowHasher getPageRowHasher() {
- return PAGE_ROW_HASHER;
- }
-
- public static void main(String[] args) {
- RowHasher pageRowHasher = getPageRowHasher();
-
- String revUrl = "org.wikipedia/accumulo";
-
- // Add a hash prefix to the row. Use this hashedRow in your transaction
- Bytes hashedRow = pageRowHasher.addHash(revUrl);
- System.out.println("hashedRow : " + hashedRow);
-
- // Remove the prefix. This can be used by transactions dealing with the hashed row.
- Bytes orig = pageRowHasher.removeHash(hashedRow);
- System.out.println("orig : " + orig);
-
-
- // Generate table optimizations for the recipe. This can be called when setting up an
- // application that uses a hashed row.
- int numTablets = 20;
-
- // The following code would normally be called before initializing Fluo. This code
- // registers table optimizations for your prefix+hash.
- FluoConfiguration conf = new FluoConfiguration();
- RowHasher.configure(conf, PAGE_ROW_HASHER.getPrefix(), numTablets);
-
- // Normally you would not call the following code, it would be called automatically for you by
- // TableOperations.optimizeTable(). Calling this code here to show what table optimization will
- // be generated.
- TableOptimizations tableOptimizations = new RowHasher.Optimizer()
- .getTableOptimizations(PAGE_ROW_HASHER.getPrefix(), conf.getAppConfiguration());
- System.out.println("Balance config : " + tableOptimizations.getTabletGroupingRegex());
- System.out.println("Splits : ");
- tableOptimizations.getSplits().forEach(System.out::println);
- System.out.println();
- }
-}
-```
-
-The example program above prints the following.
-
-```
-hashedRow : p:1yl0:org.wikipedia/accumulo
-orig : org.wikipedia/accumulo
-Balance config : (\Qp:\E).*
-Splits :
-p:1sst
-p:3llm
-p:5eef
-p:7778
-p:9001
-p:assu
-p:clln
-p:eeeg
-p:g779
-p:i002
-p:jssv
-p:lllo
-p:neeh
-p:p77a
-p:r003
-p:sssw
-p:ullp
-p:weei
-p:y77b
-p:~
-```
-
-The split points are used to create tablets in the Accumulo table used by Fluo.
-Data and computation will spread very evenly across these tablets. The
-Balancing config will spread the tablets evenly across the tablet servers,
-which will spread the computation evenly. See the [table optimizations][2]
-documentation for information on how to apply the optimizations.
-
-[1]: https://github.com/fluo-io/webindex
-[2]: /docs/table-optimization.md
diff --git a/docs/serialization.md b/docs/serialization.md
deleted file mode 100644
index 25b9d83..0000000
--- a/docs/serialization.md
+++ /dev/null
@@ -1,89 +0,0 @@
-<!--
-Licensed to the Apache Software Foundation (ASF) under one or more
-contributor license agreements. See the NOTICE file distributed with
-this work for additional information regarding copyright ownership.
-The ASF licenses this file to You under the Apache License, Version 2.0
-(the "License"); you may not use this file except in compliance with
-the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
--->
-# Serializing Data
-
-Various Fluo Recipes deal with POJOs and need to serialize them. The
-serialization mechanism is configurable and defaults to using [Kryo][1].
-
-## Custom Serialization
-
-In order to use a custom serialization method, two steps need to be taken. The
-first step is to implement [SimpleSerializer][2]. The second step is to
-configure Fluo Recipes to use the custom implementation. This needs to be done
-before initializing Fluo. Below is an example of how to do this.
-
-```java
- FluoConfiguration fluoConfig = ...;
- //assume MySerializer implements SimpleSerializer
- SimpleSerializer.setSetserlializer(fluoConfig, MySerializer.class);
- //initialize Fluo using fluoConfig
-```
-
-## Kryo Factory
-
-If using the default Kryo serializer implementation, then creating a
-KryoFactory implementation can lead to smaller serialization size. When Kryo
-serializes an object graph, it will by default include the fully qualified
-names of the classes in the serialized data. This can be avoided by
-[registering classes][3] that will be serialized. Registration is done by
-creating a KryoFactory and then configuring Fluo Recipes to use it. The
-example below shows how to do this.
-
-For example assume the POJOs named `Node` and `Edge` will be serialized and
-need to be registered with Kryo. This could be done by creating a KryoFactory
-like the following.
-
-```java
-
-package com.foo;
-
-import com.esotericsoftware.kryo.Kryo;
-import com.esotericsoftware.kryo.pool.KryoFactory;
-
-import com.foo.data.Edge;
-import com.foo.data.Node;
-
-public class MyKryoFactory implements KryoFactory {
- @Override
- public Kryo create() {
- Kryo kryo = new Kryo();
-
- //Explicitly assign each class a unique id here to ensure its stable over
- //time and in different environments with different dependencies.
- kryo.register(Node.class, 9);
- kryo.register(Edge.class, 10);
-
- //instruct kryo that these are the only classes we expect to be serialized
- kryo.setRegistrationRequired(true);
-
- return kryo;
- }
-}
-```
-
-Fluo Recipes must be configured to use this factory. The following code shows
-how to do this.
-
-```java
- FluoConfiguration fluoConfig = ...;
- KryoSimplerSerializer.setKryoFactory(fluoConfig, MyKryoFactory.class);
- //initialize Fluo using fluoConfig
-```
-
-[1]: https://github.com/EsotericSoftware/kryo
-[2]: ../modules/core/src/main/java/org/apache/fluo/recipes/core/serialization/SimpleSerializer.java
-[3]: https://github.com/EsotericSoftware/kryo#registration
diff --git a/docs/spark.md b/docs/spark.md
deleted file mode 100644
index 197c189..0000000
--- a/docs/spark.md
+++ /dev/null
@@ -1,32 +0,0 @@
-<!--
-Licensed to the Apache Software Foundation (ASF) under one or more
-contributor license agreements. See the NOTICE file distributed with
-this work for additional information regarding copyright ownership.
-The ASF licenses this file to You under the Apache License, Version 2.0
-(the "License"); you may not use this file except in compliance with
-the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
--->
-# Apache Spark helper code
-
-Fluo Recipes has some helper code for [Apache Spark][spark]. Most of the helper code is currently
-related to bulk importing data into Accumulo. This is useful for initializing a new Fluo table with
-historical data via Spark. The Spark helper code is found at
-[modules/spark/src/main/java/org/apache/fluo/recipes/spark/][sdir].
-
-For information on using Spark to load data into Fluo, check out this [blog post][blog].
-
-If you know of other Spark+Fluo integration code that would be useful, then please consider [opening
-an issue](https://github.com/apache/fluo-recipes/issues/new).
-
-[spark]: https://spark.apache.org
-[sdir]: ../modules/spark/src/main/java/org/apache/fluo/recipes/spark/
-[blog]: https://fluo.apache.org/blog/2016/12/22/spark-load/
-
diff --git a/docs/table-optimization.md b/docs/table-optimization.md
deleted file mode 100644
index 7241473..0000000
--- a/docs/table-optimization.md
+++ /dev/null
@@ -1,78 +0,0 @@
-<!--
-Licensed to the Apache Software Foundation (ASF) under one or more
-contributor license agreements. See the NOTICE file distributed with
-this work for additional information regarding copyright ownership.
-The ASF licenses this file to You under the Apache License, Version 2.0
-(the "License"); you may not use this file except in compliance with
-the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
--->
-# Fluo Table Optimization
-
-## Background
-
-Recipes may need to make Accumulo specific table modifications for optimal
-performance. Configuring the [Accumulo tablet balancer][3] and adding splits are
-two optimizations that are currently done. Offering a standard way to do these
-optimizations makes it easier to use recipes correctly. These optimizations
-are optional. You could skip them for integration testing, but would probably
-want to use them in production.
-
-## Java Example
-
-```java
-FluoConfiguration fluoConf = ...
-
-//export queue configure method will return table optimizations it would like made
-ExportQueue.configure(fluoConf, ...);
-
-//CollisionFreeMap.configure() will return table optimizations it would like made
-CollisionFreeMap.configure(fluoConf, ...);
-
-//configure optimizations for a prefixed hash range of a table
-RowHasher.configure(fluoConf, ...);
-
-//initialize Fluo
-FluoFactory.newAdmin(fluoConf).initialize(...)
-
-//Automatically optimize the Fluo table for all configured recipes
-TableOperations.optimizeTable(fluoConf);
-```
-
-[TableOperations][2] is provided in the Accumulo module of Fluo Recipes.
-
-## Command Example
-
-Fluo Recipes provides an easy way to optimize a Fluo table for configured
-recipes from the command line. This should be done after configuring reciped
-and initializing Fluo. Below are example command for initializing in this way.
-
-```bash
-
-#create application
-fluo new app1
-
-#configure application
-
-#initialize Fluo
-fluo init app1
-
-#optimize table for all configured recipes
-fluo exec app1 org.apache.fluo.recipes.accumulo.cmds.OptimizeTable
-```
-
-## Table optimization registry
-
-Recipes register themself by calling [TableOptimizations.registerOptimization()][1]. Anyone can use
-this mechanism, its not limited to use by exisitng recipes.
-
-[1]: ../modules/core/src/main/java/org/apache/fluo/recipes/core/common/TableOptimizations.java
-[2]: ../modules/accumulo/src/main/java/org/apache/fluo/recipes/accumulo/ops/TableOperations.java
-[3]: http://accumulo.apache.org/blog/2015/03/20/balancing-groups-of-tablets.html
diff --git a/docs/testing.md b/docs/testing.md
deleted file mode 100644
index 0908857..0000000
--- a/docs/testing.md
+++ /dev/null
@@ -1,27 +0,0 @@
-<!--
-Licensed to the Apache Software Foundation (ASF) under one or more
-contributor license agreements. See the NOTICE file distributed with
-this work for additional information regarding copyright ownership.
-The ASF licenses this file to You under the Apache License, Version 2.0
-(the "License"); you may not use this file except in compliance with
-the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
--->
-# Testing
-
-Fluo includes MiniFluo which makes it possible to write an integeration test that
-runs against a real Fluo instance. Fluo Recipes provides the following utility
-code for writing an integration test.
-
- * [FluoITHelper][1] A class with utility methods for comparing expected data with whats in Fluo.
- * [AccumuloExportITBase][2] A base class for writing an integration test that exports data from Fluo to an external Accumulo table.
-
-[1]: ../modules/test/src/main/java/org/apache/fluo/recipes/test/FluoITHelper.java
-[2]: ../modules/test/src/main/java/org/apache/fluo/recipes/test/AccumuloExportITBase.java
diff --git a/docs/transient.md b/docs/transient.md
deleted file mode 100644
index a73a197..0000000
--- a/docs/transient.md
+++ /dev/null
@@ -1,96 +0,0 @@
-<!--
-Licensed to the Apache Software Foundation (ASF) under one or more
-contributor license agreements. See the NOTICE file distributed with
-this work for additional information regarding copyright ownership.
-The ASF licenses this file to You under the Apache License, Version 2.0
-(the "License"); you may not use this file except in compliance with
-the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
--->
-# Transient data
-
-## Background
-
-Some recipes store transient data in a portion of the Fluo table. Transient
-data is data thats continually being added and deleted. Also these transient
-data ranges contain no long term data. The way Fluo works, when data is
-deleted a delete marker is inserted but the data is actually still there. Over
-time these transient ranges of the table will have a lot more delete markers
-than actual data if nothing is done. If nothing is done, then processing
-transient data will get increasingly slower over time.
-
-These deleted markers can be cleaned up by forcing Accumulo to compact the
-Fluo table, which will run Fluos garbage collection iterator. However,
-compacting the entire table to clean up these ranges within a table is
-overkill. Alternatively, Accumulo supports compacting ranges of a table. So
-a good solution to the delete marker problem is to periodically compact just
-the transient ranges.
-
-Fluo Recipes provides helper code to deal with transient data ranges in a
-standard way.
-
-## Registering Transient Ranges
-
-Recipes like [Export Queue](export-queue.md) will automatically register
-transient ranges when configured. If you would like to register your own
-transient ranges, use [TransientRegistry][1]. Below is a simple example of
-using this.
-
-```java
-FluoConfiguration fluoConfig = ...;
-TransientRegistry transientRegistry = new TransientRegistry(fluoConfig.getAppConfiguration());
-transientRegistry.addTransientRange(new RowRange(startRow, endRow));
-
-//Initialize Fluo using fluoConfig. This will store the registered ranges in
-//zookeeper making them availiable on any node later.
-```
-
-## Compacting Transient Ranges
-
-Although you may never need to register transient ranges directly, you will
-need to periodically compact transient ranges if using a recipe that registers
-them. Using [TableOperations][2] this can be done with one line of Java code
-like the following.
-
-```java
-FluoConfiguration fluoConfig = ...;
-TableOperations.compactTransient(fluoConfig);
-```
-
-Fluo recipes provides an easy way to compact transient ranges from the command line using the `fluo exec` command as follows:
-
-```
-fluo exec <app name> org.apache.fluo.recipes.accumulo.cmds.CompactTransient [<interval> [<multiplier>]]
-```
-
-If no arguments are specified the command will call `compactTransient()` once.
-If `<interval>` is specified the command will run forever compacting transient
-ranges sleeping `<interval>` seconds between compacting each transient ranges.
-
-In the case where Fluo is backed up in processing data a transient range could
-have a lot of data queued and compacting it too frequently would be
-counterproductive. To avoid this the `CompactTransient` command will consider
-the time it took to compact a range when deciding when to compact that range
-next. This is where the `<multiplier>` argument comes in, the time to sleep
-between compactions of a range is determined as follows. If not specified, the
-multiplier defaults to 3.
-
-```java
- sleepTime = Math.max(compactTime * multiplier, interval);
-```
-
-For example assume a Fluo application has two transient ranges. Also assume
-CompactTransient is run with an interval of 600 and a multiplier of 10. If the
-first range takes 20 seconds to compact, then it will be compacted again in 600
-seconds. If the second range takes 80 seconds to compact, then it will be
-compacted again in 800 seconds.
-
-[1]: ../modules/core/src/main/java/org/apache/fluo/recipes/core/common/TransientRegistry.java
-[2]: ../modules/accumulo/src/main/java/org/apache/fluo/recipes/accumulo/ops/TableOperations.java
--
To stop receiving notification emails like this one, please contact
['"commits@fluo.apache.org" <co...@fluo.apache.org>'].