Posted to commits@accumulo.apache.org by ec...@apache.org on 2014/06/09 16:54:08 UTC

[3/4] git commit: ACCUMULO-2874 update the docs to match the actual API

ACCUMULO-2874 update the docs to match the actual API


Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/63b9685c
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/63b9685c
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/63b9685c

Branch: refs/heads/master
Commit: 63b9685ca7097c7e3fc006f1ae4a5b03c719599b
Parents: 0e097ac 0038df1
Author: Eric C. Newton <er...@gmail.com>
Authored: Mon Jun 9 10:52:06 2014 -0400
Committer: Eric C. Newton <er...@gmail.com>
Committed: Mon Jun 9 10:52:06 2014 -0400

----------------------------------------------------------------------
 docs/src/main/asciidoc/chapters/clients.txt     |  6 ++--
 .../system/continuous/continuous-env.sh.example | 30 ++++++++++++++------
 test/system/continuous/start-agitator.sh        | 12 ++++----
 3 files changed, 30 insertions(+), 18 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/accumulo/blob/63b9685c/docs/src/main/asciidoc/chapters/clients.txt
----------------------------------------------------------------------
diff --cc docs/src/main/asciidoc/chapters/clients.txt
index 3e0b845,0000000..45f1fe9
mode 100644,000000..100644
--- a/docs/src/main/asciidoc/chapters/clients.txt
+++ b/docs/src/main/asciidoc/chapters/clients.txt
@@@ -1,320 -1,0 +1,320 @@@
 +// Licensed to the Apache Software Foundation (ASF) under one or more
 +// contributor license agreements.  See the NOTICE file distributed with
 +// this work for additional information regarding copyright ownership.
 +// The ASF licenses this file to You under the Apache License, Version 2.0
 +// (the "License"); you may not use this file except in compliance with
 +// the License.  You may obtain a copy of the License at
 +//
 +//     http://www.apache.org/licenses/LICENSE-2.0
 +//
 +// Unless required by applicable law or agreed to in writing, software
 +// distributed under the License is distributed on an "AS IS" BASIS,
 +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 +// See the License for the specific language governing permissions and
 +// limitations under the License.
 +
 +== Writing Accumulo Clients
 +
 +=== Running Client Code
 +
 +There are multiple ways to run Java code that uses Accumulo:
 +
 +* using the java executable
 +* using the accumulo script
 +* using the tool script
 +
 +To run client code against Accumulo, you need the Accumulo jars and the jars
 +that Accumulo depends on in your classpath. Accumulo client code depends on
 +Hadoop and ZooKeeper. For Hadoop, add the Hadoop client jar, all of the jars in
 +the Hadoop lib directory, and the Hadoop conf directory to the classpath. For
 +ZooKeeper 3.3, you only need to add the ZooKeeper jar, not the contents of the
 +ZooKeeper lib directory. To see the classpath a configured Accumulo system is
 +using, run the following command:
 +
 +  $ACCUMULO_HOME/bin/accumulo classpath
 +
 +Another option for running your code is to put a jar file in
 ++$ACCUMULO_HOME/lib/ext+. After doing this, you can use the accumulo
 +script to execute your code. For example, if you create a jar containing the
 +class +com.foo.Client+ and place it in +lib/ext+, then you can use the command
 ++$ACCUMULO_HOME/bin/accumulo com.foo.Client+ to execute your code.
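 +
 +As an illustration only, a minimal +com.foo.Client+ might look like the sketch
 +below. The instance name, ZooKeeper hosts, user, and password are placeholders,
 +and printing the table list is just a simple way to show that the connection
 +works.
 +
 +[source,java]
 +----
 +package com.foo;
 +
 +import org.apache.accumulo.core.client.Connector;
 +import org.apache.accumulo.core.client.Instance;
 +import org.apache.accumulo.core.client.ZooKeeperInstance;
 +import org.apache.accumulo.core.client.security.tokens.PasswordToken;
 +
 +public class Client {
 +  public static void main(String[] args) throws Exception {
 +    // placeholder connection settings; replace with your instance's values
 +    Instance inst = new ZooKeeperInstance("myinstance", "zooserver-one,zooserver-two");
 +    Connector conn = inst.getConnector("user", new PasswordToken("passwd"));
 +
 +    // print the name of every table visible to this user
 +    for (String table : conn.tableOperations().list()) {
 +      System.out.println(table);
 +    }
 +  }
 +}
 +----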
 +
 +If you are writing MapReduce jobs that access Accumulo, you can use the
 ++bin/tool.sh+ script to run those jobs. See the MapReduce example.
 +
 +=== Connecting
 +
 +All clients must first identify the Accumulo instance with which they will
 +communicate. Code to do this is as follows:
 +
 +[source,java]
 +----
 +String instanceName = "myinstance";
 +String zooServers = "zooserver-one,zooserver-two";
 +Instance inst = new ZooKeeperInstance(instanceName, zooServers);
 +
 +Connector conn = inst.getConnector("user", new PasswordToken("passwd"));
 +----
 +
 +=== Writing Data
 +
 +Data are written to Accumulo by creating Mutation objects that represent all the
 +changes to the columns of a single row. The changes are made atomically in the
 +TabletServer. Clients then add Mutations to a BatchWriter which submits them to
 +the appropriate TabletServers.
 +
 +Mutations can be created thus:
 +
 +[source,java]
 +----
 +Text rowID = new Text("row1");
 +Text colFam = new Text("myColFam");
 +Text colQual = new Text("myColQual");
 +ColumnVisibility colVis = new ColumnVisibility("public");
 +long timestamp = System.currentTimeMillis();
 +
 +Value value = new Value("myValue".getBytes());
 +
 +Mutation mutation = new Mutation(rowID);
 +mutation.put(colFam, colQual, colVis, timestamp, value);
 +----
 +
 +==== BatchWriter
 +The BatchWriter is highly optimized to send Mutations to multiple TabletServers
 +and automatically batches Mutations destined for the same TabletServer to
 +amortize network overhead. Because the BatchWriter keeps the objects passed to
 +it in memory while batching, do not modify a Mutation or any other object after
 +adding it to the BatchWriter.
 +
 +Mutations are added to a BatchWriter thus:
 +
 +[source,java]
 +----
 +// BatchWriterConfig has reasonable defaults
 +BatchWriterConfig config = new BatchWriterConfig();
 +config.setMaxMemory(10000000L); // bytes available to batchwriter for buffering mutations
 +
 +BatchWriter writer = conn.createBatchWriter("table", config);
 +
 +writer.add(mutation);
 +
 +writer.close();
 +----
 +
 +An example of using the batch writer can be found at
 ++accumulo/docs/examples/README.batch+.
 +
 +==== ConditionalWriter
 +The ConditionalWriter enables efficient, atomic read-modify-write operations on
 +rows.  The ConditionalWriter writes special Mutations which have a list of
 +per-column conditions that must all be met before the mutation is applied.  The
 +conditions are checked in the tablet server while a row lock is
 +held.footnote:[Mutations written by the BatchWriter will not obtain a row lock.]
 +The conditions that can be checked for a column are equality and absence.  For
 +example, a conditional mutation can require that column A is absent in order to
 +be applied.  Iterators can be applied when checking conditions.  Using
 +iterators, many other operations besides equality and absence can be checked.
 +For example, using an iterator that converts values less than 5 to 0 and
 +everything else to 1, it is possible to apply a mutation only when a column is
 +less than 5, by requiring the converted value to equal 0.
 +
 +If a tablet server dies after a client sends a conditional mutation, it is not
 +known whether the mutation was applied.  When this happens, the
 +ConditionalWriter reports a status of UNKNOWN for the ConditionalMutation.
 +In many cases this situation can be dealt with by simply reading the row again
 +and possibly sending another conditional mutation.  If this is not sufficient,
 +then a higher level of abstraction can be built by storing transactional
 +information within a row.
 +
 +An example of using the conditional writer can be found at
 ++accumulo/docs/examples/README.reservations+.
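 +
 +The absence check and the UNKNOWN handling described above can be sketched as
 +follows. This is a minimal illustration, assuming the ConditionalWriter API
 +introduced in Accumulo 1.6; the +meta:owner+ column used as a claim marker and
 +the class and method names are hypothetical. The mutation is applied only if
 ++meta:owner+ is absent, and on UNKNOWN the row is simply read again.
 +
 +[source,java]
 +----
 +import java.util.Map;
 +
 +import org.apache.accumulo.core.client.ConditionalWriter;
 +import org.apache.accumulo.core.client.ConditionalWriterConfig;
 +import org.apache.accumulo.core.client.Connector;
 +import org.apache.accumulo.core.client.Scanner;
 +import org.apache.accumulo.core.data.Condition;
 +import org.apache.accumulo.core.data.ConditionalMutation;
 +import org.apache.accumulo.core.data.Key;
 +import org.apache.accumulo.core.data.Range;
 +import org.apache.accumulo.core.data.Value;
 +import org.apache.accumulo.core.security.Authorizations;
 +import org.apache.hadoop.io.Text;
 +
 +public class ClaimRow {
 +  // Writes meta:owner=owner to the row only if meta:owner is currently absent.
 +  // Returns true if the row ends up owned by the given owner.
 +  static boolean claimRow(Connector conn, String table, String row, String owner) throws Exception {
 +    ConditionalWriter cw = conn.createConditionalWriter(table, new ConditionalWriterConfig());
 +    try {
 +      // a Condition with no value set passes only if the column is absent
 +      ConditionalMutation cm = new ConditionalMutation(row, new Condition("meta", "owner"));
 +      cm.put("meta", "owner", owner);
 +
 +      ConditionalWriter.Status status = cw.write(cm).getStatus();
 +      switch (status) {
 +        case ACCEPTED:
 +          return true;   // condition held, mutation applied
 +        case REJECTED:
 +          return false;  // meta:owner was already present, nothing written
 +        case UNKNOWN:
 +          // the tablet server may have died mid-write; re-read the row
 +          return ownerIs(conn, table, row, owner);
 +        default:
 +          // VIOLATED (a constraint failed) or INVISIBLE_VISIBILITY
 +          throw new IllegalStateException("conditional mutation not applied: " + status);
 +      }
 +    } finally {
 +      cw.close();
 +    }
 +  }
 +
 +  // Returns true if meta:owner in the row currently equals owner.
 +  static boolean ownerIs(Connector conn, String table, String row, String owner) throws Exception {
 +    Scanner scanner = conn.createScanner(table, new Authorizations());
 +    scanner.setRange(new Range(row));
 +    scanner.fetchColumn(new Text("meta"), new Text("owner"));
 +    for (Map.Entry<Key,Value> entry : scanner) {
 +      if (entry.getValue().toString().equals(owner)) {
 +        return true;
 +      }
 +    }
 +    return false;
 +  }
 +}
 +----
 +
 +If two clients could claim a row with the same owner value, the re-read cannot
 +tell their writes apart; using a value unique to each attempt (for example a
 +UUID) avoids that ambiguity.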
 +
 +=== Reading Data
 +
 +Accumulo is optimized to quickly retrieve the value associated with a given key, and
 +to efficiently return ranges of consecutive keys and their associated values.
 +
 +==== Scanner
 +
 +To retrieve data, clients use a Scanner, which acts like an Iterator over
 +keys and values. Scanners can be configured to start and stop at particular keys, and
 +to return a subset of the columns available.
 +
 +[source,java]
 +----
 +// specify which visibilities we are allowed to see
 +Authorizations auths = new Authorizations("public");
 +
 +Scanner scan =
 +    conn.createScanner("table", auths);
 +
 +scan.setRange(new Range("harry","john"));
- scan.fetchFamily("attributes");
++scan.fetchColumnFamily(new Text("attributes"));
 +
 +for(Entry<Key,Value> entry : scan) {
-     String row = entry.getKey().getRow();
++    Text row = entry.getKey().getRow();
 +    Value value = entry.getValue();
 +}
 +----
 +
 +==== Isolated Scanner
 +
 +Accumulo supports the ability to present an isolated view of rows when
 +scanning. There are three possible ways that a row could change in Accumulo:
 +
 +* a mutation applied to a table
 +* iterators executed as part of a minor or major compaction
 +* bulk import of new files
 +
 +Isolation guarantees that either all or none of the changes made by these
 +operations on a row are seen. Use the IsolatedScanner to obtain an isolated
 +view of an Accumulo table. When using the regular scanner it is possible to see
 +a non-isolated view of a row. For example, if a mutation modifies three
 +columns, it is possible that you will only see two of those modifications.
 +With the isolated scanner either all three of the changes are seen or none.
 +
 +The IsolatedScanner buffers rows on the client side, so a large row will not
 +crash a tablet server. By default, rows are buffered in memory, but the user
 +can easily supply their own buffer if they wish to buffer to disk when rows are
 +large.
 +
 +For an example, look at the following class:
 +
 +  examples/simple/src/main/java/org/apache/accumulo/examples/simple/isolation/InterferenceTest.java
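 +
 +A minimal sketch of wrapping a regular Scanner in an IsolatedScanner follows,
 +assuming a table and authorizations like those in the Scanner example above;
 +the class and method names are hypothetical. It uses the default in-memory row
 +buffer described above.
 +
 +[source,java]
 +----
 +import java.util.Map;
 +
 +import org.apache.accumulo.core.client.Connector;
 +import org.apache.accumulo.core.client.IsolatedScanner;
 +import org.apache.accumulo.core.client.Scanner;
 +import org.apache.accumulo.core.data.Key;
 +import org.apache.accumulo.core.data.Range;
 +import org.apache.accumulo.core.data.Value;
 +import org.apache.accumulo.core.security.Authorizations;
 +
 +public class IsolatedRowReader {
 +  // Prints every entry of one row and returns the number of entries seen.
 +  // Wrapping the scanner in an IsolatedScanner means a concurrent mutation to
 +  // this row is observed either completely or not at all.
 +  static int printRow(Connector conn, String table, String row, Authorizations auths)
 +      throws Exception {
 +    Scanner scanner = new IsolatedScanner(conn.createScanner(table, auths));
 +    scanner.setRange(new Range(row));
 +
 +    int count = 0;
 +    for (Map.Entry<Key,Value> entry : scanner) {
 +      System.out.println(entry.getKey() + " -> " + entry.getValue());
 +      count++;
 +    }
 +    return count;
 +  }
 +}
 +----
 +
 +To buffer large rows somewhere other than memory, IsolatedScanner also has a
 +constructor that takes an +IsolatedScanner.RowBufferFactory+ in addition to
 +the wrapped Scanner.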
 +
 +==== BatchScanner
 +
 +For some types of access, it is more efficient to retrieve several ranges
 +simultaneously. This arises, for example, when accessing a set of
 +non-consecutive rows whose IDs have been retrieved from a secondary index.
 +
 +The BatchScanner is configured similarly to the Scanner; it can be configured to
 +retrieve a subset of the columns available, but rather than passing a single Range,
 +BatchScanners accept a set of Ranges. It is important to note that the keys returned
 +by a BatchScanner are not in sorted order, since they are streamed from multiple
 +TabletServers in parallel.
 +
 +[source,java]
 +----
 +ArrayList<Range> ranges = new ArrayList<Range>();
 +// populate list of ranges ...
 +
 +BatchScanner bscan =
 +    conn.createBatchScanner("table", auths, 10);
 +bscan.setRanges(ranges);
- bscan.fetchFamily("attributes");
 ++bscan.fetchColumnFamily(new Text("attributes"));
 +
 +for(Entry<Key,Value> entry : bscan) {
 +    System.out.println(entry.getValue());
 +}
 +
 +// release the BatchScanner's query threads
 +bscan.close();
 +----
 +
 +An example of the BatchScanner can be found at
 ++accumulo/docs/examples/README.batch+.
 +
 +=== Proxy
 +
 +The proxy API allows interaction with Accumulo from languages other than Java.
 +A proxy server is provided in the codebase, and a client can be generated from its Thrift definition.
 +
 +==== Prerequisites
 +
 +The proxy server can live on any node on which the basic client API would work. That
 +means it must be able to communicate with the Master, ZooKeepers, NameNode, and the
 +DataNodes. A proxy client only needs the ability to communicate with the proxy server.
 +
 +
 +==== Configuration
 +
 +The configuration options for the proxy server live inside of a properties file. At
 +the very least, you need to supply the following properties:
 +
 +  protocolFactory=org.apache.thrift.protocol.TCompactProtocol$Factory
 +  tokenClass=org.apache.accumulo.core.client.security.tokens.PasswordToken
 +  port=42424
 +  instance=test
 +  zookeepers=localhost:2181
 +
 +You can find a sample configuration file in your distribution:
 +
 +  $ACCUMULO_HOME/proxy/proxy.properties
 +
 +This sample configuration file also shows how to back the proxy server with
 +MockAccumulo or MiniAccumuloCluster.
 +
 +==== Running the Proxy Server
 +
 +After the properties file holding the configuration is created, the proxy server
 +can be started using the following command in the Accumulo distribution (assuming
 +your properties file is named +config.properties+):
 +
 +  $ACCUMULO_HOME/bin/accumulo proxy -p config.properties
 +
 +==== Creating a Proxy Client
 +
 +In addition to the Thrift compiler, which generates the client code, you will need
 +the Thrift library for your target language installed to use that code. Typically,
 +your operating system's package manager can install these for you in an expected
 +location such as +/usr/lib/python/site-packages/thrift+.
 +
 +You can find the thrift file for generating the client:
 +
 +  $ACCUMULO_HOME/proxy/proxy.thrift
 +
 +After a client is generated, it connects to the server on the port specified in the
 +configuration properties above.
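 +
 +One way to open that connection from Java is sketched below. It assumes the
 +generated classes live in +org.apache.accumulo.proxy.thrift+ (the Java
 +namespace declared in +proxy.thrift+), the Thrift 0.9 Java library, a server
 +using +TCompactProtocol+ as in the sample properties above, and a framed
 +transport on the server side; adjust these to match your server. The class and
 +method names are hypothetical.
 +
 +[source,java]
 +----
 +import org.apache.accumulo.proxy.thrift.AccumuloProxy;
 +import org.apache.thrift.protocol.TCompactProtocol;
 +import org.apache.thrift.protocol.TProtocol;
 +import org.apache.thrift.transport.TFramedTransport;
 +import org.apache.thrift.transport.TSocket;
 +import org.apache.thrift.transport.TTransport;
 +
 +public class ProxyConnection {
 +  // Opens a connection to a proxy server listening on host:port.
 +  // Assumes TCompactProtocol (the protocolFactory in the sample properties)
 +  // over a framed transport. The caller should close the transport when done.
 +  static AccumuloProxy.Client connect(String host, int port) throws Exception {
 +    TTransport transport = new TFramedTransport(new TSocket(host, port));
 +    TProtocol protocol = new TCompactProtocol(transport);
 +    AccumuloProxy.Client client = new AccumuloProxy.Client(protocol);
 +    transport.open();
 +    return client;
 +  }
 +}
 +----
 +
 +With the sample properties, a client would be obtained with
 ++ProxyConnection.connect("localhost", 42424)+ and later released with
 ++client.getInputProtocol().getTransport().close()+.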
 +
 +==== Using a Proxy Client
 +
 +The following examples have been written in Java, and the method signatures may be
 +slightly different depending on the language specified when generating the client with
 +the Thrift compiler. After initiating a connection to the Proxy (see Apache Thrift's
 +documentation for examples of connecting to a Thrift service), the methods on the
 +proxy client will be available. The first thing to do is log in:
 +
 +[source,java]
 +----
 +Map<String,String> password = new HashMap<String,String>();
 +password.put("password", "secret");
 +ByteBuffer token = client.login("root", password);
 +----
 +
 +Once logged in, the token returned will be used for most subsequent calls to the client.
 +Let's create a table, add some data, scan the table, and delete it.
 +
 +
 +First, create a table.
 +
 +[source,java]
 +----
 +// the third argument enables the versioning iterator on the new table
 +client.createTable(token, "myTable", true, TimeType.MILLIS);
 +----
 +
 +
 +Next, add some data:
 +
 +[source,java]
 +----
 +// first, create a writer on the server
 +String writer = client.createWriter(token, "myTable", new WriterOptions());
 +
 +// build column updates, keyed by row ID
 +Map<ByteBuffer, List<ColumnUpdate>> cellsToUpdate = //...
 +
 +// send updates to the server and wait for them to be written
 +client.update(writer, cellsToUpdate);
 +client.flush(writer);
 +
 +client.closeWriter(writer);
 +----
 +
 +
 +Scan for the data, retrieving the results from the server in batches:
 +
 +[source,java]
 +----
 +String scanner = client.createScanner(token, "myTable", new ScanOptions());
 +
 +// fetch up to 100 entries per call until the server reports no more
 +ScanResult results;
 +do {
 +  results = client.nextK(scanner, 100);
 +  for(KeyValue keyValue : results.getResults()) {
 +    // do something with results
 +  }
 +} while(results.isMore());
 +
 +client.closeScanner(scanner);
 +----
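 +
 +To finish the walkthrough, delete the table. The sketch below wraps the call in
 +a hypothetical helper that first checks the table exists; +client+ and +token+
 +are the objects from the earlier steps, and the class and method names are made
 +up for the example.
 +
 +[source,java]
 +----
 +import java.nio.ByteBuffer;
 +
 +import org.apache.accumulo.proxy.thrift.AccumuloProxy;
 +
 +public class ProxyCleanup {
 +  // Deletes the example table if it exists.
 +  // token is the value returned by client.login(...)
 +  static void deleteExampleTable(AccumuloProxy.Client client, ByteBuffer token) throws Exception {
 +    if (client.tableExists(token, "myTable")) {
 +      client.deleteTable(token, "myTable");
 +    }
 +  }
 +}
 +----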