Posted to commits@accumulo.apache.org by mw...@apache.org on 2016/12/06 19:14:41 UTC

[02/11] accumulo git commit: ACCUMULO-4532 Improve documentation of examples

http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/shard.md
----------------------------------------------------------------------
diff --git a/docs/src/main/resources/examples/shard.md b/docs/src/main/resources/examples/shard.md
new file mode 100644
index 0000000..5e5789b
--- /dev/null
+++ b/docs/src/main/resources/examples/shard.md
@@ -0,0 +1,68 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+---
+title: Apache Accumulo Shard Example
+---
+
+Accumulo has an iterator called the intersecting iterator which supports querying a term index that is partitioned by
+document, or "sharded". This example shows how to use the intersecting iterator through these four programs:
+
+ * Index.java - Indexes a set of text files into an Accumulo table.
+ * Query.java - Finds documents containing a given set of terms.
+ * Reverse.java - Reads the index table and writes a map of documents to terms into another table.
+ * ContinuousQuery.java - Uses the table populated by Reverse.java to select N random terms per document, then continuously queries randomly chosen sets of those terms.
+
+To run these example programs, create two tables as shown below.
+
+    username@instance> createtable shard
+    username@instance shard> createtable doc2term
+
+After creating the tables, index some files. The following command indexes all of the Java files in the Accumulo source code.
+
+    $ cd /local/username/workspace/accumulo/
+    $ find core/src server/src -name "*.java" | xargs ./bin/accumulo org.apache.accumulo.examples.simple.shard.Index -i instance -z zookeepers -t shard -u username -p password --partitions 30
+
+The following command queries the index to find all files containing 'foo' and 'bar'.
+
+    $ ./bin/accumulo org.apache.accumulo.examples.simple.shard.Query -i instance -z zookeepers -t shard -u username -p password foo bar
+    /local/username/workspace/accumulo/src/core/src/test/java/accumulo/core/security/ColumnVisibilityTest.java
+    /local/username/workspace/accumulo/src/core/src/test/java/accumulo/core/client/mock/MockConnectorTest.java
+    /local/username/workspace/accumulo/src/core/src/test/java/accumulo/core/security/VisibilityEvaluatorTest.java
+    /local/username/workspace/accumulo/src/server/src/main/java/accumulo/test/functional/RowDeleteTest.java
+    /local/username/workspace/accumulo/src/server/src/test/java/accumulo/server/logger/TestLogWriter.java
+    /local/username/workspace/accumulo/src/server/src/main/java/accumulo/test/functional/DeleteEverythingTest.java
+    /local/username/workspace/accumulo/src/core/src/test/java/accumulo/core/data/KeyExtentTest.java
+    /local/username/workspace/accumulo/src/server/src/test/java/accumulo/server/constraints/MetadataConstraintsTest.java
+    /local/username/workspace/accumulo/src/core/src/test/java/accumulo/core/iterators/WholeRowIteratorTest.java
+    /local/username/workspace/accumulo/src/server/src/test/java/accumulo/server/util/DefaultMapTest.java
+    /local/username/workspace/accumulo/src/server/src/test/java/accumulo/server/tabletserver/InMemoryMapTest.java
+
+Before running ContinuousQuery, run Reverse to populate the doc2term table.
+
+    $ ./bin/accumulo org.apache.accumulo.examples.simple.shard.Reverse -i instance -z zookeepers --shardTable shard --doc2Term doc2term -u username -p password
+
+Below, ContinuousQuery is run with 5 terms. It selects 5 random terms from each document, then repeatedly picks one
+of those 5-term sets at random and queries for it. It prints the terms, the number of matching documents, and the time in seconds.
+
+    $ ./bin/accumulo org.apache.accumulo.examples.simple.shard.ContinuousQuery -i instance -z zookeepers --shardTable shard --doc2Term doc2term -u username -p password --terms 5
+    [public, core, class, binarycomparable, b] 2  0.081
+    [wordtodelete, unindexdocument, doctablename, putdelete, insert] 1  0.041
+    [import, columnvisibilityinterpreterfactory, illegalstateexception, cv, columnvisibility] 1  0.049
+    [getpackage, testversion, util, version, 55] 1  0.048
+    [for, static, println, public, the] 55  0.211
+    [sleeptime, wrappingiterator, options, long, utilwaitthread] 1  0.057
+    [string, public, long, 0, wait] 12  0.132
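
Note on shard.md: as a rough illustration of the document-partitioned ("sharded") term index described above, the Python sketch below assigns each document to one of a fixed number of partitions, keeps a term-to-documents map per partition, and answers a query by intersecting the term postings inside each partition. It does not use Accumulo or the intersecting iterator. The names `partition_for`, `build_index`, and `query`, the CRC32-based partition choice, and the sample documents are assumptions made only for this illustration.

```python
import re
import zlib
from collections import defaultdict


def partition_for(doc_id, num_partitions):
    # Assumption: pick the partition from a stable hash of the document id.
    return zlib.crc32(doc_id.encode("utf-8")) % num_partitions


def build_index(docs, num_partitions):
    """Return {partition: {term: set(doc_id)}} for a dict of doc_id -> text."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, text in docs.items():
        part = partition_for(doc_id, num_partitions)
        # Every term of a document is stored in that document's partition.
        for term in set(re.findall(r"[a-z0-9]+", text.lower())):
            index[part][term].add(doc_id)
    return index


def query(index, terms):
    """Return the ids of documents that contain every query term."""
    matches = set()
    for term_map in index.values():
        postings = [term_map.get(t.lower(), set()) for t in terms]
        if postings:
            # Intersect within one partition; partitions are independent.
            matches |= set.intersection(*postings)
    return matches


# Hypothetical sample documents.
docs = {
    "A.java": "public class A { int foo; int bar; }",
    "B.java": "public class B { int foo; }",
    "C.java": "class C { String bar; String foo; }",
}
index = build_index(docs, num_partitions=3)
print(sorted(query(index, ["foo", "bar"])))
```

Because all terms of a document live in the same partition, each partition can be intersected on its own and the partial results simply unioned, which is the property a sharded index relies on to spread query work.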

http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/tabletofile.md
----------------------------------------------------------------------
diff --git a/docs/src/main/resources/examples/tabletofile.md b/docs/src/main/resources/examples/tabletofile.md
new file mode 100644
index 0000000..5316b51
--- /dev/null
+++ b/docs/src/main/resources/examples/tabletofile.md
@@ -0,0 +1,61 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+---
+title: Apache Accumulo Table-to-File Example
+---
+
+This example uses MapReduce to extract specified columns from an existing table.
+
+To run this example you will need some data in a table. The following will
+put a trivial amount of data into Accumulo using the Accumulo shell:
+
+    $ ./bin/accumulo shell -u username -p password
+    Shell - Apache Accumulo Interactive Shell
+    - version: 1.5.0
+    - instance name: instance
+    - instance id: 00000000-0000-0000-0000-000000000000
+    -
+    - type 'help' for a list of available commands
+    -
+    username@instance> createtable input
+    username@instance input> insert dogrow cf cq dogvalue
+    username@instance input> insert catrow cf cq catvalue
+    username@instance input> insert junk family qualifier junkvalue
+    username@instance input> quit
+
+The TableToFile class configures a map-only job to read the specified columns and
+write the key/value pairs to a file in HDFS.
+
+The following will extract the rows containing the column "cf:cq":
+
+    $ ./contrib/tool.sh lib/accumulo-examples-simple.jar org.apache.accumulo.examples.simple.mapreduce.TableToFile -u user -p passwd -i instance -t input --columns cf:cq --output /tmp/output
+
+    $ hadoop fs -ls /tmp/output
+    -rw-r--r--   1 username supergroup          0 2013-01-10 14:44 /tmp/output/_SUCCESS
+    drwxr-xr-x   - username supergroup          0 2013-01-10 14:44 /tmp/output/_logs
+    drwxr-xr-x   - username supergroup          0 2013-01-10 14:44 /tmp/output/_logs/history
+    -rw-r--r--   1 username supergroup       9049 2013-01-10 14:44 /tmp/output/_logs/history/job_201301081658_0011_1357847072863_username_TableToFile%5F1357847071434
+    -rw-r--r--   1 username supergroup      26172 2013-01-10 14:44 /tmp/output/_logs/history/job_201301081658_0011_conf.xml
+    -rw-r--r--   1 username supergroup         50 2013-01-10 14:44 /tmp/output/part-m-00000
+
+We can see the output of our little map-reduce job:
+
+    $ hadoop fs -text /tmp/output/part-m-00000
+    catrow cf:cq []	catvalue
+    dogrow cf:cq []	dogvalue
+    $
+
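Note on tabletofile.md: the column extraction can be pictured with the small Python sketch below. It is not the TableToFile job and does not use MapReduce or HDFS; it only filters in-memory entries by `family:qualifier` and formats each kept entry like the lines in the listing above. The function name `extract_columns` and the tuple layout are assumptions for illustration, and the sample rows are chosen to match the output shown for part-m-00000.

```python
def extract_columns(entries, columns):
    """Keep entries whose family:qualifier is requested; return one text line each."""
    wanted = set(columns)
    lines = []
    # Sorting stands in for the table's sorted key order.
    for row, family, qualifier, visibility, value in sorted(entries):
        if f"{family}:{qualifier}" in wanted:
            lines.append(f"{row} {family}:{qualifier} [{visibility}]\t{value}")
    return lines


# Hypothetical in-memory stand-in for the "input" table.
entries = [
    ("dogrow", "cf", "cq", "", "dogvalue"),
    ("catrow", "cf", "cq", "", "catvalue"),
    ("junk", "family", "qualifier", "", "junkvalue"),
]
for line in extract_columns(entries, ["cf:cq"]):
    print(line)
```

The `junk` entry is dropped because its column is `family:qualifier`, not `cf:cq`.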

http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/terasort.md
----------------------------------------------------------------------
diff --git a/docs/src/main/resources/examples/terasort.md b/docs/src/main/resources/examples/terasort.md
new file mode 100644
index 0000000..195bb4a
--- /dev/null
+++ b/docs/src/main/resources/examples/terasort.md
@@ -0,0 +1,52 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+---
+title: Apache Accumulo Terasort Example
+---
+
+This example uses MapReduce to generate random input data that is
+sorted by storing it in Accumulo. The data is very similar to that of the
+Hadoop terasort benchmark.
+
+To run this example, pass arguments describing the amount of data:
+
+    $ ./contrib/tool.sh lib/accumulo-examples-simple.jar org.apache.accumulo.examples.simple.mapreduce.TeraSortIngest \
+    -i instance -z zookeepers -u user -p password \
+    --count 10 \
+    --minKeySize 10 \
+    --maxKeySize 10 \
+    --minValueSize 78 \
+    --maxValueSize 78 \
+    --table sort \
+    --splits 10
+
+After the MapReduce job completes, scan the data:
+
+    $ ./bin/accumulo shell -u username -p password
+    username@instance> scan -t sort
+    +l-$$OE/ZH c:         4 []    GGGGGGGGGGWWWWWWWWWWMMMMMMMMMMCCCCCCCCCCSSSSSSSSSSIIIIIIIIIIYYYYYYYYYYOOOOOOOO
+    ,C)wDw//u= c:        10 []    CCCCCCCCCCSSSSSSSSSSIIIIIIIIIIYYYYYYYYYYOOOOOOOOOOEEEEEEEEEEUUUUUUUUUUKKKKKKKK
+    75@~?'WdUF c:         1 []    IIIIIIIIIIYYYYYYYYYYOOOOOOOOOOEEEEEEEEEEUUUUUUUUUUKKKKKKKKKKAAAAAAAAAAQQQQQQQQ
+    ;L+!2rT~hd c:         8 []    MMMMMMMMMMCCCCCCCCCCSSSSSSSSSSIIIIIIIIIIYYYYYYYYYYOOOOOOOOOOEEEEEEEEEEUUUUUUUU
+    LsS8)|.ZLD c:         5 []    OOOOOOOOOOEEEEEEEEEEUUUUUUUUUUKKKKKKKKKKAAAAAAAAAAQQQQQQQQQQGGGGGGGGGGWWWWWWWW
+    M^*dDE;6^< c:         9 []    UUUUUUUUUUKKKKKKKKKKAAAAAAAAAAQQQQQQQQQQGGGGGGGGGGWWWWWWWWWWMMMMMMMMMMCCCCCCCC
+    ^Eu)<n#kdP c:         3 []    YYYYYYYYYYOOOOOOOOOOEEEEEEEEEEUUUUUUUUUUKKKKKKKKKKAAAAAAAAAAQQQQQQQQQQGGGGGGGG
+    le5awB.$sm c:         6 []    WWWWWWWWWWMMMMMMMMMMCCCCCCCCCCSSSSSSSSSSIIIIIIIIIIYYYYYYYYYYOOOOOOOOOOEEEEEEEE
+    q__[fwhKFg c:         7 []    EEEEEEEEEEUUUUUUUUUUKKKKKKKKKKAAAAAAAAAAQQQQQQQQQQGGGGGGGGGGWWWWWWWWWWMMMMMMMM
+    w[o||:N&H, c:         2 []    QQQQQQQQQQGGGGGGGGGGWWWWWWWWWWMMMMMMMMMMCCCCCCCCCCSSSSSSSSSSIIIIIIIIIIYYYYYYYY
+
+Of course, a real benchmark would ingest millions of entries.
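
Note on terasort.md: the idea of sorting data by storing it in a sorted table can be sketched in a few lines of Python. This is not TeraSortIngest and does not reproduce its key or value patterns; the generator, the character sets, the fixed seed, and the list kept in order with `bisect.insort` are all assumptions standing in for the real job and for an Accumulo table. The size bounds mirror the `--minKeySize`, `--maxKeySize`, `--minValueSize`, and `--maxValueSize` arguments.

```python
import bisect
import random
import string


def generate_entries(count, min_key, max_key, min_value, max_value, seed=0):
    """Yield (key, value) pairs of random text within the given size bounds."""
    rng = random.Random(seed)
    key_chars = string.ascii_letters + string.digits + string.punctuation
    for _ in range(count):
        key_len = rng.randint(min_key, max_key)
        value_len = rng.randint(min_value, max_value)
        key = "".join(rng.choice(key_chars) for _ in range(key_len))
        value = "".join(rng.choice(string.ascii_uppercase) for _ in range(value_len))
        yield key, value


# Stand-in for a sorted table: each insert keeps the list in key order.
table = []
for entry in generate_entries(10, 10, 10, 78, 78):
    bisect.insort(table, entry)

# Reading the "table" back returns the entries sorted by key.
for key, value in table:
    print(key, value[:20])
```

For ASCII keys, Python's string ordering agrees with byte order, which is also the order visible in the scan output above (`+`, `,`, `7`, `;`, `L`, and so on).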

http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/visibility.md
----------------------------------------------------------------------
diff --git a/docs/src/main/resources/examples/visibility.md b/docs/src/main/resources/examples/visibility.md
new file mode 100644
index 0000000..8345a9b
--- /dev/null
+++ b/docs/src/main/resources/examples/visibility.md
@@ -0,0 +1,133 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+---
+title: Apache Accumulo Visibility, Authorizations, and Permissions Example
+---
+
+## Creating a new user
+
+    root@instance> createuser username
+    Enter new password for 'username': ********
+    Please confirm new password for 'username': ********
+    root@instance> user username
+    Enter password for user username: ********
+    username@instance> createtable vistest
+    06 10:48:47,931 [shell.Shell] ERROR: org.apache.accumulo.core.client.AccumuloSecurityException: Error PERMISSION_DENIED - User does not have permission to perform this action
+    username@instance> userpermissions
+    System permissions:
+
+    Table permissions (accumulo.metadata): Table.READ
+    username@instance>
+
+By default, a user does not have permission to create a table.
+
+## Granting permissions to a user
+
+    username@instance> user root
+    Enter password for user root: ********
+    root@instance> grant -s System.CREATE_TABLE -u username
+    root@instance> user username
+    Enter password for user username: ********
+    username@instance> createtable vistest
+    username@instance vistest> userpermissions
+    System permissions: System.CREATE_TABLE
+
+    Table permissions (accumulo.metadata): Table.READ
+    Table permissions (vistest): Table.READ, Table.WRITE, Table.BULK_IMPORT, Table.ALTER_TABLE, Table.GRANT, Table.DROP_TABLE
+    username@instance vistest>
+
+## Inserting data with visibilities
+
+Visibilities are boolean AND (&) and OR (|) combinations of authorization
+tokens. Authorization tokens are arbitrary strings taken from a restricted
+ASCII character set. Parentheses are required to specify order of operations
+in visibilities.
+
+    username@instance vistest> insert row f1 q1 v1 -l A
+    username@instance vistest> insert row f2 q2 v2 -l A&B
+    username@instance vistest> insert row f3 q3 v3 -l apple&carrot|broccoli|spinach
+    06 11:19:01,432 [shell.Shell] ERROR: org.apache.accumulo.core.util.BadArgumentException: cannot mix | and & near index 12
+    apple&carrot|broccoli|spinach
+                ^
+    username@instance vistest> insert row f3 q3 v3 -l (apple&carrot)|broccoli|spinach
+    username@instance vistest>
+
+## Scanning with authorizations
+
+Authorizations are sets of authorization tokens. Each Accumulo user has
+authorizations and each Accumulo scan has authorizations. Scan authorizations
+must be a subset of the user's authorizations. By default, a
+user's authorization set is empty.
+
+    username@instance vistest> scan
+    username@instance vistest> scan -s A
+    06 11:43:14,951 [shell.Shell] ERROR: java.lang.RuntimeException: org.apache.accumulo.core.client.AccumuloSecurityException: Error BAD_AUTHORIZATIONS - The user does not have the specified authorizations assigned
+    username@instance vistest>
+
+## Setting authorizations for a user
+
+    username@instance vistest> setauths -s A
+    06 11:53:42,056 [shell.Shell] ERROR: org.apache.accumulo.core.client.AccumuloSecurityException: Error PERMISSION_DENIED - User does not have permission to perform this action
+    username@instance vistest>
+
+A user cannot set authorizations unless the user has the System.ALTER_USER permission.
+The root user has this permission.
+
+    username@instance vistest> user root
+    Enter password for user root: ********
+    root@instance vistest> setauths -s A -u username
+    root@instance vistest> user username
+    Enter password for user username: ********
+    username@instance vistest> scan -s A
+    row f1:q1 [A]    v1
+    username@instance vistest> scan
+    row f1:q1 [A]    v1
+    username@instance vistest>
+
+The default authorizations for a scan are the user's entire set of authorizations.
+
+    username@instance vistest> user root
+    Enter password for user root: ********
+    root@instance vistest> setauths -s A,B,broccoli -u username
+    root@instance vistest> user username
+    Enter password for user username: ********
+    username@instance vistest> scan
+    row f1:q1 [A]    v1
+    row f2:q2 [A&B]    v2
+    row f3:q3 [(apple&carrot)|broccoli|spinach]    v3
+    username@instance vistest> scan -s B
+    username@instance vistest>
+
+To restrict a user to inserting only data that the user is also able to read,
+set the following constraint on the table.
+
+    username@instance vistest> user root
+    Enter password for user root: ********
+    root@instance vistest> config -t vistest -s table.constraint.1=org.apache.accumulo.core.security.VisibilityConstraint
+    root@instance vistest> user username
+    Enter password for user username: ********
+    username@instance vistest> insert row f4 q4 v4 -l spinach
+        Constraint Failures:
+            ConstraintViolationSummary(constrainClass:org.apache.accumulo.core.security.VisibilityConstraint, violationCode:2, violationDescription:User does not have authorization on column visibility, numberOfViolatingMutations:1)
+    username@instance vistest> insert row f4 q4 v4 -l spinach|broccoli
+    username@instance vistest> scan
+    row f1:q1 [A]    v1
+    row f2:q2 [A&B]    v2
+    row f3:q3 [(apple&carrot)|broccoli|spinach]    v3
+    row f4:q4 [spinach|broccoli]    v4
+    username@instance vistest>
+
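Note on visibility.md: the rules walked through in the shell session can be mimicked with the Python sketch below. It is not Accumulo's implementation. It is a minimal evaluator written under these assumptions: tokens are unquoted and drawn from letters, digits, and `_ - : . /`; an empty visibility is readable by everyone; and the helper names `evaluate`, `scan`, and `can_insert` are hypothetical. It covers the three behaviours described above: `&` and `|` may not be mixed without parentheses, scan authorizations must be a subset of the user's authorizations (defaulting to all of them), and the constraint only accepts data the writer could read.

```python
import re

TOKEN = re.compile(r"[A-Za-z0-9_\-:./]+|[&|()]")


def evaluate(expr, auths):
    """Return True if the authorization set satisfies the visibility expression."""
    tokens = TOKEN.findall(expr)
    if "".join(tokens) != expr:
        raise ValueError(f"bad character in visibility: {expr!r}")
    if not tokens:
        return True  # assumption: empty visibility is visible to all
    pos = 0

    def parse_group():
        nonlocal pos
        results, ops = [parse_term()], set()
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            ops.add(tokens[pos])
            if len(ops) > 1:
                raise ValueError("cannot mix | and & without parentheses")
            pos += 1
            results.append(parse_term())
        return all(results) if ops == {"&"} else any(results)

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_group()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing )")
            pos += 1
            return value
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected {tok!r}")
        return tok in auths

    value = parse_group()
    if pos != len(tokens):
        raise ValueError("unexpected trailing input")
    return value


def scan(entries, user_auths, scan_auths=None):
    """Return visible entries; scan auths default to the user's entire set."""
    if scan_auths is None:
        scan_auths = user_auths
    if not set(scan_auths) <= set(user_auths):
        raise PermissionError("BAD_AUTHORIZATIONS")
    return [e for e in entries if evaluate(e[1], set(scan_auths))]


def can_insert(visibility, user_auths):
    # Idea behind the constraint: accept only data the writer could read.
    return evaluate(visibility, set(user_auths))


# (key, visibility, value) entries taken from the session above.
entries = [
    ("f1:q1", "A", "v1"),
    ("f2:q2", "A&B", "v2"),
    ("f3:q3", "(apple&carrot)|broccoli|spinach", "v3"),
]
user_auths = {"A", "B", "broccoli"}
print([key for key, _, _ in scan(entries, user_auths)])
print(scan(entries, user_auths, scan_auths={"B"}))
print(can_insert("spinach", user_auths), can_insert("spinach|broccoli", user_auths))
```

With the authorizations `A,B,broccoli`, the sketch agrees with the session: a plain scan returns all three entries, a scan restricted to `B` returns nothing, and an insert labeled `spinach` is rejected while `spinach|broccoli` is accepted.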