Posted to commits@accumulo.apache.org by bi...@apache.org on 2011/10/31 22:40:45 UTC
svn commit: r1195687 [2/2] - in /incubator/accumulo:
branches/1.3/docs/examples/
site/trunk/content/accumulo/user_manual_1.3-incubating/
site/trunk/content/accumulo/user_manual_1.3-incubating/examples/
site/trunk/templates/
Added: incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/dirlist.mdtext
URL: http://svn.apache.org/viewvc/incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/dirlist.mdtext?rev=1195687&view=auto
==============================================================================
--- incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/dirlist.mdtext (added)
+++ incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/dirlist.mdtext Mon Oct 31 21:40:44 2011
@@ -0,0 +1,57 @@
+Title: File System Archive
+Notice: Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+ .
+ http://www.apache.org/licenses/LICENSE-2.0
+ .
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+This example shows how to use Accumulo to store a file system history. It has six classes:
+
+ * Ingest.java - Recursively lists the files and directories under a given path, ingests their names and file info (not the file data!) into an Accumulo table, and indexes the file names in a separate table.
+ * QueryUtil.java - Provides utility methods for getting the info for a file, listing the contents of a directory, and performing single wild card searches on file or directory names.
+ * Viewer.java - Provides a GUI for browsing the file system information stored in Accumulo.
+ * FileCountMR.java - Runs MapReduce over the file system information and writes out counts to an Accumulo table.
+ * FileCount.java - Accomplishes the same thing as FileCountMR, but in a different way. Computes recursive counts and stores them back into the table.
+ * StringArraySummation.java - Aggregates counts for the FileCountMR reducer.
+
+To begin, ingest some data with Ingest.java.
+
+ $ ./bin/accumulo org.apache.accumulo.examples.dirlist.Ingest instance zookeepers username password direxample dirindex exampleVis /local/user1/workspace
+
+Note that running this example will create tables direxample and dirindex in Accumulo that you should delete when you have completed the example.
+If you modify a file or add new files in the directory ingested (e.g. /local/user1/workspace), you can run Ingest again to add new information into the Accumulo tables.
+
+To browse the data ingested, use Viewer.java. Be sure to give the "username" user the authorizations to see the data.
+
+ $ ./bin/accumulo org.apache.accumulo.examples.dirlist.Viewer instance zookeepers username password direxample exampleVis /local/user1/workspace
+
+To list the contents of specific directories, use QueryUtil.java.
+
+ $ ./bin/accumulo org.apache.accumulo.examples.dirlist.QueryUtil instance zookeepers username password direxample exampleVis /local/user1
+ $ ./bin/accumulo org.apache.accumulo.examples.dirlist.QueryUtil instance zookeepers username password direxample exampleVis /local/user1/workspace
+
+To perform searches on file or directory names, also use QueryUtil.java. Search terms must contain no more than one wild card and cannot contain "/".
+Note these queries run on the dirindex table instead of the direxample table.
+
+ $ ./bin/accumulo org.apache.accumulo.examples.dirlist.QueryUtil instance zookeepers username password dirindex exampleVis filename -search
+ $ ./bin/accumulo org.apache.accumulo.examples.dirlist.QueryUtil instance zookeepers username password dirindex exampleVis filename* -search
+ $ ./bin/accumulo org.apache.accumulo.examples.dirlist.QueryUtil instance zookeepers username password dirindex exampleVis *jar -search
+ $ ./bin/accumulo org.apache.accumulo.examples.dirlist.QueryUtil instance zookeepers username password dirindex exampleVis filename*jar -search
+
+To count the number of direct children (directories and files) and descendants (children and children's descendants, directories and files), run FileCountMR over the direxample table.
+The results can be written back to the same table.
+
+ $ ./bin/tool.sh lib/accumulo-examples-*.jar org.apache.accumulo.examples.dirlist.FileCountMR instance zookeepers username password direxample direxample exampleVis exampleVis
+
+Alternatively, you can run FileCount.java.
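The direct-children and descendant counts described above can be illustrated with a small standalone sketch. It uses plain Java collections rather than the Accumulo or MapReduce APIs, and the class name DirCountSketch and its count method are hypothetical. It is not how FileCountMR or FileCount is implemented, only one reading of the two definitions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: plain Java collections, not the Accumulo or MapReduce APIs.
class DirCountSketch {

    // Returns directory -> {direct children, all descendants}.
    // Paths are assumed to be absolute and '/'-separated; the root "/" itself is not counted.
    static Map<String, long[]> count(List<String> paths) {
        Map<String, long[]> counts = new TreeMap<>();
        for (String path : paths) {
            int slash = path.lastIndexOf('/');
            if (slash <= 0) {
                continue; // top-level entry: no named parent directory to credit
            }
            String parent = path.substring(0, slash);
            counts.computeIfAbsent(parent, k -> new long[2])[0]++;
            // Every ancestor, starting with the parent, gains one descendant.
            for (String dir = parent; !dir.isEmpty(); dir = dir.substring(0, dir.lastIndexOf('/'))) {
                counts.computeIfAbsent(dir, k -> new long[2])[1]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, long[]> counts = count(Arrays.asList(
                "/local/user1",
                "/local/user1/workspace",
                "/local/user1/workspace/a.txt",
                "/local/user1/workspace/src",
                "/local/user1/workspace/src/B.java"));
        for (Map.Entry<String, long[]> e : counts.entrySet()) {
            System.out.println(e.getKey() + " children=" + e.getValue()[0]
                    + " descendants=" + e.getValue()[1]);
        }
    }
}
```

For the five sample paths in main, /local/user1/workspace ends up with 2 direct children (a.txt and src) and 3 descendants (those two plus src/B.java).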
Propchange: incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/dirlist.mdtext
------------------------------------------------------------------------------
svn:executable = *
Added: incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/filter.mdtext
URL: http://svn.apache.org/viewvc/incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/filter.mdtext?rev=1195687&view=auto
==============================================================================
--- incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/filter.mdtext (added)
+++ incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/filter.mdtext Mon Oct 31 21:40:44 2011
@@ -0,0 +1,90 @@
+Title: Filter Example
+Notice: Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+ .
+ http://www.apache.org/licenses/LICENSE-2.0
+ .
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+This is a simple filter example. It uses the AgeOffFilter that is provided as
+part of the core package org.apache.accumulo.core.iterators.filter. Filters are used by
+the FilteringIterator to select desired key/value pairs (or weed out undesired
+ones). Filters implement the org.apache.accumulo.core.iterators.filter.Filter interface, which
+contains a method accept(Key k, Value v). This method returns true if the
+key/value pair is to be delivered and false if it is to be ignored.
+
+ username@instance> createtable filtertest
+ username@instance filtertest> setiter -t filtertest -scan -p 10 -n myfilter -filter
+ FilteringIterator uses Filters to accept or reject key/value pairs
+ ----------> entering options: <filterPriorityNumber> <ageoff|regex|filterClass>
+ ----------> set org.apache.accumulo.core.iterators.FilteringIterator option (<name> <value>, hit enter to skip): 0 ageoff
+ ----------> set org.apache.accumulo.core.iterators.FilteringIterator option (<name> <value>, hit enter to skip):
+ AgeOffFilter removes entries with timestamps more than <ttl> milliseconds old
+ ----------> set org.apache.accumulo.core.iterators.filter.AgeOffFilter parameter currentTime, if set, use the given value as the absolute time in milliseconds as the current time of day:
+ ----------> set org.apache.accumulo.core.iterators.filter.AgeOffFilter parameter ttl, time to live (milliseconds): 30000
+ username@instance filtertest>
+
+ username@instance filtertest> scan
+ username@instance filtertest> insert foo a b c
+ insert successful
+ username@instance filtertest> scan
+ foo a:b [] c
+
+... wait 30 seconds ...
+
+ username@instance filtertest> scan
+ username@instance filtertest>
+
+Note the absence of the entry inserted more than 30 seconds ago. Since the
+scope was set to "scan", this means the entry is still in Accumulo, but is
+being filtered out at query time. To delete entries from Accumulo based on
+the ages of their timestamps, AgeOffFilters should be set up for the "minc"
+and "majc" scopes, as well.
+
+To force an ageoff in the persisted data, after setting up the ageoff iterator
+on the "minc" and "majc" scopes you can flush and compact your table. This will
+happen automatically as a background operation on any table that is being
+actively written to, but these are the commands to force compaction:
+
+ username@instance filtertest> flush -t filtertest
+ 08 11:13:55,745 [shell.Shell] INFO : Flush of table filtertest initiated...
+ username@instance filtertest> compact -t filtertest
+ 08 11:14:10,800 [shell.Shell] INFO : Compaction of table filtertest scheduled for 20110208111410EST
+ username@instance filtertest>
+
+After the compaction runs, the newly created files will not contain any data that should be aged off, and the
+Accumulo garbage collector will remove the old files.
+
+To see the iterator settings for a table, use:
+
+ username@instance filtertest> config -t filtertest -f iterator
+ ---------+------------------------------------------+----------------------------------------------------------
+ SCOPE | NAME | VALUE
+ ---------+------------------------------------------+----------------------------------------------------------
+ table | table.iterator.majc.vers................ | 20,org.apache.accumulo.core.iterators.VersioningIterator
+ table | table.iterator.majc.vers.opt.maxVersions | 1
+ table | table.iterator.minc.vers................ | 20,org.apache.accumulo.core.iterators.VersioningIterator
+ table | table.iterator.minc.vers.opt.maxVersions | 1
+ table | table.iterator.scan.myfilter............ | 10,org.apache.accumulo.core.iterators.FilteringIterator
+ table | table.iterator.scan.myfilter.opt.0...... | org.apache.accumulo.core.iterators.filter.AgeOffFilter
+ table | table.iterator.scan.myfilter.opt.0.ttl.. | 30000
+ table | table.iterator.scan.vers................ | 20,org.apache.accumulo.core.iterators.VersioningIterator
+ table | table.iterator.scan.vers.opt.maxVersions | 1
+ ---------+------------------------------------------+----------------------------------------------------------
+ username@instance filtertest>
+
+If you would like to apply multiple filters, this can be done using a single
+iterator. Just continue adding entries during the
+"set org.apache.accumulo.core.iterators.FilteringIterator option" step.
+Make sure to order the filterPriorityNumbers in the order you would like
+the filters to be applied.
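The accept contract and the priority ordering described above can be sketched without any Accumulo dependency. Everything in this block is a hypothetical stand-in: EntryFilter reduces accept(Key k, Value v) to a timestamp and a string value, ageOff mimics the documented AgeOffFilter behaviour (entries more than ttl milliseconds old are rejected), and valueContains is an invented second filter. The sketch also assumes that an entry is delivered only when every configured filter accepts it.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: these types are hypothetical stand-ins, not Accumulo classes.
class FilterChainSketch {

    // Reduced form of the accept(Key, Value) contract: true delivers the entry, false drops it.
    interface EntryFilter {
        boolean accept(long timestampMillis, String value);
    }

    // Rejects entries whose timestamp is more than ttlMillis older than nowMillis.
    static EntryFilter ageOff(long ttlMillis, long nowMillis) {
        return (ts, value) -> nowMillis - ts <= ttlMillis;
    }

    // Invented second filter, present only to show chaining.
    static EntryFilter valueContains(String needle) {
        return (ts, value) -> value.contains(needle);
    }

    // Applies filters in ascending priority-number order (assumption: all must accept).
    static boolean accept(Map<Integer, EntryFilter> byPriority, long ts, String value) {
        for (EntryFilter filter : new TreeMap<>(byPriority).values()) {
            if (!filter.accept(ts, value)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long now = 100_000L;
        Map<Integer, EntryFilter> filters = new TreeMap<>();
        filters.put(0, ageOff(30_000L, now));
        filters.put(1, valueContains("c"));
        System.out.println(accept(filters, 80_000L, "c")); // 20 s old, within the 30 s ttl
        System.out.println(accept(filters, 60_000L, "c")); // 40 s old, aged off
    }
}
```

Because the filter only decides what is delivered, the underlying entry is untouched, which matches the note above that a scan-scope filter leaves the data in place.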
Propchange: incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/filter.mdtext
------------------------------------------------------------------------------
svn:executable = *
Added: incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/helloworld.mdtext
URL: http://svn.apache.org/viewvc/incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/helloworld.mdtext?rev=1195687&view=auto
==============================================================================
--- incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/helloworld.mdtext (added)
+++ incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/helloworld.mdtext Mon Oct 31 21:40:44 2011
@@ -0,0 +1,52 @@
+Title: Hello World Example
+Notice: Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+ .
+ http://www.apache.org/licenses/LICENSE-2.0
+ .
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+This tutorial uses the following Java classes, which can be found in org.apache.accumulo.examples.helloworld in the accumulo-examples module:
+
+ * InsertWithBatchWriter.java - Inserts 10K rows (50K entries) into Accumulo, with each row having 5 entries
+ * InsertWithOutputFormat.java - Example of inserting data in MapReduce
+ * ReadData.java - Reads all data between two rows
+
+Log in to the Accumulo shell:
+
+ $ ./bin/accumulo shell -u username -p password
+
+Create a table called 'hellotable':
+
+ username@instance> createtable hellotable
+
+Launch a Java program that inserts data with a BatchWriter:
+
+ $ ./bin/accumulo org.apache.accumulo.examples.helloworld.InsertWithBatchWriter instance zookeepers hellotable username password
+
+Alternatively, the same data can be inserted using MapReduce writers:
+
+ $ ./bin/accumulo org.apache.accumulo.examples.helloworld.InsertWithOutputFormat instance zookeepers hellotable username password
+
+On the Accumulo status page at the URL below (where 'master' is replaced with the name or IP address of your Accumulo master), you should see 50K entries:
+
+ http://master:50095/
+
+To view the entries, use the shell to scan the table:
+
+ username@instance> table hellotable
+ username@instance hellotable> scan
+
+You can also use a Java class to scan the table:
+
+ $ ./bin/accumulo org.apache.accumulo.examples.helloworld.ReadData instance zookeepers hellotable username password row_0 row_1001
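Accumulo sorts row IDs lexicographically, so a range such as row_0 to row_1001 is not a numeric range. The sketch below illustrates this with a sorted map standing in for the table. It assumes row IDs of the form row_<n> (the text above does not state the format) and treats both bounds as inclusive, which is also an assumption. The class name RowRangeSketch is hypothetical.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative only: a TreeMap stands in for a table sorted by row ID.
class RowRangeSketch {

    // Builds rows row_0 .. row_(totalRows-1) and returns those between the two bounds, inclusive.
    static NavigableMap<String, Integer> rowsBetween(int totalRows, String startRow, String endRow) {
        TreeMap<String, Integer> table = new TreeMap<>();
        for (int i = 0; i < totalRows; i++) {
            table.put("row_" + i, i);
        }
        return table.subMap(startRow, true, endRow, true);
    }

    public static void main(String[] args) {
        // Prints [row_0, row_1, row_10, row_100, row_1000, row_1001]
        System.out.println(rowsBetween(10_000, "row_0", "row_1001").keySet());
    }
}
```

Under these assumptions only six of the 10K rows fall inside the range, because strings such as row_2 or row_101 sort after row_1001.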
Propchange: incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/helloworld.mdtext
------------------------------------------------------------------------------
svn:executable = *
Added: incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/mapred.mdtext
URL: http://svn.apache.org/viewvc/incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/mapred.mdtext?rev=1195687&view=auto
==============================================================================
--- incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/mapred.mdtext (added)
+++ incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/mapred.mdtext Mon Oct 31 21:40:44 2011
@@ -0,0 +1,85 @@
+Title: MapReduce Example
+Notice: Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+ .
+ http://www.apache.org/licenses/LICENSE-2.0
+ .
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+This example uses MapReduce and Accumulo to compute word counts for a set of
+documents. This is accomplished using a map-only MapReduce job and an
+Accumulo table with aggregators.
+
+To run this example you will need a directory in HDFS containing text files.
+The Accumulo README is used here to show how to run this example.
+
+ $ hadoop fs -copyFromLocal $ACCUMULO_HOME/README /user/username/wc/Accumulo.README
+ $ hadoop fs -ls /user/username/wc
+ Found 1 items
+ -rw-r--r-- 2 username supergroup 9359 2009-07-15 17:54 /user/username/wc/Accumulo.README
+
+The first part of running this example is to create a table with aggregation
+for the column family count.
+
+ $ ./bin/accumulo shell -u username -p password
+ Shell - Accumulo Interactive Shell
+ - version: 1.3.x-incubating
+ - instance name: instance
+ - instance id: 00000000-0000-0000-0000-000000000000
+ -
+ - type 'help' for a list of available commands
+ -
+ username@instance> createtable wordCount -a count=org.apache.accumulo.core.iterators.aggregation.StringSummation
+ username@instance wordCount> quit
+
+After creating the table, run the word count MapReduce job.
+
+ [user1@instance accumulo]$ bin/tool.sh lib/accumulo-examples-*.jar org.apache.accumulo.examples.mapreduce.WordCount instance zookeepers /user/username/wc wordCount -u username -p password
+
+ 11/02/07 18:20:11 INFO input.FileInputFormat: Total input paths to process : 1
+ 11/02/07 18:20:12 INFO mapred.JobClient: Running job: job_201102071740_0003
+ 11/02/07 18:20:13 INFO mapred.JobClient: map 0% reduce 0%
+ 11/02/07 18:20:20 INFO mapred.JobClient: map 100% reduce 0%
+ 11/02/07 18:20:22 INFO mapred.JobClient: Job complete: job_201102071740_0003
+ 11/02/07 18:20:22 INFO mapred.JobClient: Counters: 6
+ 11/02/07 18:20:22 INFO mapred.JobClient: Job Counters
+ 11/02/07 18:20:22 INFO mapred.JobClient: Launched map tasks=1
+ 11/02/07 18:20:22 INFO mapred.JobClient: Data-local map tasks=1
+ 11/02/07 18:20:22 INFO mapred.JobClient: FileSystemCounters
+ 11/02/07 18:20:22 INFO mapred.JobClient: HDFS_BYTES_READ=10487
+ 11/02/07 18:20:22 INFO mapred.JobClient: Map-Reduce Framework
+ 11/02/07 18:20:22 INFO mapred.JobClient: Map input records=255
+ 11/02/07 18:20:22 INFO mapred.JobClient: Spilled Records=0
+ 11/02/07 18:20:22 INFO mapred.JobClient: Map output records=1452
+
+After the MapReduce job completes, query the Accumulo table to see the word
+counts.
+
+ $ ./bin/accumulo shell -u username -p password
+ username@instance> table wordCount
+ username@instance wordCount> scan -b the
+ the count:20080906 [] 75
+ their count:20080906 [] 2
+ them count:20080906 [] 1
+ then count:20080906 [] 1
+ there count:20080906 [] 1
+ these count:20080906 [] 3
+ this count:20080906 [] 6
+ through count:20080906 [] 1
+ time count:20080906 [] 3
+ time. count:20080906 [] 1
+ to count:20080906 [] 27
+ total count:20080906 [] 1
+ tserver, count:20080906 [] 1
+ tserver.compaction.major.concurrent.max count:20080906 [] 1
+ ...
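The division of labour described above (a map-only job emits one value per word, and the table's aggregator sums them) can be sketched in plain Java. Nothing here uses the Accumulo or Hadoop APIs: the sorted map stands in for the wordCount table, sum stands in for a string-summing aggregator, and splitting on whitespace is an assumption. The scan output above shows tokens such as "time." with punctuation attached, which is consistent with whitespace splitting but does not confirm it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Illustrative only: a TreeMap stands in for the table, sum() for the aggregator.
class WordCountSketch {

    // Combines two decimal-string values into one, as a summing aggregator would.
    static String sum(String a, String b) {
        return Long.toString(Long.parseLong(a) + Long.parseLong(b));
    }

    // "Map" step: emit "1" for every token; the table-side merge does the summing.
    static TreeMap<String, String> count(List<String> lines) {
        TreeMap<String, String> table = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // blank line
                }
                table.merge(word, "1", WordCountSketch::sum);
            }
        }
        return table;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = count(Arrays.asList("the cat and the hat", "to the moon"));
        // Rough analogue of "scan -b the": entries from "the" onward in sorted order.
        System.out.println(table.tailMap("the", true));
    }
}
```

Since no reduce phase is needed to total the counts, the job can stay map-only; the summing happens where the values are stored.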
Propchange: incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/mapred.mdtext
------------------------------------------------------------------------------
svn:executable = *
Added: incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/shard.mdtext
URL: http://svn.apache.org/viewvc/incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/shard.mdtext?rev=1195687&view=auto
==============================================================================
--- incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/shard.mdtext (added)
+++ incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/shard.mdtext Mon Oct 31 21:40:44 2011
@@ -0,0 +1,66 @@
+Title: Shard Example
+Notice: Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+ .
+ http://www.apache.org/licenses/LICENSE-2.0
+ .
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+Accumulo has an iterator called the intersecting iterator, which supports querying a term index that is partitioned by
+document, or "sharded". This example shows how to use the intersecting iterator through these four programs:
+
+ * Index.java - Indexes a set of text files into an Accumulo table.
+ * Query.java - Finds documents containing a given set of terms.
+ * Reverse.java - Reads the index table and writes a map of documents to terms into another table.
+ * ContinuousQuery.java - Uses the table populated by Reverse.java to select N random terms per document. Then it continuously and randomly queries those terms.
+
+To run these example programs, create two tables like below.
+
+ username@instance> createtable shard
+ username@instance shard> createtable doc2term
+
+After creating the tables, index some files. The following command indexes all of the Java files in the Accumulo source code.
+
+ $ cd /local/user1/workspace/accumulo/
+ $ find src -name "*.java" | xargs ./bin/accumulo org.apache.accumulo.examples.shard.Index instance zookeepers shard username password 30
+
+The following command queries the index to find all files containing 'foo' and 'bar'.
+
+ $ cd $ACCUMULO_HOME
+ $ ./bin/accumulo org.apache.accumulo.examples.shard.Query instance zookeepers shard username password foo bar
+ /local/user1/workspace/accumulo/src/core/src/test/java/accumulo/core/security/ColumnVisibilityTest.java
+ /local/user1/workspace/accumulo/src/core/src/test/java/accumulo/core/client/mock/MockConnectorTest.java
+ /local/user1/workspace/accumulo/src/core/src/test/java/accumulo/core/security/VisibilityEvaluatorTest.java
+ /local/user1/workspace/accumulo/src/server/src/main/java/accumulo/server/test/functional/RowDeleteTest.java
+ /local/user1/workspace/accumulo/src/server/src/test/java/accumulo/server/logger/TestLogWriter.java
+ /local/user1/workspace/accumulo/src/server/src/main/java/accumulo/server/test/functional/DeleteEverythingTest.java
+ /local/user1/workspace/accumulo/src/core/src/test/java/accumulo/core/data/KeyExtentTest.java
+ /local/user1/workspace/accumulo/src/server/src/test/java/accumulo/server/constraints/MetadataConstraintsTest.java
+ /local/user1/workspace/accumulo/src/core/src/test/java/accumulo/core/iterators/WholeRowIteratorTest.java
+ /local/user1/workspace/accumulo/src/server/src/test/java/accumulo/server/util/DefaultMapTest.java
+ /local/user1/workspace/accumulo/src/server/src/test/java/accumulo/server/tabletserver/InMemoryMapTest.java
+
+In order to run ContinuousQuery, first run Reverse.java to populate doc2term.
+
+ $ ./bin/accumulo org.apache.accumulo.examples.shard.Reverse instance zookeepers shard doc2term username password
+
+Below, ContinuousQuery is run using 5 terms. It selects 5 random terms from each document, then repeatedly picks one of those sets of 5 terms at random and queries for it. For each query it prints the terms, the number of matching documents, and the time in seconds.
+
+ $ ./bin/accumulo org.apache.accumulo.examples.shard.ContinuousQuery instance zookeepers shard doc2term username password 5
+ [public, core, class, binarycomparable, b] 2 0.081
+ [wordtodelete, unindexdocument, doctablename, putdelete, insert] 1 0.041
+ [import, columnvisibilityinterpreterfactory, illegalstateexception, cv, columnvisibility] 1 0.049
+ [getpackage, testversion, util, version, 55] 1 0.048
+ [for, static, println, public, the] 55 0.211
+ [sleeptime, wrappingiterator, options, long, utilwaitthread] 1 0.057
+ [string, public, long, 0, wait] 12 0.132
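The idea behind querying a document-partitioned ("sharded") term index can be sketched with in-memory maps. This is not the intersecting iterator or its API: ShardSketch, index, and query are hypothetical names, assigning a document to a partition by hashing its name is an assumption, and so is lower-casing and splitting on non-word characters. The point it illustrates is that all terms of one document live in the same partition, so each partition can intersect its term lists independently and the results are simply combined.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: in-memory maps stand in for the sharded index table.
class ShardSketch {
    private final int numPartitions;
    // One map per partition: term -> documents in that partition containing the term.
    private final List<Map<String, Set<String>>> partitions = new ArrayList<>();

    ShardSketch(int numPartitions) {
        this.numPartitions = numPartitions;
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new HashMap<>());
        }
    }

    // Puts every term of a document into the single partition chosen for that document.
    void index(String docId, String text) {
        Map<String, Set<String>> shard = partitions.get(Math.floorMod(docId.hashCode(), numPartitions));
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                shard.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Intersects the term lists inside each partition, then unions the per-partition hits.
    Set<String> query(String... terms) {
        Set<String> hits = new TreeSet<>();
        if (terms.length == 0) {
            return hits;
        }
        for (Map<String, Set<String>> shard : partitions) {
            Set<String> candidates = null;
            for (String term : terms) {
                Set<String> docs = shard.getOrDefault(term, Collections.emptySet());
                if (candidates == null) {
                    candidates = new TreeSet<>(docs);
                } else {
                    candidates.retainAll(docs);
                }
                if (candidates.isEmpty()) {
                    break; // no document in this partition has all terms so far
                }
            }
            hits.addAll(candidates);
        }
        return hits;
    }

    public static void main(String[] args) {
        ShardSketch index = new ShardSketch(30);
        index.index("A.java", "foo bar baz");
        index.index("B.java", "foo only");
        index.index("C.java", "Bar and Foo");
        System.out.println(index.query("foo", "bar"));
    }
}
```

Because no partition needs data from another, the per-partition intersections could run in parallel, which is what makes this layout attractive for multi-term queries.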
Propchange: incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/shard.mdtext
------------------------------------------------------------------------------
svn:executable = *
Modified: incubator/accumulo/site/trunk/templates/sidenav.mdtext
URL: http://svn.apache.org/viewvc/incubator/accumulo/site/trunk/templates/sidenav.mdtext?rev=1195687&r1=1195686&r2=1195687&view=diff
==============================================================================
--- incubator/accumulo/site/trunk/templates/sidenav.mdtext (original)
+++ incubator/accumulo/site/trunk/templates/sidenav.mdtext Mon Oct 31 21:40:44 2011
@@ -17,6 +17,7 @@
# Documentation
- [Manual v1.3](/accumulo/user_manual_1.3-incubating)
+ - [Examples v1.3](/accumulo/user_manual_1.3-incubating/examples.html)
- [Manual v1.4](/accumulo/user_manual_1.4-incubating)
<!-- - [Getting Started](/accumulo/getting_started.html) -->
<!-- - Javadoc -->