Posted to commits@accumulo.apache.org by bi...@apache.org on 2011/10/31 22:40:45 UTC
svn commit: r1195687 [2/2] - in /incubator/accumulo: branches/1.3/docs/examples/ site/trunk/content/accumulo/user_manual_1.3-incubating/ site/trunk/content/accumulo/user_manual_1.3-incubating/examples/ site/trunk/templates/

Added: incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/dirlist.mdtext
URL: http://svn.apache.org/viewvc/incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/dirlist.mdtext?rev=1195687&view=auto
==============================================================================
--- incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/dirlist.mdtext (added)
+++ incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/dirlist.mdtext Mon Oct 31 21:40:44 2011
@@ -0,0 +1,57 @@
+Title: File System Archive
+Notice:    Licensed to the Apache Software Foundation (ASF) under one
+           or more contributor license agreements.  See the NOTICE file
+           distributed with this work for additional information
+           regarding copyright ownership.  The ASF licenses this file
+           to you under the Apache License, Version 2.0 (the
+           "License"); you may not use this file except in compliance
+           with the License.  You may obtain a copy of the License at
+           .
+             http://www.apache.org/licenses/LICENSE-2.0
+           .
+           Unless required by applicable law or agreed to in writing,
+           software distributed under the License is distributed on an
+           "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+           KIND, either express or implied.  See the License for the
+           specific language governing permissions and limitations
+           under the License.
+
+This example shows how to use Accumulo to store a file system history.  It has six classes:
+
+ * Ingest.java - Recursively lists the files and directories under a given path, ingests their names and file info (not the file data!) into an Accumulo table, and indexes the file names in a separate table.
+ * QueryUtil.java - Provides utility methods for getting the info for a file, listing the contents of a directory, and performing single wild card searches on file or directory names.
+ * Viewer.java - Provides a GUI for browsing the file system information stored in Accumulo.
+ * FileCountMR.java - Runs MR over the file system information and writes out counts to an Accumulo table.
+ * FileCount.java - Accomplishes the same thing as FileCountMR, but in a different way.  Computes recursive counts and stores them back into the table.
+ * StringArraySummation.java - Aggregates counts for the FileCountMR reducer.
+ 
+To begin, ingest some data with Ingest.java.
+
+    $ ./bin/accumulo org.apache.accumulo.examples.dirlist.Ingest instance zookeepers username password direxample dirindex exampleVis /local/user1/workspace
+
+Note that running this example will create tables direxample and dirindex in Accumulo that you should delete when you have completed the example.
+If you modify a file or add new files in the directory ingested (e.g. /local/user1/workspace), you can run Ingest again to add new information into the Accumulo tables.
+
+To browse the data ingested, use Viewer.java.  Be sure to give the "username" user the authorizations to see the data.
+
+    $ ./bin/accumulo org.apache.accumulo.examples.dirlist.Viewer instance zookeepers username password direxample exampleVis /local/user1/workspace
+
+To list the contents of specific directories, use QueryUtil.java.
+
+    $ ./bin/accumulo org.apache.accumulo.examples.dirlist.QueryUtil instance zookeepers username password direxample exampleVis /local/user1
+    $ ./bin/accumulo org.apache.accumulo.examples.dirlist.QueryUtil instance zookeepers username password direxample exampleVis /local/user1/workspace
+
+To perform searches on file or directory names, also use QueryUtil.java.  Search terms must contain no more than one wild card and cannot contain "/".
+Note these queries run on the dirindex table instead of the direxample table.
+
+    $ ./bin/accumulo org.apache.accumulo.examples.dirlist.QueryUtil instance zookeepers username password dirindex exampleVis filename -search
+    $ ./bin/accumulo org.apache.accumulo.examples.dirlist.QueryUtil instance zookeepers username password dirindex exampleVis filename* -search
+    $ ./bin/accumulo org.apache.accumulo.examples.dirlist.QueryUtil instance zookeepers username password dirindex exampleVis *jar -search
+    $ ./bin/accumulo org.apache.accumulo.examples.dirlist.QueryUtil instance zookeepers username password dirindex exampleVis filename*jar -search
+
+To count the number of direct children (directories and files) and descendants (children and children's descendants, directories and files), run FileCountMR over the direxample table.
+The results can be written back to the same table.
+
+    $ ./bin/tool.sh lib/accumulo-examples-*.jar org.apache.accumulo.examples.dirlist.FileCountMR instance zookeepers username password direxample direxample exampleVis exampleVis
+
+Alternatively, you can run FileCount.java.

Propchange: incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/dirlist.mdtext
------------------------------------------------------------------------------
    svn:executable = *
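
As an aside on the dirlist example above: limiting search terms to a single wild card fits a design in which each name is indexed both forward and reversed, so that a trailing or leading wild card becomes a prefix scan over sorted keys. The Java sketch below illustrates that idea with in-memory sorted sets. It is an assumption about the approach, not the QueryUtil source, and the class WildcardIndexSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only; this is not the QueryUtil source.
class WildcardIndexSketch {
    private final TreeSet<String> forward = new TreeSet<>();  // names as given
    private final TreeSet<String> reversed = new TreeSet<>(); // names reversed

    private static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    void add(String name) {
        forward.add(name);
        reversed.add(reverse(name));
    }

    // Collect every element of a sorted set that starts with the prefix.
    private static List<String> prefixScan(TreeSet<String> index, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String s : index.tailSet(prefix, true)) {
            if (!s.startsWith(prefix)) {
                break; // sorted order: no later element can match
            }
            hits.add(s);
        }
        return hits;
    }

    // Search with at most one '*' and no '/', mirroring the stated restriction.
    List<String> search(String term) {
        int star = term.indexOf('*');
        if (term.contains("/") || star != term.lastIndexOf('*')) {
            throw new IllegalArgumentException("at most one '*' and no '/'");
        }
        List<String> results = new ArrayList<>();
        if (star < 0) { // exact name lookup
            if (forward.contains(term)) {
                results.add(term);
            }
            return results;
        }
        String prefix = term.substring(0, star);
        String suffix = term.substring(star + 1);
        if (prefix.isEmpty() && !suffix.isEmpty()) {
            // "*jar": prefix-scan the reversed index, then restore the names
            for (String r : prefixScan(reversed, reverse(suffix))) {
                results.add(reverse(r));
            }
        } else {
            // "filename*" or "filename*jar": scan by prefix, then check the suffix
            for (String name : prefixScan(forward, prefix)) {
                if (name.length() >= prefix.length() + suffix.length()
                        && name.endsWith(suffix)) {
                    results.add(name);
                }
            }
        }
        Collections.sort(results);
        return results;
    }
}
```

With the names accumulo-core.jar, accumulo.sh and README added, search("*jar") returns only accumulo-core.jar, and search("accumulo*sh") returns only accumulo.sh.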

Added: incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/filter.mdtext
URL: http://svn.apache.org/viewvc/incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/filter.mdtext?rev=1195687&view=auto
==============================================================================
--- incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/filter.mdtext (added)
+++ incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/filter.mdtext Mon Oct 31 21:40:44 2011
@@ -0,0 +1,90 @@
+Title: Filter Example
+Notice:    Licensed to the Apache Software Foundation (ASF) under one
+           or more contributor license agreements.  See the NOTICE file
+           distributed with this work for additional information
+           regarding copyright ownership.  The ASF licenses this file
+           to you under the Apache License, Version 2.0 (the
+           "License"); you may not use this file except in compliance
+           with the License.  You may obtain a copy of the License at
+           .
+             http://www.apache.org/licenses/LICENSE-2.0
+           .
+           Unless required by applicable law or agreed to in writing,
+           software distributed under the License is distributed on an
+           "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+           KIND, either express or implied.  See the License for the
+           specific language governing permissions and limitations
+           under the License.
+
+This is a simple filter example.  It uses the AgeOffFilter that is provided as 
+part of the core package org.apache.accumulo.core.iterators.filter.  Filters are used by
+the FilteringIterator to select desired key/value pairs (or weed out undesired 
+ones).  Filters implement the org.apache.accumulo.core.iterators.filter.Filter interface, which 
+contains a method accept(Key k, Value v).  This method returns true if the key/value 
+pair is to be delivered and false if it is to be ignored.
+
+    username@instance> createtable filtertest
+    username@instance filtertest> setiter -t filtertest -scan -p 10 -n myfilter -filter
+    FilteringIterator uses Filters to accept or reject key/value pairs
+    ----------> entering options: <filterPriorityNumber> <ageoff|regex|filterClass>
+    ----------> set org.apache.accumulo.core.iterators.FilteringIterator option (<name> <value>, hit enter to skip): 0 ageoff
+    ----------> set org.apache.accumulo.core.iterators.FilteringIterator option (<name> <value>, hit enter to skip): 
+    AgeOffFilter removes entries with timestamps more than <ttl> milliseconds old
+    ----------> set org.apache.accumulo.core.iterators.filter.AgeOffFilter parameter currentTime, if set, use the given value as the absolute time in milliseconds as the current time of day: 
+    ----------> set org.apache.accumulo.core.iterators.filter.AgeOffFilter parameter ttl, time to live (milliseconds): 30000
+    username@instance filtertest> 
+    
+    username@instance filtertest> scan
+    username@instance filtertest> insert foo a b c
+    insert successful
+    username@instance filtertest> scan
+    foo a:b []	c
+    
+... wait 30 seconds ...
+    
+    username@instance filtertest> scan
+    username@instance filtertest>
+
+Note the absence of the entry inserted more than 30 seconds ago.  Since the
+scope was set to "scan", this means the entry is still in Accumulo, but is
+being filtered out at query time.  To delete entries from Accumulo based on
+the ages of their timestamps, AgeOffFilters should be set up for the "minc"
+and "majc" scopes, as well.
+
+To force an ageoff in the persisted data, after setting up the ageoff iterator 
+on the "minc" and "majc" scopes you can flush and compact your table. This will
+happen automatically as a background operation on any table that is being 
+actively written to, but these are the commands to force compaction:
+
+    username@instance filtertest> flush -t filtertest
+    08 11:13:55,745 [shell.Shell] INFO : Flush of table filtertest initiated...
+    username@instance filtertest> compact -t filtertest
+    08 11:14:10,800 [shell.Shell] INFO : Compaction of table filtertest scheduled for 20110208111410EST
+    username@instance filtertest> 
+
+After the compaction runs, the newly created files will not contain any data that should be aged off, and the
+Accumulo garbage collector will remove the old files.
+
+To see the iterator settings for a table, use:
+
+    username@instance filtertest> config -t filtertest -f iterator
+    ---------+------------------------------------------+----------------------------------------------------------
+    SCOPE    | NAME                                     | VALUE
+    ---------+------------------------------------------+----------------------------------------------------------
+    table    | table.iterator.majc.vers................ | 20,org.apache.accumulo.core.iterators.VersioningIterator
+    table    | table.iterator.majc.vers.opt.maxVersions | 1
+    table    | table.iterator.minc.vers................ | 20,org.apache.accumulo.core.iterators.VersioningIterator
+    table    | table.iterator.minc.vers.opt.maxVersions | 1
+    table    | table.iterator.scan.myfilter............ | 10,org.apache.accumulo.core.iterators.FilteringIterator
+    table    | table.iterator.scan.myfilter.opt.0...... | org.apache.accumulo.core.iterators.filter.AgeOffFilter
+    table    | table.iterator.scan.myfilter.opt.0.ttl.. | 30000
+    table    | table.iterator.scan.vers................ | 20,org.apache.accumulo.core.iterators.VersioningIterator
+    table    | table.iterator.scan.vers.opt.maxVersions | 1
+    ---------+------------------------------------------+----------------------------------------------------------
+    username@instance filtertest> 
+
+If you would like to apply multiple filters, this can be done using a single
+iterator. Just continue adding entries during the 
+"set org.apache.accumulo.core.iterators.FilteringIterator option" step.
+Make sure to order the filterPriorityNumbers in the order you would like
+the filters to be applied.

Propchange: incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/filter.mdtext
------------------------------------------------------------------------------
    svn:executable = *
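
To make the accept contract described in filter.mdtext above concrete, here is a minimal Java sketch of an age-off check and of applying several filters in priority order. It deliberately avoids the Accumulo classes: Entry, EntryFilter, ageOff and acceptAll are hypothetical stand-ins. It also assumes that an entry is delivered only when every configured filter accepts it.

```java
import java.util.TreeMap;

// Hypothetical illustration only; these are not Accumulo's classes.
class AgeOffSketch {
    // Simplified stand-in for a key/value pair: only what the filters need.
    static final class Entry {
        final String row;
        final long timestamp; // milliseconds
        final String value;

        Entry(String row, long timestamp, String value) {
            this.row = row;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Stand-in for the filter contract: true delivers the entry, false drops it.
    interface EntryFilter {
        boolean accept(Entry e);
    }

    // Rejects entries whose timestamp is more than ttlMillis older than currentTime.
    static EntryFilter ageOff(long ttlMillis, long currentTime) {
        return e -> currentTime - e.timestamp <= ttlMillis;
    }

    // Applies filters in ascending priority-number order; assumes an entry
    // is delivered only if every filter accepts it.
    static boolean acceptAll(TreeMap<Integer, EntryFilter> filtersByPriority, Entry e) {
        for (EntryFilter f : filtersByPriority.values()) {
            if (!f.accept(e)) {
                return false;
            }
        }
        return true;
    }
}
```

With a ttl of 30000 and a current time of 100000, an entry stamped 80000 is accepted and one stamped 60000 is rejected, which mirrors the two scans in the shell session.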

Added: incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/helloworld.mdtext
URL: http://svn.apache.org/viewvc/incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/helloworld.mdtext?rev=1195687&view=auto
==============================================================================
--- incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/helloworld.mdtext (added)
+++ incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/helloworld.mdtext Mon Oct 31 21:40:44 2011
@@ -0,0 +1,52 @@
+Title: Hello World Example
+Notice:    Licensed to the Apache Software Foundation (ASF) under one
+           or more contributor license agreements.  See the NOTICE file
+           distributed with this work for additional information
+           regarding copyright ownership.  The ASF licenses this file
+           to you under the Apache License, Version 2.0 (the
+           "License"); you may not use this file except in compliance
+           with the License.  You may obtain a copy of the License at
+           .
+             http://www.apache.org/licenses/LICENSE-2.0
+           .
+           Unless required by applicable law or agreed to in writing,
+           software distributed under the License is distributed on an
+           "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+           KIND, either express or implied.  See the License for the
+           specific language governing permissions and limitations
+           under the License.
+
+This tutorial uses the following Java classes, which can be found in org.apache.accumulo.examples.helloworld in the accumulo-examples module: 
+
+ * InsertWithBatchWriter.java - Inserts 10K rows (50K entries) into accumulo with each row having 5 entries
+ * InsertWithOutputFormat.java - Example of inserting data in MapReduce
+ * ReadData.java - Reads all data between two rows
+
+Log into the accumulo shell:
+
+    $ ./bin/accumulo shell -u username -p password
+
+Create a table called 'hellotable':
+
+    username@instance> createtable hellotable	
+
+Launch a Java program that inserts data with a BatchWriter:
+
+    $ ./bin/accumulo org.apache.accumulo.examples.helloworld.InsertWithBatchWriter instance zookeepers hellotable username password
+
+Alternatively, the same data can be inserted using MapReduce writers:
+
+    $ ./bin/accumulo org.apache.accumulo.examples.helloworld.InsertWithOutputFormat instance zookeepers hellotable username password
+
+On the accumulo status page at the URL below (where 'master' is replaced with the name or IP of your accumulo master), you should see 50K entries:
+	
+    http://master:50095/
+	
+To view the entries, use the shell to scan the table:
+
+    username@instance> table hellotable
+    username@instance hellotable> scan
+
+You can also use a Java class to scan the table:
+
+    $ ./bin/accumulo org.apache.accumulo.examples.helloworld.ReadData instance zookeepers hellotable username password row_0 row_1001

Propchange: incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/helloworld.mdtext
------------------------------------------------------------------------------
    svn:executable = *
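
One point worth illustrating about the ReadData command in helloworld.mdtext above: rows are sorted as strings, so a range from row_0 to row_1001 does not cover 1002 rows. The Java sketch below models the table as a sorted map to show this. It assumes rows are named row_<i> without zero padding and that the end row is inclusive; the column and value names, and the class HelloRangeSketch itself, are hypothetical.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration only; a sorted map stands in for the table.
class HelloRangeSketch {
    // Builds rowCount rows named row_<i>, each holding entriesPerRow columns.
    static TreeMap<String, TreeMap<String, String>> populate(int rowCount, int entriesPerRow) {
        TreeMap<String, TreeMap<String, String>> table = new TreeMap<>();
        for (int i = 0; i < rowCount; i++) {
            TreeMap<String, String> columns = new TreeMap<>();
            for (int j = 0; j < entriesPerRow; j++) {
                columns.put("colqual_" + j, "value_" + i + "_" + j);
            }
            table.put("row_" + i, columns);
        }
        return table;
    }

    // Total number of entries (row/column pairs) in the given rows.
    static int countEntries(NavigableMap<String, TreeMap<String, String>> rows) {
        int total = 0;
        for (TreeMap<String, String> columns : rows.values()) {
            total += columns.size();
        }
        return total;
    }

    // All rows between startRow and endRow in string order, both inclusive.
    static NavigableMap<String, TreeMap<String, String>> range(
            TreeMap<String, TreeMap<String, String>> table, String startRow, String endRow) {
        return table.subMap(startRow, true, endRow, true);
    }
}
```

Under these assumptions, populate(10000, 5) holds 50000 entries, and range(table, "row_0", "row_1001") contains just six rows: row_0, row_1, row_10, row_100, row_1000 and row_1001.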

Added: incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/mapred.mdtext
URL: http://svn.apache.org/viewvc/incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/mapred.mdtext?rev=1195687&view=auto
==============================================================================
--- incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/mapred.mdtext (added)
+++ incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/mapred.mdtext Mon Oct 31 21:40:44 2011
@@ -0,0 +1,85 @@
+Title: MapReduce Example
+Notice:    Licensed to the Apache Software Foundation (ASF) under one
+           or more contributor license agreements.  See the NOTICE file
+           distributed with this work for additional information
+           regarding copyright ownership.  The ASF licenses this file
+           to you under the Apache License, Version 2.0 (the
+           "License"); you may not use this file except in compliance
+           with the License.  You may obtain a copy of the License at
+           .
+             http://www.apache.org/licenses/LICENSE-2.0
+           .
+           Unless required by applicable law or agreed to in writing,
+           software distributed under the License is distributed on an
+           "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+           KIND, either express or implied.  See the License for the
+           specific language governing permissions and limitations
+           under the License.
+
+This example uses mapreduce and accumulo to compute word counts for a set of
+documents.  This is accomplished using a map-only MapReduce job and an
+accumulo table with aggregators.
+
+To run this example you will need a directory in HDFS containing text files.
+The accumulo readme will be used to show how to run this example.
+
+    $ hadoop fs -copyFromLocal $ACCUMULO_HOME/README /user/username/wc/Accumulo.README
+    $ hadoop fs -ls /user/username/wc
+    Found 1 items
+    -rw-r--r--   2 username supergroup       9359 2009-07-15 17:54 /user/username/wc/Accumulo.README
+
+The first part of running this example is to create a table with aggregation
+for the column family count.
+
+    $ ./bin/accumulo shell -u username -p password
+    Shell - Accumulo Interactive Shell
+    - version: 1.3.x-incubating
+    - instance name: instance
+    - instance id: 00000000-0000-0000-0000-000000000000
+    - 
+    - type 'help' for a list of available commands
+    - 
+    username@instance> createtable wordCount -a count=org.apache.accumulo.core.iterators.aggregation.StringSummation 
+    username@instance wordCount> quit
+
+After creating the table, run the word count map reduce job.
+
+    [user1@instance accumulo]$ bin/tool.sh lib/accumulo-examples-*.jar org.apache.accumulo.examples.mapreduce.WordCount instance zookeepers /user/username/wc wordCount -u username -p password
+    
+    11/02/07 18:20:11 INFO input.FileInputFormat: Total input paths to process : 1
+    11/02/07 18:20:12 INFO mapred.JobClient: Running job: job_201102071740_0003
+    11/02/07 18:20:13 INFO mapred.JobClient:  map 0% reduce 0%
+    11/02/07 18:20:20 INFO mapred.JobClient:  map 100% reduce 0%
+    11/02/07 18:20:22 INFO mapred.JobClient: Job complete: job_201102071740_0003
+    11/02/07 18:20:22 INFO mapred.JobClient: Counters: 6
+    11/02/07 18:20:22 INFO mapred.JobClient:   Job Counters 
+    11/02/07 18:20:22 INFO mapred.JobClient:     Launched map tasks=1
+    11/02/07 18:20:22 INFO mapred.JobClient:     Data-local map tasks=1
+    11/02/07 18:20:22 INFO mapred.JobClient:   FileSystemCounters
+    11/02/07 18:20:22 INFO mapred.JobClient:     HDFS_BYTES_READ=10487
+    11/02/07 18:20:22 INFO mapred.JobClient:   Map-Reduce Framework
+    11/02/07 18:20:22 INFO mapred.JobClient:     Map input records=255
+    11/02/07 18:20:22 INFO mapred.JobClient:     Spilled Records=0
+    11/02/07 18:20:22 INFO mapred.JobClient:     Map output records=1452
+
+After the map reduce job completes, query the accumulo table to see word
+counts.
+
+    $ ./bin/accumulo shell -u username -p password
+    username@instance> table wordCount
+    username@instance wordCount> scan -b the
+    the count:20080906 []    75
+    their count:20080906 []    2
+    them count:20080906 []    1
+    then count:20080906 []    1
+    there count:20080906 []    1
+    these count:20080906 []    3
+    this count:20080906 []    6
+    through count:20080906 []    1
+    time count:20080906 []    3
+    time. count:20080906 []    1
+    to count:20080906 []    27
+    total count:20080906 []    1
+    tserver, count:20080906 []    1
+    tserver.compaction.major.concurrent.max count:20080906 []    1
+    ...

Propchange: incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/mapred.mdtext
------------------------------------------------------------------------------
    svn:executable = *
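
The division of labour in mapred.mdtext above (a map-only job writes one count per word occurrence, and the table's aggregator sums them) can be sketched in plain Java. This is an illustration under assumptions, not the WordCount or StringSummation source: it assumes words are split on whitespace and that counts are stored as decimal strings. WordCountSketch and its methods are hypothetical names.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration only; a sorted map stands in for the wordCount table.
class WordCountSketch {
    private final TreeMap<String, String> counts = new TreeMap<>();

    // Aggregation step: combine two string-encoded longs by summing them.
    static String sum(String a, String b) {
        return Long.toString(Long.parseLong(a) + Long.parseLong(b));
    }

    // Map step: write "1" for every whitespace-separated word; the table
    // merges each new value into the existing one with sum().
    void ingestLine(String line) {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, "1", WordCountSketch::sum);
            }
        }
    }

    // Analogous to `scan -b <row>`: entries from beginRow onward in sorted order.
    NavigableMap<String, String> scanFrom(String beginRow) {
        return counts.tailMap(beginRow, true);
    }
}
```

No reduce phase is needed in this arrangement because the summing happens where the values are stored rather than in the job.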

Added: incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/shard.mdtext
URL: http://svn.apache.org/viewvc/incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/shard.mdtext?rev=1195687&view=auto
==============================================================================
--- incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/shard.mdtext (added)
+++ incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/shard.mdtext Mon Oct 31 21:40:44 2011
@@ -0,0 +1,66 @@
+Title: Shard Example
+Notice:    Licensed to the Apache Software Foundation (ASF) under one
+           or more contributor license agreements.  See the NOTICE file
+           distributed with this work for additional information
+           regarding copyright ownership.  The ASF licenses this file
+           to you under the Apache License, Version 2.0 (the
+           "License"); you may not use this file except in compliance
+           with the License.  You may obtain a copy of the License at
+           .
+             http://www.apache.org/licenses/LICENSE-2.0
+           .
+           Unless required by applicable law or agreed to in writing,
+           software distributed under the License is distributed on an
+           "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+           KIND, either express or implied.  See the License for the
+           specific language governing permissions and limitations
+           under the License.
+
+Accumulo has an iterator called the intersecting iterator, which supports querying a term index that is partitioned by 
+document, or "sharded". This example shows how to use the intersecting iterator through these four programs:
+
+ * Index.java - Indexes a set of text files into an Accumulo table.
+ * Query.java - Finds documents containing a given set of terms.
+ * Reverse.java - Reads the index table and writes a map of documents to terms into another table.
+ * ContinuousQuery.java - Uses the table populated by Reverse.java to select N random terms per document.  Then it continuously and randomly queries those terms.
+
+To run these example programs, create two tables as shown below.
+
+    username@instance> createtable shard
+    username@instance shard> createtable doc2term
+
+After creating the tables, index some files.  The following command indexes all of the java files in the Accumulo source code.
+
+    $ cd /local/user1/workspace/accumulo/
+    $ find src -name "*.java" | xargs ./bin/accumulo org.apache.accumulo.examples.shard.Index instance zookeepers shard username password 30
+
+The following command queries the index to find all files containing 'foo' and 'bar'.
+
+    $ cd $ACCUMULO_HOME
+    $ ./bin/accumulo org.apache.accumulo.examples.shard.Query instance zookeepers shard username password foo bar
+    /local/user1/workspace/accumulo/src/core/src/test/java/accumulo/core/security/ColumnVisibilityTest.java
+    /local/user1/workspace/accumulo/src/core/src/test/java/accumulo/core/client/mock/MockConnectorTest.java
+    /local/user1/workspace/accumulo/src/core/src/test/java/accumulo/core/security/VisibilityEvaluatorTest.java
+    /local/user1/workspace/accumulo/src/server/src/main/java/accumulo/server/test/functional/RowDeleteTest.java
+    /local/user1/workspace/accumulo/src/server/src/test/java/accumulo/server/logger/TestLogWriter.java
+    /local/user1/workspace/accumulo/src/server/src/main/java/accumulo/server/test/functional/DeleteEverythingTest.java
+    /local/user1/workspace/accumulo/src/core/src/test/java/accumulo/core/data/KeyExtentTest.java
+    /local/user1/workspace/accumulo/src/server/src/test/java/accumulo/server/constraints/MetadataConstraintsTest.java
+    /local/user1/workspace/accumulo/src/core/src/test/java/accumulo/core/iterators/WholeRowIteratorTest.java
+    /local/user1/workspace/accumulo/src/server/src/test/java/accumulo/server/util/DefaultMapTest.java
+    /local/user1/workspace/accumulo/src/server/src/test/java/accumulo/server/tabletserver/InMemoryMapTest.java
+
+In order to run ContinuousQuery, we need to run Reverse.java to populate doc2term.
+
+    $ ./bin/accumulo org.apache.accumulo.examples.shard.Reverse instance zookeepers shard doc2term username password
+
+Below, ContinuousQuery is run using 5 terms.  It selects 5 random terms from each document, then repeatedly picks one of those sets of 5 terms at random and queries for it.  For each query it prints the terms, the number of matching documents, and the time in seconds.
+
+    $ ./bin/accumulo org.apache.accumulo.examples.shard.ContinuousQuery instance zookeepers shard doc2term username password 5
+    [public, core, class, binarycomparable, b] 2  0.081
+    [wordtodelete, unindexdocument, doctablename, putdelete, insert] 1  0.041
+    [import, columnvisibilityinterpreterfactory, illegalstateexception, cv, columnvisibility] 1  0.049
+    [getpackage, testversion, util, version, 55] 1  0.048
+    [for, static, println, public, the] 55  0.211
+    [sleeptime, wrappingiterator, options, long, utilwaitthread] 1  0.057
+    [string, public, long, 0, wait] 12  0.132

Propchange: incubator/accumulo/site/trunk/content/accumulo/user_manual_1.3-incubating/examples/shard.mdtext
------------------------------------------------------------------------------
    svn:executable = *
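
The idea behind the document-partitioned ("sharded") index in shard.mdtext above can be sketched in plain Java: each document is assigned to one partition, every partition keeps its own term-to-documents index, and a multi-term query intersects the document sets inside each partition before combining the results. This is a simplified model, not the intersecting iterator itself. The hash-based partitioning and all names in ShardIndexSketch are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical illustration only; this is not the intersecting iterator.
class ShardIndexSketch {
    private final int numShards;
    // shard number -> term -> sorted ids of documents in that shard with the term
    private final Map<Integer, Map<String, TreeSet<String>>> shards = new HashMap<>();

    ShardIndexSketch(int numShards) {
        this.numShards = numShards;
    }

    // Assigns the document to one shard (by hash, an assumption) and indexes its terms there.
    void index(String docId, String... terms) {
        int shard = Math.floorMod(docId.hashCode(), numShards);
        Map<String, TreeSet<String>> termIndex =
                shards.computeIfAbsent(shard, s -> new HashMap<>());
        for (String term : terms) {
            termIndex.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // Returns the documents containing every term: intersect within each
    // shard, then take the union across shards.
    TreeSet<String> query(String... terms) {
        TreeSet<String> matches = new TreeSet<>();
        if (terms.length == 0) {
            return matches;
        }
        for (Map<String, TreeSet<String>> termIndex : shards.values()) {
            TreeSet<String> candidates = null;
            for (String term : terms) {
                TreeSet<String> docs = termIndex.get(term);
                if (docs == null) { // term absent from this shard: no match here
                    candidates = null;
                    break;
                }
                if (candidates == null) {
                    candidates = new TreeSet<>(docs);
                } else {
                    candidates.retainAll(docs);
                }
            }
            if (candidates != null) {
                matches.addAll(candidates);
            }
        }
        return matches;
    }
}
```

Because all terms of a document live in the same partition, each partition can answer the intersection on its own, which is what allows the work to be spread across servers.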

Modified: incubator/accumulo/site/trunk/templates/sidenav.mdtext
URL: http://svn.apache.org/viewvc/incubator/accumulo/site/trunk/templates/sidenav.mdtext?rev=1195687&r1=1195686&r2=1195687&view=diff
==============================================================================
--- incubator/accumulo/site/trunk/templates/sidenav.mdtext (original)
+++ incubator/accumulo/site/trunk/templates/sidenav.mdtext Mon Oct 31 21:40:44 2011
@@ -17,6 +17,7 @@
 
 # Documentation
  - [Manual v1.3](/accumulo/user_manual_1.3-incubating)
+    - [Examples v1.3](/accumulo/user_manual_1.3-incubating/examples.html)
  - [Manual v1.4](/accumulo/user_manual_1.4-incubating)
 <!-- - [Getting Started](/accumulo/getting_started.html) -->
 <!-- - Javadoc -->