Posted to commits@accumulo.apache.org by do...@apache.org on 2022/04/04 13:16:48 UTC
[accumulo-examples] branch main updated: Fix and improve several examples (#94)
This is an automated email from the ASF dual-hosted git repository.
domgarguilo pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/accumulo-examples.git
The following commit(s) were added to refs/heads/main by this push:
new ac2ec84 Fix and improve several examples (#94)
ac2ec84 is described below
commit ac2ec84b87910fbb656751fb927995396798029c
Author: Dom G <do...@apache.org>
AuthorDate: Mon Apr 4 09:16:43 2022 -0400
Fix and improve several examples (#94)
---
docs/bloom.md | 2 +-
docs/classpath.md | 2 +-
docs/compactionStrategy.md | 10 +++++-----
docs/shard.md | 2 +-
docs/tabletofile.md | 6 ++----
docs/terasort.md | 4 ++--
docs/wordcount.md | 6 ++++--
7 files changed, 16 insertions(+), 16 deletions(-)
diff --git a/docs/bloom.md b/docs/bloom.md
index 8a38df5..da3a974 100644
--- a/docs/bloom.md
+++ b/docs/bloom.md
@@ -24,7 +24,7 @@ do not exist in a table.
Accumulo data is divided into tablets and each tablet has multiple r-files.
Lookup performance of a tablet with 3 r-files can be 3 times slower than
-a tablet with one r-file. However if the files contain unique sets of data,
+a tablet with one r-file. However, if the files contain unique sets of data,
then bloom filters can help with performance.
Run the example below to create two identical tables. One table has bloom
diff --git a/docs/classpath.md b/docs/classpath.md
index efd37bc..e12df09 100644
--- a/docs/classpath.md
+++ b/docs/classpath.md
@@ -66,7 +66,7 @@ use cx1.
root@uno examples.nofootwo> setiter -n foofilter -p 10 -scan -minc -majc -class org.apache.accumulo.test.FooFilter
2013-05-03 12:49:35,943 [shell.Shell] ERROR: org.apache.accumulo.shell.ShellCommandException: Command could
not be initialized (Unable to load org.apache.accumulo.test.FooFilter; class not found.)
- root@uno examples.nofootwo> config -t nofootwo -s table.class.loader.context=cx1
+ root@uno examples.nofootwo> config -t examples.nofootwo -s table.class.loader.context=cx1
root@uno examples.nofootwo> setiter -n foofilter -p 10 -scan -minc -majc -class org.apache.accumulo.test.FooFilter
Filter accepts or rejects each Key/Value pair
----------> set FooFilter parameter negate, default false keeps k/v that pass accept method, true rejects k/v that pass accept method: false
diff --git a/docs/compactionStrategy.md b/docs/compactionStrategy.md
index 8ae0908..b0be2fa 100644
--- a/docs/compactionStrategy.md
+++ b/docs/compactionStrategy.md
@@ -45,10 +45,10 @@ The commands below will configure the BasicCompactionStrategy to:
```bash
$ accumulo shell -u <username> -p <password> -e "config -t examples.test1 -s table.file.compress.type=snappy"
- $ accumulo shell -u <username> -p <password> -e "config -t examples.test1 -s examples.table.majc.compaction.strategy=org.apache.accumulo.tserver.compaction.strategies.BasicCompactionStrategy"
- $ accumulo shell -u <username> -p <password> -e "config -t examples.test1 -s examples.table.majc.compaction.strategy.opts.filter.size=250M"
- $ accumulo shell -u <username> -p <password> -e "config -t examples.test1 -s examples.table.majc.compaction.strategy.opts.large.compress.threshold=100M"
- $ accumulo shell -u <username> -p <password> -e "config -t examples.test1 -s examples.table.majc.compaction.strategy.opts.large.compress.type=gz"
+ $ accumulo shell -u <username> -p <password> -e "config -t examples.test1 -s table.majc.compaction.strategy=org.apache.accumulo.tserver.compaction.strategies.BasicCompactionStrategy"
+ $ accumulo shell -u <username> -p <password> -e "config -t examples.test1 -s table.majc.compaction.strategy.opts.filter.size=250M"
+ $ accumulo shell -u <username> -p <password> -e "config -t examples.test1 -s table.majc.compaction.strategy.opts.large.compress.threshold=100M"
+ $ accumulo shell -u <username> -p <password> -e "config -t examples.test1 -s table.majc.compaction.strategy.opts.large.compress.type=gz"
```
Generate some data and files in order to test the strategy:
@@ -64,7 +64,7 @@ $ ./bin/runex client.SequentialBatchWriter -t examples.test1 --start 0 --num 130
$ accumulo shell -u <username> -p <password> -e "flush -t examples.test1"
```
-View the tserver log in <accumulo_home>/logs for the compaction and find the name of the <rfile> that was compacted for your table. Print info about this file using the PrintInfo tool:
+View the tserver log in <accumulo_home>/logs for the compaction and find the name of the `rfile` that was compacted for your table. Print info about this file using the PrintInfo tool:
```bash
$ accumulo rfile-info <rfile>
diff --git a/docs/shard.md b/docs/shard.md
index f6f6848..97a9d40 100644
--- a/docs/shard.md
+++ b/docs/shard.md
@@ -43,7 +43,7 @@ The following command queries the index to find all files containing 'foo' and '
/local/username/workspace/accumulo/src/core/src/test/java/accumulo/core/data/KeyExtentTest.java
/local/username/workspace/accumulo/src/core/src/test/java/accumulo/core/iterators/WholeRowIteratorTest.java
-In order to run ContinuousQuery, we need to run Reverse.java to populate doc2term.
+In order to run ContinuousQuery, we need to run Reverse.java to populate the `examples.doc2term` table.
$ ./bin/runex shard.Reverse --shardTable examples.shard --doc2Term examples.doc2term
diff --git a/docs/tabletofile.md b/docs/tabletofile.md
index c72d5b8..5968e29 100644
--- a/docs/tabletofile.md
+++ b/docs/tabletofile.md
@@ -30,7 +30,7 @@ put a trivial amount of data into accumulo using the accumulo shell:
root@instance examples.input> quit
The TableToFile class configures a map-only job to read the specified columns and
-write the key/value pairs to a file in HDFS.
+writes the key/value pairs to a file in HDFS.
The following will extract the rows containing the column "cf:cq":
@@ -45,6 +45,4 @@ We can see the output of our little map-reduce job:
$ hadoop fs -text /tmp/output/part-m-00000
catrow cf:cq [] catvalue
- dogrow cf:cq [] dogvalue
- $
-
+ dogrow cf:cq [] dogvalue
\ No newline at end of file
diff --git a/docs/terasort.md b/docs/terasort.md
index 16f2ea1..5539883 100644
--- a/docs/terasort.md
+++ b/docs/terasort.md
@@ -25,10 +25,10 @@ ignored.
$ accumulo shell -u root -p secret -e 'createnamespace examples'
-To run this example you run it with arguments describing the amount of data:
+This example is run with arguments describing the amount of data:
$ ./bin/runmr mapreduce.TeraSortIngest --count 10 --minKeySize 10 --maxKeySize 10 \
- --minValueSize 78 --maxValueSize 78 --table examples.sort --splits 10 \
+ --minValueSize 78 --maxValueSize 78 --table examples.sort --splits 10
After the map reduce job completes, scan the data:
diff --git a/docs/wordcount.md b/docs/wordcount.md
index 4c5a27f..fca4af0 100644
--- a/docs/wordcount.md
+++ b/docs/wordcount.md
@@ -55,10 +55,12 @@ information like passwords. A more secure option is store accumulo-client.proper
in HDFS and run the job with the `-D` options. This will configure the MapReduce job
to obtain the client properties from HDFS:
- $ hdfs dfs -copyFromLocal ./conf/accumulo-client.properties /user/myuser/
+ $ hdfs dfs -mkdir /user
+ $ hdfs dfs -mkdir /user/myuser
+ $ hdfs dfs -copyFromLocal /path/to/accumulo/conf/accumulo-client.properties /user/myuser/
$ ./bin/runmr mapreduce.WordCount -i /wc -t examples.wordcount2 -d /user/myuser/accumulo-client.properties
-After the MapReduce job completes, query the `wordcount2` table. The results should
+After the MapReduce job completes, query the `examples.wordcount2` table. The results should
be the same as before:
$ accumulo shell