Posted to commits@accumulo.apache.org by kt...@apache.org on 2012/01/10 20:26:21 UTC
svn commit: r1229699 -
/incubator/accumulo/branches/1.4/docs/examples/README.bloom
Author: kturner
Date: Tue Jan 10 19:26:20 2012
New Revision: 1229699
URL: http://svn.apache.org/viewvc?rev=1229699&view=rev
Log:
ACCUM
Modified:
incubator/accumulo/branches/1.4/docs/examples/README.bloom
Modified: incubator/accumulo/branches/1.4/docs/examples/README.bloom
URL: http://svn.apache.org/viewvc/incubator/accumulo/branches/1.4/docs/examples/README.bloom?rev=1229699&r1=1229698&r2=1229699&view=diff
==============================================================================
--- incubator/accumulo/branches/1.4/docs/examples/README.bloom (original)
+++ incubator/accumulo/branches/1.4/docs/examples/README.bloom Tue Jan 10 19:26:20 2012
@@ -93,8 +93,55 @@ prevent the files from being compacted i
* Flush the table using the shell
After following the above steps, each table will have a tablet with three map
-files. Each map file will contain 1 million entries generated with a different
-seed.
+files. Flushing the table after each batch of inserts will create a map file.
+Each map file will contain 1 million entries generated with a different seed.
+This assumes that Accumulo is configured with enough memory to hold 1
+million inserts. If not, more map files will be created.
+
+The commands for creating the first table without bloom filters are below.
+
+ $ ./bin/accumulo shell -u username -p password
+ Shell - Accumulo Interactive Shell
+ - version: 1.4.x-incubating
+ - instance name: instance
+ - instance id: 00000000-0000-0000-0000-000000000000
+ -
+ - type 'help' for a list of available commands
+ -
+ username@instance> setauths -u username -s exampleVis
+ username@instance> createtable bloom_test1
+ username@instance bloom_test1> config -t bloom_test1 -s table.compaction.major.ratio=7
+ username@instance bloom_test1> exit
+
+ $ ./bin/accumulo org.apache.accumulo.examples.client.RandomBatchWriter -s 7 instance zookeepers username password bloom_test1 1000000 0 1000000000 50 2000000 60000 3 exampleVis
+ $ ./bin/accumulo shell -u username -p password -e 'flush -t bloom_test1 -w'
+ $ ./bin/accumulo org.apache.accumulo.examples.client.RandomBatchWriter -s 8 instance zookeepers username password bloom_test1 1000000 0 1000000000 50 2000000 60000 3 exampleVis
+ $ ./bin/accumulo shell -u username -p password -e 'flush -t bloom_test1 -w'
+ $ ./bin/accumulo org.apache.accumulo.examples.client.RandomBatchWriter -s 9 instance zookeepers username password bloom_test1 1000000 0 1000000000 50 2000000 60000 3 exampleVis
+ $ ./bin/accumulo shell -u username -p password -e 'flush -t bloom_test1 -w'
+
+The commands for creating the second table with bloom filters are below.
+
+ $ ./bin/accumulo shell -u username -p password
+ Shell - Accumulo Interactive Shell
+ - version: 1.4.x-incubating
+ - instance name: instance
+ - instance id: 00000000-0000-0000-0000-000000000000
+ -
+ - type 'help' for a list of available commands
+ -
+ username@instance> setauths -u username -s exampleVis
+ username@instance> createtable bloom_test2
+ username@instance bloom_test2> config -t bloom_test2 -s table.compaction.major.ratio=7
+ username@instance bloom_test2> config -t bloom_test2 -s table.bloom.enabled=true
+ username@instance bloom_test2> exit
+
+ $ ./bin/accumulo org.apache.accumulo.examples.client.RandomBatchWriter -s 7 instance zookeepers username password bloom_test2 1000000 0 1000000000 50 2000000 60000 3 exampleVis
+ $ ./bin/accumulo shell -u username -p password -e 'flush -t bloom_test2 -w'
+ $ ./bin/accumulo org.apache.accumulo.examples.client.RandomBatchWriter -s 8 instance zookeepers username password bloom_test2 1000000 0 1000000000 50 2000000 60000 3 exampleVis
+ $ ./bin/accumulo shell -u username -p password -e 'flush -t bloom_test2 -w'
+ $ ./bin/accumulo org.apache.accumulo.examples.client.RandomBatchWriter -s 9 instance zookeepers username password bloom_test2 1000000 0 1000000000 50 2000000 60000 3 exampleVis
+ $ ./bin/accumulo shell -u username -p password -e 'flush -t bloom_test2 -w'
Below 500 lookups are done against the table without bloom filters using random
NG seed 7. Even though only one map file will likely contain entries for this
@@ -119,3 +166,60 @@ map files existed.
Generating 500 random queries...finished
101.15 lookups/sec 4.94 secs
num results : 500
+
+You can verify the table has three files by looking in HDFS. To look in HDFS
+you will need the table ID, because HDFS paths use it instead of the table
+name. The following command will show table IDs.
+
+ $ ./bin/accumulo shell -u username -p password
+ Shell - Accumulo Interactive Shell
+ - version: 1.4.x-incubating
+ - instance name: instance
+ - instance id: 00000000-0000-0000-0000-000000000000
+ -
+ - type 'help' for a list of available commands
+ -
+ username@instance> tables -l
+ !METADATA => !0
+ bloom_test1 => o7
+ bloom_test2 => o8
+ trace => 1
+ username@instance> quit
+
+So the table ID for bloom_test2 is o8. The command below shows what files this
+table has in HDFS. This assumes Accumulo is at the default location in HDFS.
+
+ $ hadoop fs -lsr /accumulo/tables/o8
+ drwxr-xr-x - username supergroup 0 2012-01-10 14:02 /accumulo/tables/o8/default_tablet
+ -rw-r--r-- 3 username supergroup 52672650 2012-01-10 14:01 /accumulo/tables/o8/default_tablet/F00000dj.rf
+ -rw-r--r-- 3 username supergroup 52436176 2012-01-10 14:01 /accumulo/tables/o8/default_tablet/F00000dk.rf
+ -rw-r--r-- 3 username supergroup 52850173 2012-01-10 14:02 /accumulo/tables/o8/default_tablet/F00000dl.rf
+
+Running the PrintInfo command shows that one of the files has a bloom filter
+and that it is 1.5MB.
+
+ $ ./bin/accumulo org.apache.accumulo.core.file.rfile.PrintInfo /accumulo/tables/o8/default_tablet/F00000dj.rf
+ Locality group : <DEFAULT>
+ Start block : 0
+ Num blocks : 752
+ Index level 0 : 43,598 bytes 1 blocks
+ First key : row_0000001169 foo:1 [exampleVis] 1326222052539 false
+ Last key : row_0999999421 foo:1 [exampleVis] 1326222052058 false
+ Num entries : 999,536
+ Column families : [foo]
+
+ Meta block : BCFile.index
+ Raw size : 4 bytes
+ Compressed size : 12 bytes
+ Compression type : gz
+
+ Meta block : RFile.index
+ Raw size : 43,696 bytes
+ Compressed size : 15,592 bytes
+ Compression type : gz
+
+ Meta block : acu_bloom
+ Raw size : 1,540,292 bytes
+ Compressed size : 1,433,115 bytes
+ Compression type : gz
+
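The ingest-then-flush pairs added to the README above repeat the same two
commands for seeds 7, 8 and 9. The following is a minimal sketch of how that
repetition could be wrapped in a loop. It is not part of this commit. The
names ACC_BIN, ACC_INSTANCE, ACC_ZOOKEEPERS, ACC_USER, ACC_PASS, DRY_RUN, run
and load_table are hypothetical and introduced only for this illustration; the
Accumulo commands and arguments themselves are copied from the README text.

```shell
#!/bin/sh
# Sketch only: variable and function names here are assumptions, not part of
# Accumulo. Defaults mirror the placeholder values used in the README.
ACC_BIN="${ACC_BIN:-./bin/accumulo}"
ACC_INSTANCE="${ACC_INSTANCE:-instance}"
ACC_ZOOKEEPERS="${ACC_ZOOKEEPERS:-zookeepers}"
ACC_USER="${ACC_USER:-username}"
ACC_PASS="${ACC_PASS:-password}"

# Print the command when DRY_RUN is 1 (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Write three batches with seeds 7, 8 and 9 into the given table, flushing
# after each batch so that each batch ends up in its own map file.
load_table() {
  table="$1"
  for seed in 7 8 9; do
    run "$ACC_BIN" org.apache.accumulo.examples.client.RandomBatchWriter \
      -s "$seed" "$ACC_INSTANCE" "$ACC_ZOOKEEPERS" "$ACC_USER" "$ACC_PASS" \
      "$table" 1000000 0 1000000000 50 2000000 60000 3 exampleVis
    run "$ACC_BIN" shell -u "$ACC_USER" -p "$ACC_PASS" -e "flush -t $table -w"
  done
}
```

With DRY_RUN left at its default, calling load_table bloom_test1 only prints
the six commands so they can be checked against the README. Setting DRY_RUN=0
runs them, which requires that the table has already been created and
configured in the shell as shown above.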