Posted to commits@accumulo.apache.org by kt...@apache.org on 2019/07/18 19:40:31 UTC

[accumulo-testing] branch keith-turner-patch-1 created (now 6f05937)

This is an automated email from the ASF dual-hosted git repository.

kturner pushed a change to branch keith-turner-patch-1
in repository https://gitbox.apache.org/repos/asf/accumulo-testing.git.


      at 6f05937  Update ci bulk ingest docs

This branch includes the following new commits:

     new 6f05937  Update ci bulk ingest docs

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



[accumulo-testing] 01/01: Update ci bulk ingest docs

Posted by kt...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

kturner pushed a commit to branch keith-turner-patch-1
in repository https://gitbox.apache.org/repos/asf/accumulo-testing.git

commit 6f059376d6644b24a0aa836cd6dba96c879dd8ff
Author: Keith Turner <kt...@apache.org>
AuthorDate: Thu Jul 18 15:40:27 2019 -0400

    Update ci bulk ingest docs
---
 docs/bulk-test.md | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/docs/bulk-test.md b/docs/bulk-test.md
index df5dc45..da8f163 100644
--- a/docs/bulk-test.md
+++ b/docs/bulk-test.md
@@ -8,6 +8,12 @@ in a loop like the following to continually bulk import data.
 # create the ci table if necessary
 ./bin/cingest createtable
 
+# Optionally, consider lowering the split threshold to make splits happen more 
+# frequently while the test runs.  Choose a threshold based on the amount of data
+# being imported and the desired number of splits.
+# 
+#   accumulo shell -u root -p secret -e 'config -t ci -s table.split.threshold=32M'
+
 for i in $(seq 1 10); do
    # run map reduce job to generate data for bulk import
    ./bin/cingest bulk /tmp/bt/$i
@@ -47,3 +53,13 @@ scan -t accumulo.metadata -b ~blip -e ~blip~
 scan -t accumulo.metadata -c loaded
 ```
 
+The sum of the referenced and unreferenced counts output by `cingest verify` should equal:
+
+```
+test.ci.bulk.map.task * test.ci.bulk.map.nodes * num_bulk_generate_jobs
+``` 
+
+It's possible the counts could be slightly smaller because of collisions. However, collisions
+are unlikely with the default settings, given there are 63 bits of randomness in the row and
+30 bits in the column.  This gives a total of 93 bits of randomness per key.
+
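
The arithmetic in the added docs can be sketched as a short shell snippet. This is only an illustration and not part of the commit: the three values below are hypothetical placeholders (use the actual test.ci.bulk.map.task and test.ci.bulk.map.nodes settings and the number of bulk generate jobs from your own run), and the collision estimate uses the standard birthday approximation n^2 / 2^(bits+1), which is only meaningful when the result is much smaller than 1.

```shell
# Hypothetical values for illustration; substitute the settings from your run.
map_tasks=10            # assumed value of test.ci.bulk.map.task
nodes_per_task=1000000  # assumed value of test.ci.bulk.map.nodes
num_jobs=10             # number of bulk generate jobs (loop iterations)

# Expected sum of referenced and unreferenced counts from `cingest verify`.
expected=$((map_tasks * nodes_per_task * num_jobs))
echo "expected entries: $expected"

# Birthday approximation for the chance of at least one collision with
# 93 random bits per key: p ~ n^2 / 2^94.
collision_p=$(awk -v n="$expected" 'BEGIN { printf "%.3e\n", (n * n) / (2 ^ 94) }')
echo "approximate collision probability: $collision_p"
```

With inputs in this range the estimated probability is many orders of magnitude below 1, which matches the statement that the verified counts should normally equal the product exactly.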