Posted to notifications@accumulo.apache.org by "Keith Turner (JIRA)" <ji...@apache.org> on 2018/01/02 21:03:00 UTC
[jira] [Commented] (ACCUMULO-4749) Need a bulk loading test equivalent to continuous ingest
[ https://issues.apache.org/jira/browse/ACCUMULO-4749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308704#comment-16308704 ]
Keith Turner commented on ACCUMULO-4749:
----------------------------------------
Below is a possible design for this issue.
Generator processes do the following in a loop:
* Create 1 million random numbers that point to the previous 1 million (if a previous batch exists).
* Write the 1 million to a file in HDFS in directory A.
* Check the size of the HDFS directory that files are written to. If it is too large, wait.
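A minimal sketch of one generator, assuming a local directory stands in for HDFS directory A. The function names, the text file format, the size threshold, and the rule that entry i points at entry i of the previous batch are all illustrative assumptions, not part of the proposal.

```python
import random
import time
from pathlib import Path


def make_batch(prev_batch, size, rng):
    """Return `size` (node, prev) pairs; prev is None when there is no previous batch."""
    batch = []
    for i in range(size):
        node = rng.getrandbits(63)
        # Assumption: entry i links to entry i of the previous batch.
        prev = prev_batch[i % len(prev_batch)][0] if prev_batch else None
        batch.append((node, prev))
    return batch


def dir_size(directory):
    """Total size in bytes of the regular files directly inside `directory`."""
    return sum(p.stat().st_size for p in Path(directory).iterdir() if p.is_file())


def write_batch(directory, batch, seq):
    """Write one batch as text lines of the form '<node hex> <prev hex or empty>'."""
    path = Path(directory) / f"batch-{seq:06d}.txt"
    with open(path, "w") as out:
        for node, prev in batch:
            prev_text = "" if prev is None else format(prev, "016x")
            out.write(f"{node:016x} {prev_text}\n")
    return path


def generate(directory, rounds, size=1_000_000, max_bytes=10 * 2**30,
             poll_seconds=5.0, seed=None):
    """Run the generator loop for `rounds` iterations (a real test would loop forever)."""
    rng = random.Random(seed)
    prev = None
    for seq in range(rounds):
        prev = make_batch(prev, size, rng)
        write_batch(directory, prev, seq)
        # Back off while the output directory holds too much un-ingested data.
        while dir_size(directory) > max_bytes:
            time.sleep(poll_seconds)
```

For example, `generate("/tmp/dirA", rounds=3, size=1000, seed=42)` writes three small linked batches into an existing directory. A real implementation would write through an HDFS client instead of the local filesystem.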
Ingest processes do the following in a loop:
* Move all files from dir A to dir B.
* Run a MapReduce job to sort files in B into dir C. This step creates the RFiles needed for bulk import.
* Bulk import files in dir C into Accumulo.
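A rough sketch of one pass of the ingest loop, using local directories in place of HDFS. This is only an illustration under several assumptions: the sort step is a plain in-memory sort of text lines rather than a MapReduce job, so it does not produce real RFiles; `bulk_import` is a hypothetical callback standing in for the Accumulo bulk import call; and deleting the files in B after sorting is a choice the proposal does not specify.

```python
import shutil
from pathlib import Path


def move_all(src, dst):
    """Move every regular file from `src` to `dst`; return the new paths."""
    moved = []
    for path in sorted(Path(src).iterdir()):
        if path.is_file():
            target = Path(dst) / path.name
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved


def sort_into(src, dst, round_no):
    """Stand-in for the MapReduce sort: merge all lines in `src` into one sorted file in `dst`."""
    lines = []
    for path in sorted(Path(src).iterdir()):
        if path.is_file():
            lines.extend(path.read_text().splitlines())
            path.unlink()  # assumption: B is emptied once its data has been sorted
    lines.sort()
    out = Path(dst) / f"sorted-{round_no:06d}.txt"
    out.write_text("".join(line + "\n" for line in lines))
    return out


def ingest_once(dir_a, dir_b, dir_c, bulk_import, round_no=0):
    """Run one A -> B -> C -> import pass; return False if A had nothing to ingest."""
    if not move_all(dir_a, dir_b):
        return False
    sort_into(dir_b, dir_c, round_no)
    # Hypothetical hook: hand dir C to whatever performs the actual bulk import.
    bulk_import(Path(dir_c))
    return True
```

Moving the files from A to B first gives each pass a fixed set of inputs, so generators can keep writing to A while the sort and import run.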
> Need a bulk loading test equivalent to continuous ingest
> --------------------------------------------------------
>
> Key: ACCUMULO-4749
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4749
> Project: Accumulo
> Issue Type: Improvement
> Components: test
> Reporter: Ivan Bella
> Assignee: Jared R
> Labels: pull-request-available
> Time Spent: 1h
> Remaining Estimate: 0h
>
> There are some known cases, at least in past versions, where bulk loading may fail, leaving the ~blip in place but no transaction left to handle it. This results in directories of files being left around that are not loaded. We should create a continuous ingest variant that uses bulk loading instead. Then, if this is run with agitation, the continuous ingest verification can find data that has essentially been orphaned.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)