Posted to notifications@accumulo.apache.org by "Keith Turner (JIRA)" <ji...@apache.org> on 2018/01/02 21:03:00 UTC

[jira] [Commented] (ACCUMULO-4749) Need a bulk loading test equivalent to continuous ingest

    [ https://issues.apache.org/jira/browse/ACCUMULO-4749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308704#comment-16308704 ] 

Keith Turner commented on ACCUMULO-4749:
----------------------------------------

Below is a possible design for this issue.

Generator processes that do the following in a loop:
  * Create 1 million random numbers that point to the previous 1 million (if there is a previous batch).
  * Write the 1 million to a file in HDFS in directory A.
  * Check the size of the HDFS directory that files are written to.  If it is too large, wait.
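
The generator loop above could be sketched roughly as follows. This is only an illustration, not the proposed implementation: it uses a local directory as a stand-in for HDFS, and the names run_generator, make_batch, write_batch and dir_size, the file naming, the line format and the size threshold are all hypothetical. It also assumes the i-th number of a batch points at the i-th number of the previous batch, which the design does not spell out.

```python
import os
import random
import time


def dir_size(path):
    """Total size in bytes of the regular files directly inside path."""
    return sum(e.stat().st_size for e in os.scandir(path) if e.is_file())


def make_batch(prev, count, rng):
    """Create count random numbers; the i-th one points at prev[i], if any."""
    nodes = [rng.getrandbits(63) for _ in range(count)]
    pointers = prev if prev is not None else [None] * count
    return nodes, list(zip(nodes, pointers))


def write_batch(dir_a, seq, entries):
    """Write one batch as 'node pointer' lines (hypothetical format)."""
    # Write under a hidden temporary name, then rename, so a reader that
    # skips dot-files never sees a partially written batch.
    tmp = os.path.join(dir_a, ".batch-%06d.tmp" % seq)
    final = os.path.join(dir_a, "batch-%06d.txt" % seq)
    with open(tmp, "w") as f:
        for node, ptr in entries:
            ptr_text = "" if ptr is None else format(ptr, "016x")
            f.write("%016x %s\n" % (node, ptr_text))
    os.rename(tmp, final)
    return final


def run_generator(dir_a, batches, count=1_000_000, max_bytes=1 << 30,
                  poll_seconds=5.0, seed=None):
    """Generate batches into dir_a, pausing while dir_a is too large."""
    os.makedirs(dir_a, exist_ok=True)
    rng = random.Random(seed)
    prev = None
    for seq in range(batches):
        prev, entries = make_batch(prev, count, rng)
        write_batch(dir_a, seq, entries)
        # Back off until the ingest side has drained the directory.
        while dir_size(dir_a) > max_bytes:
            time.sleep(poll_seconds)
```

A real generator would run until stopped rather than for a fixed number of batches; the batches argument is only there to keep the sketch easy to try out.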

Ingest processes that do the following in a loop:
  * Move all files from dir A to dir B.
  * Run a map reduce job to sort files in B into dir C. This step creates the rfiles needed for bulk import.
  * Bulk import files in dir C into Accumulo.
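
One pass of the ingest loop above might look like the sketch below. Again this is a local-filesystem illustration under stated assumptions: a plain in-memory sort stands in for the map reduce job (no rfiles are produced), bulk_import is a hypothetical callback standing in for the Accumulo bulk import call, and all function and file names are made up. The sketch also deletes the inputs from dir B once they are sorted, which the design above does not specify.

```python
import os
import shutil


def move_all(src, dst):
    """Move every visible file from src to dst; return the moved names."""
    os.makedirs(dst, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(src)):
        if name.startswith("."):
            continue  # skip hidden files, e.g. batches still being written
        shutil.move(os.path.join(src, name), os.path.join(dst, name))
        moved.append(name)
    return moved


def sort_into(dir_b, dir_c, out_name):
    """Stand-in for the sort job: merge all lines in dir_b into one
    sorted file in dir_c, then remove the inputs (an assumption)."""
    os.makedirs(dir_c, exist_ok=True)
    lines = []
    names = os.listdir(dir_b)
    for name in names:
        with open(os.path.join(dir_b, name)) as f:
            lines.extend(l.rstrip("\n") + "\n" for l in f if l.strip())
    lines.sort()
    out = os.path.join(dir_c, out_name)
    with open(out, "w") as f:
        f.writelines(lines)
    for name in names:
        os.remove(os.path.join(dir_b, name))
    return out


def ingest_once(dir_a, dir_b, dir_c, bulk_import, round_no):
    """Run one move / sort / bulk import pass; return False if A was empty."""
    if not move_all(dir_a, dir_b):
        return False
    sort_into(dir_b, dir_c, "sorted-%06d.txt" % round_no)
    # Hypothetical hook: a real test would hand dir_c to Accumulo here.
    bulk_import(dir_c)
    return True
```

Moving the files to dir B before sorting gives each pass a fixed set of inputs even while generators keep writing into dir A.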
 




> Need a bulk loading test equivalent to continuous ingest
> --------------------------------------------------------
>
>                 Key: ACCUMULO-4749
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4749
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: test
>            Reporter: Ivan Bella
>            Assignee: Jared R
>              Labels: pull-request-available
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> There are some known cases, at least in past versions, where bulk loading may fail, leaving the ~blip in place with no transaction left to handle it.  This results in directories of files being left around that are never loaded.  We should create a continuous ingest variant that uses bulk loading instead.  Then, if this is run with agitation, the continuous ingest verification can find data that has essentially been orphaned.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)