Posted to notifications@accumulo.apache.org by "Jared R (JIRA)" <ji...@apache.org> on 2018/01/03 15:15:00 UTC

[jira] [Comment Edited] (ACCUMULO-4749) Need a bulk loading test equivalent to continuous ingest

    [ https://issues.apache.org/jira/browse/ACCUMULO-4749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309738#comment-16309738 ] 

Jared R edited comment on ACCUMULO-4749 at 1/3/18 3:14 PM:
-----------------------------------------------------------

So in that case, if we were to take the route you suggest, I would still need a class for the generator (create a linked list of numbers -> write it to a file -> hand a set of files from the directories to the loader) and a class for the loader (ingest), which turns the files from the directory into RFiles and then loads a batch of RFiles into Accumulo.
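The generator/loader split described above can be sketched in plain Java. This is a minimal, hypothetical illustration, not Accumulo code. `BulkIngestSketch`, `Node`, `generate` and `load` are invented names. Each in-memory batch stands in for a file written by the generator, and a `TreeMap` stands in for the table that sorted RFiles would be imported into. Each entry points back to the previous row, following the linked-list idea behind continuous ingest.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.TreeMap;

// Hypothetical sketch: none of these names are existing Accumulo classes.
public class BulkIngestSketch {

  /** One linked-list entry: a random row that points back to the previous row. */
  static final class Node {
    final long row;
    final long prevRow; // -1 marks the head of the list

    Node(long row, long prevRow) {
      this.row = row;
      this.prevRow = prevRow;
    }
  }

  /** Generator: builds a linked list of random rows and cuts it into batches ("files"). */
  static List<List<Node>> generate(long seed, int nodes, int batchSize) {
    Random rand = new Random(seed);
    List<List<Node>> files = new ArrayList<>();
    List<Node> current = new ArrayList<>();
    long prev = -1;
    for (int i = 0; i < nodes; i++) {
      long row = rand.nextLong() & Long.MAX_VALUE; // non-negative so -1 stays a sentinel
      current.add(new Node(row, prev));
      prev = row;
      if (current.size() == batchSize) {
        files.add(current);
        current = new ArrayList<>();
      }
    }
    if (!current.isEmpty()) {
      files.add(current);
    }
    return files;
  }

  /** Loader: sorts each batch by row, as an RFile must be sorted, then adds it to the "table". */
  static void load(List<List<Node>> files, TreeMap<Long, Long> table) {
    for (List<Node> file : files) {
      TreeMap<Long, Long> sorted = new TreeMap<>();
      for (Node n : file) {
        sorted.put(n.row, n.prevRow);
      }
      table.putAll(sorted); // stands in for importing one sorted file
    }
  }

  public static void main(String[] args) {
    List<List<Node>> files = generate(42L, 1_000, 100);
    TreeMap<Long, Long> table = new TreeMap<>();
    load(files, table);
    System.out.println(files.size() + " files, " + table.size() + " rows loaded");
  }
}
```

Keeping the generator free of any Accumulo dependency means it only has to produce files; the loader is the one piece that would need to sort the data, write RFiles and hand them to Accumulo's bulk import.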

When I talked to Chris, we discussed having the generators write files of random numbers into multiple directories; those files are then sequenced and sent to the loader, which uses a MapReduce job to turn them into RFiles and then loads them into Accumulo. I wish I had a way to post the diagram we made.
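The multi-directory layout from that discussion could look roughly like this. It is a hypothetical sketch: the `gen-NNN/batch-NNNNNN.txt` naming, the batch size, and the `DirectorySequencer`, `writeBatch` and `sequence` names are all assumptions. Each generator writes into its own directory. The sequencer gathers every file in a stable order and groups the files into batches for the loader. The MapReduce step that turns a batch into RFiles is left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: the "gen-NNN/batch-NNNNNN.txt" layout and all names are assumptions.
public class DirectorySequencer {

  /** A generator writes one file of newline-separated numbers into its own directory. */
  static Path writeBatch(Path root, int generatorId, int seq, long[] numbers) throws IOException {
    Path dir = Files.createDirectories(root.resolve(String.format("gen-%03d", generatorId)));
    List<String> lines = new ArrayList<>();
    for (long n : numbers) {
      lines.add(Long.toString(n));
    }
    // Zero-padded names so lexical order matches numeric order.
    return Files.write(dir.resolve(String.format("batch-%06d.txt", seq)), lines);
  }

  /** Sequencer: lists every generator's files in a stable order and groups them into batches. */
  static List<List<Path>> sequence(Path root, int filesPerBatch) throws IOException {
    List<Path> all;
    try (Stream<Path> walk = Files.walk(root)) {
      all = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
    }
    List<List<Path>> batches = new ArrayList<>();
    for (int i = 0; i < all.size(); i += filesPerBatch) {
      batches.add(new ArrayList<>(all.subList(i, Math.min(i + filesPerBatch, all.size()))));
    }
    return batches;
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createTempDirectory("bulk-gen");
    for (int gen = 0; gen < 3; gen++) {
      for (int seq = 0; seq < 2; seq++) {
        writeBatch(root, gen, seq, new long[] {gen * 100L + seq});
      }
    }
    // Each batch would be handed to the loader (e.g. a MapReduce job that writes RFiles).
    for (List<Path> batch : sequence(root, 4)) {
      System.out.println(batch);
    }
  }
}
```

Because the sequencer only passes paths along, no file has to be copied between directories before the loader picks it up.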

If this can be done without having to copy files, possibly reusing the current ingest code and splitting up the task, then that's great.


was (Author: jkrdev):
So in that case, if we were to take the route you suggest, I would still need a class for the generator (create a linked list of numbers -> write it to a file -> hand a set of files from the directories to the loader) and a class for the loader (ingest), which turns the files from the directory into RFiles and then loads a batch of RFiles into Accumulo.

When I talked to Chris, we discussed having the generators write files of random numbers into multiple directories; those files are then sequenced and sent to the loader, which uses a MapReduce job to turn them into RFiles and then loads them into Accumulo. I wish I had a way to post the diagram we made.

If this can be done without having to copy files, and can possibly reuse the current BatchWriter and ingest code with some changes for new classes, then that's great.

> Need a bulk loading test equivalent to continuous ingest
> --------------------------------------------------------
>
>                 Key: ACCUMULO-4749
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4749
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: test
>            Reporter: Ivan Bella
>            Assignee: Jared R
>              Labels: pull-request-available
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> There are some known cases, at least in past versions, where bulk loading may fail, leaving the ~blip in place but no transaction left to handle it. This results in directories of files being left around that are never loaded. We should create a continuous ingest variant that uses bulk loading instead. Then, if it is run with agitation, the continuous ingest verification can find data that has been essentially orphaned.
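The verification idea in the description can be illustrated with a small in-memory check. This is a minimal sketch. It assumes each loaded row maps to the row it points back to, with -1 marking the head of a list. `OrphanCheck` and `verify` are hypothetical names. The check mirrors the REFERENCED, UNREFERENCED and UNDEFINED counts that continuous ingest verification reports. A row that is referenced but never defined is the symptom you would expect if a bulk-loaded file was left behind and never imported.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: OrphanCheck and verify are illustrative names, not Accumulo APIs.
public class OrphanCheck {

  /** Classifies rows the way continuous ingest verification counts them. */
  static Map<String, Integer> verify(Map<Long, Long> table) {
    Set<Long> referenced = new HashSet<>();
    for (long prev : table.values()) {
      if (prev >= 0) {
        referenced.add(prev); // -1 means "head of list", which points at nothing
      }
    }
    int ref = 0;
    int unref = 0;
    for (long row : table.keySet()) {
      if (referenced.contains(row)) {
        ref++;
      } else {
        unref++; // expected for the newest node of each list
      }
    }
    int undefined = 0;
    for (long row : referenced) {
      if (!table.containsKey(row)) {
        undefined++; // pointed to by some row but never loaded: lost data
      }
    }
    Map<String, Integer> counts = new TreeMap<>();
    counts.put("REFERENCED", ref);
    counts.put("UNREFERENCED", unref);
    counts.put("UNDEFINED", undefined);
    return counts;
  }

  public static void main(String[] args) {
    // Chain 1 <- 2 <- 3 <- 4, then drop row 2 as if its file was never imported.
    Map<Long, Long> table = new TreeMap<>();
    table.put(1L, -1L);
    table.put(2L, 1L);
    table.put(3L, 2L);
    table.put(4L, 3L);
    table.remove(2L);
    System.out.println(verify(table)); // {REFERENCED=1, UNDEFINED=1, UNREFERENCED=2}
  }
}
```

Row 1 becomes UNREFERENCED because the entry that pointed to it was lost along with row 2. This is why a missing file shows up as both an UNDEFINED row and an extra UNREFERENCED one.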



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)