Posted to dev@accumulo.apache.org by "Keith Turner (Commented) (JIRA)" <ji...@apache.org> on 2012/02/10 17:47:00 UTC

[jira] [Commented] (ACCUMULO-387) Map reduce directly over files

    [ https://issues.apache.org/jira/browse/ACCUMULO-387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205538#comment-13205538 ] 

Keith Turner commented on ACCUMULO-387:
---------------------------------------

I think this is much easier now that we have clone table.  The following needs to be done:

 * Clone table (pass in options to disable writes and major compactions on clone)
 * Run map reduce over files referenced by clone
 * Delete clone
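
The three steps above could be sketched roughly as follows.  This is only an illustration of the control flow, not Accumulo code: TableOps, FileJob, runOverClone and the table names are all hypothetical stand-ins, and the options for disabling writes and major compactions on the clone are left out.  Deleting the clone in a finally block is an assumption of the sketch, made so that a failed job does not leave a clone behind.

```java
import java.util.ArrayList;
import java.util.List;

public class CloneScanWorkflowSketch {

    /** Hypothetical stand-in for the table operations the three steps need. */
    public interface TableOps {
        void cloneTable(String source, String clone);
        void deleteTable(String table);
    }

    /** Hypothetical stand-in for a map reduce job that reads a table's files. */
    public interface FileJob {
        boolean run(String table);
    }

    /** Clone, run the job over the clone, and always delete the clone afterwards. */
    public static boolean runOverClone(TableOps ops, FileJob job, String source, String clone) {
        ops.cloneTable(source, clone);
        try {
            return job.run(clone);
        } finally {
            // Runs even if the job throws, so the clone is cleaned up either way.
            ops.deleteTable(clone);
        }
    }

    public static void main(String[] args) {
        final List<String> log = new ArrayList<>();
        // Recording implementation used only to show the order of the steps.
        TableOps ops = new TableOps() {
            @Override public void cloneTable(String source, String clone) {
                log.add("clone " + source + " -> " + clone);
            }
            @Override public void deleteTable(String table) {
                log.add("delete " + table);
            }
        };
        FileJob job = table -> {
            log.add("job over " + table);
            return true;
        };
        boolean ok = runOverClone(ops, job, "events", "events_mr_clone");
        System.out.println(ok + " " + log);
    }
}
```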

This would need a special input format that instantiates the iterator stack in the mapper for each tablet.  Going through the iterator stack instead of reading the files directly is important for the following reasons:

 * The iterator stack will properly apply updates and deletes that were made, so the mapper does not see stale or deleted entries.
 * The iterator stack will only read the data in a file that falls within a tablet.  This is important because tablets can reference files that contain data outside of the tablet's range, including data that has been deleted in another tablet.  Using the iterator stack keeps that out-of-range data out of the mapper's input.
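
To make the two points concrete, here is a toy model in plain Java.  It is not the Accumulo iterator stack and uses no Accumulo classes; Entry, scanTablet and the sample data are assumptions made up for illustration.  It assumes a tablet covers the row range (prevEndRow, endRow], that the newest timestamp for a row wins, and that a delete marker hides older values.  One file is shared by two tablets, and row "x" has been deleted only in the second tablet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TabletScanSketch {

    /** Toy stand-in for one key/value stored in a file. */
    public static final class Entry {
        final String row;
        final long timestamp;
        final boolean delete; // true for a delete marker
        final String value;

        public Entry(String row, long timestamp, boolean delete, String value) {
            this.row = row;
            this.timestamp = timestamp;
            this.delete = delete;
            this.value = value;
        }
    }

    /**
     * Returns the live rows a tablet covering (prevEndRow, endRow] would see
     * in the given file entries. A null bound means unbounded on that side.
     */
    public static Map<String, String> scanTablet(List<Entry> entries, String prevEndRow, String endRow) {
        Map<String, Entry> newest = new TreeMap<>();
        for (Entry e : entries) {
            // Clip to the tablet's range: skip rows that belong to other tablets.
            if (prevEndRow != null && e.row.compareTo(prevEndRow) <= 0) continue;
            if (endRow != null && e.row.compareTo(endRow) > 0) continue;
            // Keep the newest entry per row; on a timestamp tie the delete wins
            // (a simplification chosen for this sketch).
            Entry cur = newest.get(e.row);
            if (cur == null || e.timestamp > cur.timestamp
                    || (e.timestamp == cur.timestamp && e.delete)) {
                newest.put(e.row, e);
            }
        }
        Map<String, String> live = new TreeMap<>();
        for (Entry e : newest.values()) {
            if (!e.delete) live.put(e.row, e.value); // suppress deleted rows
        }
        return live;
    }

    public static void main(String[] args) {
        // File referenced by both tablets.
        List<Entry> shared = new ArrayList<>();
        shared.add(new Entry("a", 1, false, "v1"));
        shared.add(new Entry("k", 1, false, "v1"));
        shared.add(new Entry("p", 1, false, "v1"));
        shared.add(new Entry("x", 1, false, "v1"));

        // Tablet A, range (-inf, "m"], also has a file with an update to row "a".
        List<Entry> tabletA = new ArrayList<>(shared);
        tabletA.add(new Entry("a", 2, false, "v2"));

        // Tablet B, range ("m", +inf), also has a file with a delete of row "x".
        List<Entry> tabletB = new ArrayList<>(shared);
        tabletB.add(new Entry("x", 2, true, null));

        System.out.println(scanTablet(tabletA, null, "m")); // prints {a=v2, k=v1}
        System.out.println(scanTablet(tabletB, "m", null)); // prints {p=v1}
    }
}
```

A mapper for tablet A that read the shared file directly would also emit rows "p" and "x", even though "x" has been deleted in tablet B, and it would emit the old value for row "a".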


                
> Map reduce directly over files
> ------------------------------
>
>                 Key: ACCUMULO-387
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-387
>             Project: Accumulo
>          Issue Type: New Feature
>            Reporter: Keith Turner
>             Fix For: 1.5.0
>
>
> Support map reduce jobs that directly read Accumulo files.
