Posted to dev@accumulo.apache.org by "Keith Turner (Commented) (JIRA)" <ji...@apache.org> on 2012/02/10 17:47:00 UTC
[jira] [Commented] (ACCUMULO-387) Map reduce directly over files
[ https://issues.apache.org/jira/browse/ACCUMULO-387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205538#comment-13205538 ]
Keith Turner commented on ACCUMULO-387:
---------------------------------------
I think this is much easier now that we have clone table. I think the following needs to be done:
* Clone the table (pass in options to disable writes and major compactions on the clone)
* Run map reduce over the files referenced by the clone
* Delete the clone
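The three steps can be sketched as a small lifecycle. This is a minimal, hypothetical model, not Accumulo's API: `TableOps`, `CloneOptions`, `TabletSplit`, `FakeOps` and the `_mr_clone` suffix are illustrative stand-ins, and the two clone flags only mirror the options the comment asks for. It relies on one real property of clone table: a clone references the source's files rather than copying data. The structural point is that splits come from the clone's tablets and the clone is deleted in a `finally` block, so a failed job does not leave it behind.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical model of clone -> map reduce -> delete. None of these types are Accumulo classes. */
final class CloneAndScan {

  /** Hypothetical clone options: both flags keep the clone's file set stable while the job reads it. */
  record CloneOptions(boolean disableWrites, boolean disableMajorCompactions) {}

  /** One input split per tablet: the tablet's row bounds and the files it references. */
  record TabletSplit(String startRow, String endRow, List<String> files) {}

  /** Hypothetical table operations needed by the workflow. */
  interface TableOps {
    void cloneTable(String src, String dest, CloneOptions opts);
    List<TabletSplit> tablets(String table);
    void deleteTable(String table);
  }

  /** Clones src, applies job to one split per tablet of the clone, and always deletes the clone. */
  static <R> List<R> runOverClone(TableOps ops, String src, Function<TabletSplit, R> job) {
    String clone = src + "_mr_clone"; // hypothetical naming scheme for the temporary clone
    ops.cloneTable(src, clone, new CloneOptions(true, true));
    try {
      List<R> results = new ArrayList<>();
      for (TabletSplit split : ops.tablets(clone)) {
        // In an actual MapReduce job, each split would be handed to its own mapper.
        results.add(job.apply(split));
      }
      return results;
    } finally {
      ops.deleteTable(clone); // runs even if the job throws
    }
  }

  /** In-memory fake used to exercise the workflow. */
  static final class FakeOps implements TableOps {
    final Map<String, List<TabletSplit>> tables = new HashMap<>();
    final List<String> log = new ArrayList<>();

    public void cloneTable(String src, String dest, CloneOptions opts) {
      // Like a clone, share the source's file references instead of copying data.
      tables.put(dest, tables.get(src));
      log.add("clone " + dest);
    }

    public List<TabletSplit> tablets(String table) {
      return tables.get(table);
    }

    public void deleteTable(String table) {
      tables.remove(table);
      log.add("delete " + table);
    }
  }

  public static void main(String[] args) {
    FakeOps ops = new FakeOps();
    ops.tables.put("events", List.of(
        new TabletSplit(null, "m", List.of("f1.rf", "f2.rf")),
        new TabletSplit("m", null, List.of("f2.rf", "f3.rf"))));
    // Count the files each tablet of the clone references.
    List<Integer> counts = runOverClone(ops, "events", s -> s.files().size());
    System.out.println(counts + " " + ops.log);
  }
}
```

Note that the file `f2.rf` appears in both tablets' splits; that sharing is exactly why the mapper cannot simply read each file whole.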
We would need a special input format that instantiates the iterator stack in the mapper for each tablet. Doing this, instead of reading the files directly, is important for the following reasons:
* The iterator stack will properly process updates and deletes that were made, for example a delete stored in one file hiding older data stored in another file.
* The iterator stack will only read the data in a file that falls within the tablet's range. This is important because a tablet can reference files that contain data outside of that tablet (for example, a file shared by both tablets after a split), and that data could have been deleted in another tablet. Using the iterator stack prevents the mapper from reading such data.
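Both reasons can be illustrated with a deliberately simplified model of what the iterator stack does for one tablet. Assumptions: keys are reduced to a row and a timestamp (real keys also carry column and visibility fields), a tablet extent is modeled as (prevEndRow, endRow], a `TreeMap` stands in for the sorted merge of the tablet's files, and "newest version wins" approximates a table that keeps a single version. `Entry`, `Extent`, `scanTablet` and `readFilesDirectly` are hypothetical names. In the scenario, file `f1` was written before a split and is referenced by both tablets, and `zebra` was later deleted in tablet B.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Simplified model of a per-tablet scan: merge files, filter to the extent, resolve versions and deletes. */
final class TabletScanSketch {

  /** One entry in a file; deleted == true marks a delete entry. */
  record Entry(String row, long timestamp, boolean deleted, String value) {}

  /** Tablet extent modeled as (prevEndRow, endRow]; null means unbounded on that side. */
  record Extent(String prevEndRow, String endRow) {
    boolean contains(String row) {
      return (prevEndRow == null || row.compareTo(prevEndRow) > 0)
          && (endRow == null || row.compareTo(endRow) <= 0);
    }
  }

  /** Naive approach: every non-delete entry in the files, ignoring the extent and later deletes. */
  static List<String> readFilesDirectly(List<List<Entry>> files) {
    List<String> rows = new ArrayList<>();
    for (List<Entry> file : files) {
      for (Entry e : file) {
        if (!e.deleted()) rows.add(e.row());
      }
    }
    return rows;
  }

  /** Iterator-stack-like scan: skip rows outside the extent, keep the newest entry, hide deleted rows. */
  static Map<String, String> scanTablet(Extent extent, List<List<Entry>> files) {
    Map<String, Entry> newest = new TreeMap<>(); // sorted by row, standing in for a merged scan
    for (List<Entry> file : files) {
      for (Entry e : file) {
        if (!extent.contains(e.row())) continue; // data outside this tablet is never read
        newest.merge(e.row(), e, (old, cur) -> old.timestamp() >= cur.timestamp() ? old : cur);
      }
    }
    Map<String, String> visible = new TreeMap<>();
    newest.forEach((row, e) -> {
      if (!e.deleted()) visible.put(row, e.value()); // a newest-entry delete hides the row
    });
    return visible;
  }

  public static void main(String[] args) {
    // f1 was written before a split, so both tablets reference it.
    List<Entry> f1 = List.of(new Entry("apple", 1, false, "a1"), new Entry("zebra", 1, false, "z1"));
    // After the split, tablet B deleted "zebra"; only tablet B references f3.
    List<Entry> f3 = List.of(new Entry("zebra", 2, true, null));

    Extent tabletA = new Extent(null, "m");
    Extent tabletB = new Extent("m", null);

    System.out.println(readFilesDirectly(List.of(f1)));      // [apple, zebra]
    System.out.println(scanTablet(tabletA, List.of(f1)));     // {apple=a1}
    System.out.println(scanTablet(tabletB, List.of(f1, f3))); // {}
  }
}
```

Reading `f1` directly from tablet A's mapper surfaces `zebra`, even though tablet B had deleted it; restricting the scan to the extent and resolving deletes per tablet avoids that.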
> Map reduce directly over files
> ------------------------------
>
> Key: ACCUMULO-387
> URL: https://issues.apache.org/jira/browse/ACCUMULO-387
> Project: Accumulo
> Issue Type: New Feature
> Reporter: Keith Turner
> Fix For: 1.5.0
>
>
> Support map reduce jobs that directly read Accumulo files.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira