Posted to dev@accumulo.apache.org by "Eric Newton (Resolved) (JIRA)" <ji...@apache.org> on 2012/04/03 17:38:24 UTC

[jira] [Resolved] (ACCUMULO-375) Wikipedia Ingest needs more parallelism

     [ https://issues.apache.org/jira/browse/ACCUMULO-375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Newton resolved ACCUMULO-375.
----------------------------------

    Resolution: Not A Problem
    
> Wikipedia Ingest needs more parallelism
> ---------------------------------------
>
>                 Key: ACCUMULO-375
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-375
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Adam Fuchs
>            Assignee: Adam Fuchs
>
> The Wikipedia ingest Map job uses a derivative of FileInputFormat, which launches one mapper per file. Given the partitioning strategy and workload distribution, it makes sense to launch multiple mappers per file. Each mapper can then take a chunk of the articles in the file, using the same partitioning strategy as the assignment of row IDs.
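
As a rough illustration of the proposal above, the sketch below shows one way several mappers could share a single file by reusing a partition function. It is not taken from the Accumulo wikipedia example: the class name, the hash-based partition function, and the parameters numPartitions and mappersPerFile are all assumptions made for illustration, and the real row ID assignment may differ.

```java
import java.util.ArrayList;
import java.util.List;

public class ArticleChunking {

    // Hypothetical stand-in for the row ID partitioning: hash the article ID
    // into one of numPartitions buckets. floorMod keeps the result non-negative.
    static int partitionFor(String articleId, int numPartitions) {
        return Math.floorMod(articleId.hashCode(), numPartitions);
    }

    // Assumed assignment rule: mapper k of mappersPerFile handles every
    // partition p with p % mappersPerFile == k.
    static boolean ownedBy(String articleId, int numPartitions,
                           int mapperIndex, int mappersPerFile) {
        return partitionFor(articleId, numPartitions) % mappersPerFile == mapperIndex;
    }

    // Returns the subset of article IDs from one file that a given mapper
    // would process; every other mapper skips them.
    static List<String> articlesFor(List<String> articleIds, int numPartitions,
                                    int mapperIndex, int mappersPerFile) {
        List<String> mine = new ArrayList<>();
        for (String id : articleIds) {
            if (ownedBy(id, numPartitions, mapperIndex, mappersPerFile)) {
                mine.add(id);
            }
        }
        return mine;
    }
}
```

Under these assumptions each article in a file is claimed by exactly one mapper, and all articles that land in the same partition are handled by the same mapper, so the work on one file is spread out without any coordination between mappers.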
