Posted to issues@kudu.apache.org by "Will Berkeley (JIRA)" <ji...@apache.org> on 2019/05/24 16:47:00 UTC

[jira] [Assigned] (KUDU-2786) Parallelize tables for backup and restore

     [ https://issues.apache.org/jira/browse/KUDU-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Will Berkeley reassigned KUDU-2786:
-----------------------------------

    Assignee: Will Berkeley

> Parallelize tables for backup and restore 
> ------------------------------------------
>
>                 Key: KUDU-2786
>                 URL: https://issues.apache.org/jira/browse/KUDU-2786
>             Project: Kudu
>          Issue Type: Improvement
>    Affects Versions: 1.9.0
>            Reporter: Grant Henke
>            Assignee: Will Berkeley
>            Priority: Major
>              Labels: backup
>
> Currently the backup and restore jobs process tables serially. This ensures resources aren't over-allocated up front, but it can be slow when there are many small tables. Instead, we could parallelize the Spark jobs for each table.
> It should be straightforward to use Scala futures to run multiple jobs in parallel and check their status. We could add a configuration option to cap the maximum number of tables processed at the same time, though that may not be needed.
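
The futures-based approach described above could look roughly like the following. This is a minimal sketch, not the actual Kudu implementation: processTable is a hypothetical stand-in for the per-table Spark job, and runAll and maxParallel are illustrative names. It assumes a fixed-size thread pool is an acceptable way to cap how many tables run at once, and it records each table's outcome as a Try so that one failed table does not hide the status of the others.

```scala
import java.util.concurrent.Executors
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.util.{Success, Try}

object ParallelTables {
  // Hypothetical stand-in for the per-table Spark backup or restore job.
  def processTable(table: String): String = {
    if (table.isEmpty) throw new IllegalArgumentException("empty table name")
    s"done:$table"
  }

  // Runs `job` once per table, with at most `maxParallel` jobs running at a time.
  // Returns the outcome of every table, successful or not.
  def runAll[T](tables: Seq[String], maxParallel: Int)(job: String => T): Map[String, Try[T]] = {
    // The pool size is the cap on concurrently processed tables.
    val pool = Executors.newFixedThreadPool(maxParallel)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // Submit every table up front (toVector forces a strict collection).
      // transform turns a failed future into a successful Future[Try[T]].
      val futures = tables.toVector.map { t =>
        t -> Future(job(t)).transform(r => Success(r))
      }
      // Wait for each job and collect its status.
      futures.map { case (t, f) => t -> Await.result(f, Duration.Inf) }.toMap
    } finally {
      pool.shutdown()
    }
  }

  def main(args: Array[String]): Unit = {
    val results = runAll(Seq("t1", "t2", ""), maxParallel = 2)(processTable)
    results.foreach { case (table, status) => println(s"'$table': $status") }
  }
}
```

Setting maxParallel to 1 reproduces the current serial behavior, which is one argument for exposing the cap as a configuration option rather than leaving parallelism unbounded.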



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)