Posted to dev@sqoop.apache.org by "David Robson (JIRA)" <ji...@apache.org> on 2015/08/07 05:58:46 UTC

[jira] [Created] (SQOOP-2465) Initializer and Destroyer should know how many executors will run

David Robson created SQOOP-2465:
-----------------------------------

             Summary: Initializer and Destroyer should know how many executors will run
                 Key: SQOOP-2465
                 URL: https://issues.apache.org/jira/browse/SQOOP-2465
             Project: Sqoop
          Issue Type: Bug
    Affects Versions: 1.99.6
            Reporter: David Robson


Take a job that loads data into Oracle as an example. Depending on how the user wants to load the data, we may load it into temporary tables. For maximum performance we need to create a separate temporary table for each loader, so when the initializer runs it needs to know how many loaders will run in order to create these tables. Likewise, the destroyer has to drop these temporary tables when it runs, so it needs the same information.
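
To make the requirement concrete, here is a minimal sketch of what an initializer and destroyer could do if they were given the loader count. Everything in it is hypothetical and not part of the Sqoop API: the class `TempTablePlan`, the `_SQOOP_TMP_` naming scheme, and the exact SQL are assumptions chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Sqoop class: plans one temporary table per loader.
final class TempTablePlan {

    // Assumed naming scheme: <target>_SQOOP_TMP_<loader index>.
    static List<String> tempTableNames(String targetTable, int loaderCount) {
        if (loaderCount < 1) {
            throw new IllegalArgumentException("loaderCount must be at least 1");
        }
        List<String> names = new ArrayList<>();
        for (int i = 0; i < loaderCount; i++) {
            names.add(targetTable + "_SQOOP_TMP_" + i);
        }
        return names;
    }

    // What the initializer would run: an empty copy of the target per loader.
    static List<String> createStatements(String targetTable, int loaderCount) {
        List<String> sql = new ArrayList<>();
        for (String name : tempTableNames(targetTable, loaderCount)) {
            sql.add("CREATE TABLE " + name + " AS SELECT * FROM " + targetTable + " WHERE 1 = 0");
        }
        return sql;
    }

    // What the destroyer would run: it needs the same count to find every table.
    static List<String> dropStatements(String targetTable, int loaderCount) {
        List<String> sql = new ArrayList<>();
        for (String name : tempTableNames(targetTable, loaderCount)) {
            sql.add("DROP TABLE " + name + " PURGE");
        }
        return sql;
    }
}
```

Both methods take `loaderCount` as an input, which is exactly the value the initializer and destroyer cannot obtain today.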

Another case where the initializer needs this information: an Oracle database may be a Real Application Clusters (RAC) deployment, with multiple instances running across multiple machines. For both FROM and TO jobs we spread the load across these instances during the initialization phase, so we need to know how many loaders / extractors will run.
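
As a rough illustration of spreading load across instances, the sketch below assigns each executor (loader or extractor) to an instance in round-robin order. The class `InstanceAssignment`, the round-robin policy, and the instance names are assumptions for the example; the actual connector may balance differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Sqoop class: maps executor index -> instance name.
final class InstanceAssignment {

    // Round-robin is an assumed policy; it needs the executor count up front.
    static List<String> assign(List<String> instances, int executorCount) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("at least one instance is required");
        }
        if (executorCount < 0) {
            throw new IllegalArgumentException("executorCount must not be negative");
        }
        List<String> assignment = new ArrayList<>();
        for (int i = 0; i < executorCount; i++) {
            assignment.add(instances.get(i % instances.size()));
        }
        return assignment;
    }
}
```

Without `executorCount` the initializer cannot compute this mapping, which is the gap described above.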

For a FROM job we could do this in the partition phase, but there is no way to achieve it for a TO job. It seems we could either pass this information to the initialize phase, or add a new partition phase on the TO side that is called after the partition phase on the FROM side. That new phase could take the details of the partitioned output and match them up with the TO side.
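
The second option could look roughly like the sketch below. It is only one possible shape and assumes a one-to-one match between FROM partitions and loaders, which may not hold in practice. `ToSidePartitionSketch`, `ToPartition`, and the target naming are hypothetical names introduced for this example and do not exist in Sqoop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a TO-side partition phase; not part of the Sqoop API.
final class ToSidePartitionSketch {

    // Hypothetical TO-side unit of work: pairs a FROM partition with a loader-local target.
    static final class ToPartition {
        final int loaderIndex;
        final String fromPartition;
        final String target;

        ToPartition(int loaderIndex, String fromPartition, String target) {
            this.loaderIndex = loaderIndex;
            this.fromPartition = fromPartition;
            this.target = target;
        }
    }

    // Runs after FROM-side partitioning; assumes one loader per FROM partition.
    static List<ToPartition> partitionToSide(List<String> fromPartitions, String targetTable) {
        List<ToPartition> result = new ArrayList<>();
        for (int i = 0; i < fromPartitions.size(); i++) {
            result.add(new ToPartition(i, fromPartitions.get(i), targetTable + "_SQOOP_TMP_" + i));
        }
        return result;
    }
}
```

The size of the returned list would give the TO side the loader count as a by-product, which is the information the initializer and destroyer are missing.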



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)