Posted to dev@sqoop.apache.org by "Veena Basavaraj (JIRA)" <ji...@apache.org> on 2014/11/07 04:23:35 UTC
[jira] [Updated] (SQOOP-1603) Sqoop2: Explicit support for Merge in the Sqoop Job lifecycle
[ https://issues.apache.org/jira/browse/SQOOP-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Veena Basavaraj updated SQOOP-1603:
-----------------------------------
Fix Version/s: 1.99.5
> Sqoop2: Explicit support for Merge in the Sqoop Job lifecycle
> --------------------------------------------------------------
>
> Key: SQOOP-1603
> URL: https://issues.apache.org/jira/browse/SQOOP-1603
> Project: Sqoop
> Issue Type: Bug
> Reporter: Veena Basavaraj
> Assignee: Qian Xu
> Fix For: 1.99.5
>
>
> The Destroyer API and its javadoc:
> {code}
> /**
>  * This allows the connector to define work to complete the execution,
>  * for example, resource cleaning.
>  */
> public abstract class Destroyer<LinkConfiguration, JobConfiguration> {
>
>   /**
>    * Callback to clean up after job execution.
>    *
>    * @param context Destroyer context
>    * @param linkConfiguration link configuration object
>    * @param jobConfiguration job configuration object for the FROM or TO side.
>    *          In case of the FROM destroyer this will represent the FROM job
>    *          configuration; in case of the TO destroyer this will represent
>    *          the TO job configuration.
>    */
>   public abstract void destroy(DestroyerContext context,
>                                LinkConfiguration linkConfiguration,
>                                JobConfiguration jobConfiguration);
> }
> {code}
> This ticket was created while reviewing the Kite Connector use case, where the destroyer performs the actual temporary dataset merge:
> https://reviews.apache.org/r/26963/diff/# [~stanleyxu2005]
> {code}
> public void destroy(DestroyerContext context, LinkConfiguration link,
>     ToJobConfiguration job) {
>   LOG.info("Running Kite connector destroyer");
>   // Every loader instance creates a temporary dataset. If the MR job is
>   // successful, all temporary datasets should be merged into one dataset;
>   // otherwise they should all be deleted.
>   String[] uris = KiteDatasetExecutor.listTemporaryDatasetUris(
>       job.toDataset.uri);
>   if (context.isSuccess()) {
>     KiteDatasetExecutor executor = new KiteDatasetExecutor(job.toDataset.uri,
>         context.getSchema(), link.link.fileFormat);
>     for (String uri : uris) {
>       executor.mergeDataset(uri);
>       LOG.info(String.format("Temporary dataset %s merged", uri));
>     }
>   } else {
>     for (String uri : uris) {
>       KiteDatasetExecutor.deleteDataset(uri);
>       LOG.info(String.format("Temporary dataset %s deleted", uri));
>     }
>   }
> }
> {code}
> Wondering whether such an operation should be its own phase rather than live in the destroyer. The responsibility of the destroyer is, to be more precise, cleaning up and closing both the FROM and TO data sources. Should operations that modify, merge, or munge records be their own step?
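> To make the proposal concrete, a separate merge phase might look roughly like the sketch below. This is purely illustrative: the class names (Merger, MergeContext, InMemoryMerger) and their signatures are assumptions for discussion, not part of the Sqoop 2 API, and the in-memory "merge" only stands in for what a real connector (e.g. Kite) would do with temporary datasets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical lifecycle phase: runs after a successful job, before the
// destroyer, and folds per-loader temporary outputs into the final data set.
abstract class Merger<JobConfiguration> {
  public abstract void merge(MergeContext context, JobConfiguration jobConfiguration);
}

// Illustrative context object carrying the temporary output URIs.
class MergeContext {
  private final List<String> temporaryUris;

  MergeContext(List<String> temporaryUris) {
    this.temporaryUris = temporaryUris;
  }

  List<String> getTemporaryUris() {
    return temporaryUris;
  }
}

// Toy implementation: records the URIs it "merged". A real connector would
// move or append records into the target dataset here.
class InMemoryMerger extends Merger<String> {
  final List<String> merged = new ArrayList<>();

  @Override
  public void merge(MergeContext context, String targetUri) {
    for (String uri : context.getTemporaryUris()) {
      merged.add(uri);
    }
  }
}

public class MergePhaseSketch {
  public static void main(String[] args) {
    InMemoryMerger merger = new InMemoryMerger();
    merger.merge(new MergeContext(List.of("tmp-1", "tmp-2")), "dataset:final");
    System.out.println("merged " + merger.merged.size() + " temporary datasets");
  }
}
```

> With this split, the destroyer would be left with pure cleanup (deleting leftover temporary datasets, closing connections), and the framework could skip the merge phase entirely when the job fails.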
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)