Posted to dev@sqoop.apache.org by "Jarek Jarcec Cecho (Created) (JIRA)" <ji...@apache.org> on 2012/02/18 20:05:59 UTC

[jira] [Created] (SQOOP-443) Calling sqoop with hive import is not working multiple times due to kept output directory

Calling sqoop with hive import is not working multiple times due to  kept output directory
------------------------------------------------------------------------------------------

                 Key: SQOOP-443
                 URL: https://issues.apache.org/jira/browse/SQOOP-443
             Project: Sqoop
          Issue Type: Improvement
    Affects Versions: 1.4.0-incubating, 1.4.1-incubating
            Reporter: Jarek Jarcec Cecho
            Assignee: Jarek Jarcec Cecho
            Priority: Minor


Hive does not always remove the input directory when executing the "LOAD DATA" command. This input directory is actually Sqoop's export directory, so when it is left behind, running the same Sqoop command a second time fails with the exception "org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory $table already exists".

This issue can easily be worked around by removing the directory manually, but that puts an unnecessary burden on users. It also complicates running saved jobs, since an additional script execution is needed.
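
For illustration only (this example is not part of the original report), the manual workaround amounts to deleting the leftover export directory on HDFS before re-running the import. A minimal sketch using Hadoop's FileSystem API, assuming the directory path is supplied by the caller:

    // Hypothetical helper, not Sqoop code: delete the leftover export
    // directory so a repeated "sqoop import ... --hive-import" can succeed.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CleanupBeforeImport {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path exportDir = new Path(args[0]);  // e.g. the $table directory on HDFS
        FileSystem fs = exportDir.getFileSystem(conf);
        if (fs.exists(exportDir)) {
          fs.delete(exportDir, true);        // recursive delete of the stale dir
        }
      }
    }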

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (SQOOP-443) Calling sqoop with hive import is not working multiple times due to kept output directory

Posted by "Jarek Jarcec Cecho (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SQOOP-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jarek Jarcec Cecho updated SQOOP-443:
-------------------------------------

    Attachment: SQOOP-443.patch
    


[jira] [Commented] (SQOOP-443) Calling sqoop with hive import is not working multiple times due to kept output directory

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SQOOP-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211120#comment-13211120 ] 

jiraposter@reviews.apache.org commented on SQOOP-443:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3952/
-----------------------------------------------------------

Review request for Sqoop, Arvind Prabhakar and Bilung Lee.


Summary
-------

I've added code that removes the export directory in case it's empty.
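
The actual change is in the attached diff below; as a rough, hypothetical sketch of the idea (assumed from the summary above, not copied from the patch), removing the export directory only when Hive left it behind empty could look like this:

    // Hypothetical sketch of the cleanup idea, not the real HiveImport change:
    // after LOAD DATA, drop the export directory if it still exists and is empty.
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ExportDirCleanup {
      public static void removeDirIfEmpty(Configuration conf, Path dir)
          throws IOException {
        FileSystem fs = dir.getFileSystem(conf);
        if (fs.exists(dir) && fs.getFileStatus(dir).isDir()) {
          FileStatus[] contents = fs.listStatus(dir);
          if (contents == null || contents.length == 0) {
            fs.delete(dir, false);  // non-recursive: only removes an empty dir
          }
        }
      }
    }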


This addresses bug SQOOP-443.
    https://issues.apache.org/jira/browse/SQOOP-443


Diffs
-----

  /src/java/org/apache/sqoop/hive/HiveImport.java 1245157 

Diff: https://reviews.apache.org/r/3952/diff


Testing
-------

ant -Dhadoopversion={20, 23, 100} test
Also verified manually in a real testing environment based on CDH3.


Thanks,

Jarek


                


[jira] [Updated] (SQOOP-443) Calling sqoop with hive import is not working multiple times due to kept output directory

Posted by "Jarek Jarcec Cecho (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SQOOP-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jarek Jarcec Cecho updated SQOOP-443:
-------------------------------------

    Status: Patch Available  (was: Open)
    
