Posted to dev@sqoop.apache.org by "David Randolph (JIRA)" <ji...@apache.org> on 2012/05/15 23:27:07 UTC

[jira] [Comment Edited] (SQOOP-485) --hive-import cannot be used with --query

    [ https://issues.apache.org/jira/browse/SQOOP-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276228#comment-13276228 ] 

David Randolph edited comment on SQOOP-485 at 5/15/12 9:25 PM:
---------------------------------------------------------------

If Sqoop supports direct import into Hive, the user should be able to specify an SQL query to define what data is imported. You can call this a missing feature rather than a bug if you prefer, but it seems like core functionality to me.

Your suggested workaround would not provide me with a functioning Hive warehouse. The complete workaround, I think, looks something like this:

# sqoop create-hive-table --connect jdbc:oracle:thin:@//oss-devdb2.pcs.mot.com/TOOLS --table FACT --username HADOOP --password $pass
# sqoop import --verbose --connect jdbc:oracle:thin:@//oss-devdb2.pcs.mot.com/TOOLS --username HADOOP --password $pass --query 'select * from fact where day = 5 and $CONDITIONS' --fields-terminated-by , --escaped-by \\ -m1 --target-dir /user/me/fact --append
# hadoop fs -mv /user/me/fact/part-m-00000 /user/me/fact/day5
# hive -e "load data inpath '/user/me/fact/day5' into table fact"

The final load step moves the files from the target-dir into /user/hive/warehouse/fact, after which the data is usable in Hive.
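
For repeat runs (one file per day), the same four steps can be wrapped in a small script. This is only a sketch: DAY, PASS, and CONNECT are placeholder variables for the values above, and the create-hive-table step only needs to run once.

{noformat}
#!/bin/sh
# Sketch of the workaround above, parameterized by day.
# DAY, PASS, CONNECT, and the HDFS paths are placeholders.
DAY=5
PASS=xxx
CONNECT=jdbc:oracle:thin:@//oss-devdb2.pcs.mot.com/TOOLS

# One-time step: create the Hive table definition from the Oracle table.
sqoop create-hive-table --connect "$CONNECT" --table FACT \
    --username HADOOP --password "$PASS"

# Import one day's rows with a single mapper (-m1), so the output is a
# single part-m-00000 file in the staging directory.
sqoop import --verbose --connect "$CONNECT" --username HADOOP --password "$PASS" \
    --query "select * from fact where day = $DAY and \$CONDITIONS" \
    --fields-terminated-by , --escaped-by \\ -m1 \
    --target-dir /user/me/fact --append

# Rename the part file so successive days do not collide.
hadoop fs -mv /user/me/fact/part-m-00000 /user/me/fact/day$DAY

# Move the renamed file into the Hive warehouse directory for table 'fact'.
hive -e "load data inpath '/user/me/fact/day$DAY' into table fact"
{noformat}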

Thanks,
Dave
                
> --hive-import cannot be used with --query
> -----------------------------------------
>
>                 Key: SQOOP-485
>                 URL: https://issues.apache.org/jira/browse/SQOOP-485
>             Project: Sqoop
>          Issue Type: Bug
>          Components: hive-integration
>    Affects Versions: 1.3.0
>         Environment: :/home/w17626> uname -a
> Linux il93rhel91 2.6.18-128.2.1.el5 #1 SMP Wed Jul 8 11:54:47 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
>            Reporter: David Randolph
>
> sqoop import --verbose --connect jdbc:oracle:thin:@//oss-devdb2.pcs.mot.com/TOOLS --username HADOOP --password xxx --query 'select * from fact where id < 11 and $CONDITIONS' --hive-import --hive-table fact --fields-terminated-by , --escaped-by \\ -m1 --append
> produces the following output:
> {noformat}
> sqoop import --verbose --connect jdbc:oracle:thin:@//oss-devdb2.pcs.mot.com/TOOLS --username HADOOP --password xxx --query 'select * from fact where id < 11 and $CONDITIONS' --hive-import --hive-table fact --fields-terminated-by , --escaped-by \\ -m1 --append
> 12/05/15 11:10:10 DEBUG tool.BaseSqoopTool: Enabled debug logging.
> 12/05/15 11:10:10 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
> 12/05/15 11:10:10 DEBUG sqoop.Sqoop: Must specify destination with --target-dir.
> Try --help for usage instructions.
> Must specify destination with --target-dir.
> Try --help for usage instructions.
>         at com.cloudera.sqoop.tool.ImportTool.validateImportOptions(ImportTool.java:823)
>         at com.cloudera.sqoop.tool.ImportTool.validateOptions(ImportTool.java:890)
>         at com.cloudera.sqoop.Sqoop.run(Sqoop.java:132)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:182)
>         at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:221)
>         at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:230)
>         at com.cloudera.sqoop.Sqoop.main(Sqoop.java:239)
> Must specify destination with --target-dir.
> Try --help for usage instructions.
> <snip>
> {noformat}
> I can work around this immediate failure by specifying the --target-dir flag (a sketch of that invocation follows). This seems to do the right thing, but the sqoop command then hangs. Also, use of --target-dir with --hive-import is discouraged in SQOOP-464.
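> For reference, the --target-dir variant is just the failing command above with a destination added (a sketch only; the HDFS path is a placeholder):
> {noformat}
> sqoop import --verbose --connect jdbc:oracle:thin:@//oss-devdb2.pcs.mot.com/TOOLS --username HADOOP --password xxx --query 'select * from fact where id < 11 and $CONDITIONS' --hive-import --hive-table fact --fields-terminated-by , --escaped-by \\ -m1 --append --target-dir /user/me/fact
> {noformat}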
> How can I use the --query flag to import into Hive?
> :> sqoop version
> Sqoop 1.3.0-cdh3u3
> git commit id 57cbc8d38cc5ff22a24d34c3d13f9862fd1372bb
> Compiled by jenkins@ubuntu-slave02 on Thu Jan 26 10:41:26 PST 2012
> Thanks,
> Dave
