You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sqoop.apache.org by "Cheolsoo Park (JIRA)" <ji...@apache.org> on 2012/05/25 08:01:25 UTC

[jira] [Commented] (SQOOP-489) Cannot define partition keys for Hive tables created through Sqoop

    [ https://issues.apache.org/jira/browse/SQOOP-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283145#comment-13283145 ] 

Cheolsoo Park commented on SQOOP-489:
-------------------------------------

Thanks for filing a jira Kate.

To fix the issue, I suggest that we exclude the column that is specified by partition key from the column definitions if any partition key is specified.

i.e.
{code}
CREATE TABLE IF NOT EXISTS `FOO` (`I` STRING, `J` STRING) PARTITIONED BY (I STRING)
{code}

=>

{code}
CREATE TABLE IF NOT EXISTS `FOO` (`J` STRING) PARTITIONED BY (I STRING)
{code}

I also thought about adding the --columns option to hive-create-table, but this seems less preferable because this will force the user to specify all the columns except partition key in the command to import table. 

Please let me know if anyone has a better suggestion.

Another note that I'd like to add is that in my experiment, the current query sometimes works in Hive while it sometimes doesn't. So you might find it work for you. Nevertheless, I confirmed with a Hive developer that this should be a syntax error, so we should not rely on this Hive behavior.
                
> Cannot define partition keys for Hive tables created through Sqoop
> ------------------------------------------------------------------
>
>                 Key: SQOOP-489
>                 URL: https://issues.apache.org/jira/browse/SQOOP-489
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.4.1-incubating
>            Reporter: Kathleen Ting
>
> By enabling the table option, Sqoop includes every column in the table in the create table query, and by enabling the hive-partition-key option, Sqoop blindly appends the "partitioned by" clause. Now if you specify one of columns in the table in the hive-partition-key, this will cause a syntax error in Hive.
> For example, if we have a table 'FOO' that has columns 'I' and 'J':
> sqoop create-hive-table --table FOO ...
> will generate the following Hive query:
> CREATE TABLE IF NOT EXISTS `FOO` ( `I` STRING, `J` STRING)
> Now if we add "--hive-partition-key I" to the command, Sqoop generates the following query:
> CREATE TABLE IF NOT EXISTS `FOO` ( `I` STRING, `J` STRING) PARTITIONED BY (I STRING)
> The problem is that since 'I' is defined twice (once in CRATE TABLE and once in PARTITIONED BY), this is a syntax error in Hive.
> This correct query would be something like:
> CREATE TABLE IF NOT EXISTS `FOO` (`J` STRING) PARTITIONED BY (I STRING)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira