Posted to dev@sqoop.apache.org by "Brett Medalen (JIRA)" <ji...@apache.org> on 2015/04/29 15:29:06 UTC
[jira] [Issue Comment Deleted] (SQOOP-1277) Import not splitted when using --boundary-query
[ https://issues.apache.org/jira/browse/SQOOP-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brett Medalen updated SQOOP-1277:
---------------------------------
Comment: was deleted
(was: I had a similar problem and came across this JIRA. I put the boundary query in single quotes and no longer got the $CONDITIONS error.
Take a look at the Sqoop User Guide section on single vs. double quotes: http://sqoop.apache.org/docs/1.4.5/SqoopUserGuide.html#_free_form_query_imports
While the user guide discusses the --query switch, the same quoting rules appear to apply to the --boundary-query switch.)
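For reference, the quoting behaviour the deleted comment alludes to can be sketched with plain shell (no Sqoop involved; the WHERE fragment below is illustrative only, not from the reported job):

```shell
# Make sure no stray environment variable interferes with the demo.
unset CONDITIONS

# Single quotes pass the $CONDITIONS token through to the command literally,
# so Sqoop can see and replace it at runtime.
single=$(echo 'WHERE $CONDITIONS')

# Double quotes let the shell expand $CONDITIONS first; since it is unset,
# the token silently disappears before the command ever runs.
double=$(echo "WHERE $CONDITIONS")

echo "$single"
echo "$double"
```

This is why a double-quoted query needs the token escaped as \$CONDITIONS, while a single-quoted query can contain it verbatim.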
> Import not splitted when using --boundary-query
> -----------------------------------------------
>
> Key: SQOOP-1277
> URL: https://issues.apache.org/jira/browse/SQOOP-1277
> Project: Sqoop
> Issue Type: Bug
> Components: hive-integration
> Affects Versions: 1.4.4
> Environment: Amazon AWS
> Reporter: Porati Sébastien
> Assignee: Gwen Shapira
> Priority: Critical
>
> I am trying to import MySQL data into a Hive table using a custom boundary query. Result: Sqoop does not split the load across multiple queries, and the import takes too long.
> My job creation command:
> {code:none}
> sqoop job -Dsqoop.metastore.client.record.password=true \
> --create importJobName -- import \
> --connect jdbc:mysql://some_jdbc_pram \
> --username user_name \
> --password MyPassword \
> --table my_table \
> --columns "collect_id,collected_data_id,value" \
> --boundary-query "SELECT min_value, max_value FROM sqoop_boundaries WHERE key_name = 'key.name'" \
> --split-by column_name \
> --num-mappers X \
> --hive-import \
> --hive-overwrite \
> --hive-table hivedb.hibetable \
> --as-textfile \
> --null-string \\\\N \
> --null-non-string \\\\N
> {code}
>
> The following warning is displayed:
> {code:none}
> WARN db.DataDrivenDBInputFormat: Could not find $CONDITIONS token in query: SELECT min_value, max_value FROM sqoop_boundaries WHERE key_name = 'key.name'; splits may not partition data.
> {code}
> I tried adding the $CONDITIONS token to the creation command:
> {code:none}
> --boundary-query "SELECT min_value, max_value FROM sqoop_boundaries WHERE key_name = 'key.name' AND \$CONDITIONS" \
> {code}
> But the job execution failed:
> {code:none}
> INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT min_value, max_value FROM sqoop_boundaries WHERE key_name = 'key.name' AND $CONDITIONS
> INFO mapred.JobClient: Cleaning up the staging area hdfs://10.34.140.108:9000/mnt/var/lib/hadoop/tmp/mapred/staging/hadoop/.staging/job_201401311408_0025
> ERROR security.UserGroupInformation: PriviledgedActionException as:hadoop cause:java.io.IOException: com.mysql.jdbc.exceptions.MySQLSyntaxErrorException: Unknown column '$CONDITIONS' in 'where clause'
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)