Posted to mapreduce-dev@hadoop.apache.org by "Leonid Furman (JIRA)" <ji...@apache.org> on 2009/12/29 01:42:29 UTC
[jira] Created: (MAPREDUCE-1339) Sqoop full table import job times out when using the split-by attribute
Sqoop full table import job times out when using the split-by attribute
-----------------------------------------------------------------------
Key: MAPREDUCE-1339
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1339
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: contrib/sqoop
Affects Versions: 0.22.0
Reporter: Leonid Furman
Priority: Critical
Fix For: 0.22.0
Problem
------------
When running a Sqoop command for a full table import with the split-by attribute specified, as follows:
sqoop --connect CONNECT_STRING --username USER_NAME --password PASSWORD --table TABLE_NAME --fields-terminated-by \\0x01 --as-textfile --warehouse-dir OUTPUT_DIR --split-by RECORD_ID
Sqoop transforms the split-by attribute into an ORDER BY clause and runs the following SQL query against the database (say, Oracle):
SELECT * FROM TABLE_NAME ORDER BY RECORD_ID
If the table has, for example, 20 million records, the ORDER BY clause significantly increases the query's running time, eventually causing a timeout and resulting in no output being written to the Hadoop file system.
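To illustrate why the ORDER BY clause adds cost, here is a minimal sketch that uses SQLite (from the Python standard library) as a stand-in database. It is not Oracle and not Sqoop, and the table layout and the PAYLOAD column are assumptions made only for this example. It compares the query plan of the plain SELECT with the plan of the ordered one when RECORD_ID has no index.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the human-readable detail strings of SQLite's plan for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The detail text is the last column of every plan row.
    return [row[-1] for row in rows]


# In-memory stand-in for the source table; RECORD_ID is deliberately unindexed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABLE_NAME (RECORD_ID INTEGER, PAYLOAD TEXT)")
conn.executemany(
    "INSERT INTO TABLE_NAME VALUES (?, ?)",
    [(i, "row %d" % i) for i in range(1000)],
)

plain_plan = query_plan(conn, "SELECT * FROM TABLE_NAME")
ordered_plan = query_plan(conn, "SELECT * FROM TABLE_NAME ORDER BY RECORD_ID")

print(plain_plan)
print(ordered_plan)
```

In this stand-in, the ordered plan reports an extra sorting step on top of the full table scan. Other databases plan queries differently, so this only shows the general shape of the problem, not Oracle's actual behavior.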
Proposed solution
-------------------------
Do not append the ORDER BY clause to the SQL query if no WHERE clause is specified.
Can there be any issues with this solution?
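The proposal could be sketched roughly as follows. This is a hypothetical illustration, not Sqoop's actual query-building code: the function name build_import_query and its parameters are assumptions introduced only for this example.

```python
from typing import Optional


def build_import_query(
    table: str,
    split_by: Optional[str] = None,
    where: Optional[str] = None,
) -> str:
    """Build the import query, skipping ORDER BY for full table imports.

    Hypothetical helper: it mirrors the proposal in this issue, not the
    real Sqoop implementation.
    """
    query = "SELECT * FROM " + table
    if where:
        query += " WHERE " + where
        # Only a filtered import keeps the ordering on the split column.
        if split_by:
            query += " ORDER BY " + split_by
    return query


# Full table import: no WHERE clause, so no ORDER BY is appended.
full_import = build_import_query("TABLE_NAME", split_by="RECORD_ID")

# Filtered import: the ORDER BY clause is kept.
filtered_import = build_import_query(
    "TABLE_NAME", split_by="RECORD_ID", where="RECORD_ID > 100"
)

print(full_import)
print(filtered_import)
```

One point to weigh under this sketch is that a full table import no longer returns rows in RECORD_ID order, so anything that relies on that ordering would be affected.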
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Resolved: (MAPREDUCE-1339) Sqoop full table import job times out when using the split-by attribute
Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aaron Kimball resolved MAPREDUCE-1339.
--------------------------------------
Resolution: Duplicate
Sqoop has been removed from MapReduce; closing this issue. Also, Oracle functionality has been improved in the meantime so as to obviate this bug.