Posted to mapreduce-dev@hadoop.apache.org by "mingran wang (JIRA)" <ji...@apache.org> on 2010/02/04 00:13:27 UTC
[jira] Created: (MAPREDUCE-1449) Sqoop Documentation about --split-by column has to be unique key seems to be wrong
Sqoop Documentation about --split-by column has to be unique key seems to be wrong
----------------------------------------------------------------------------------
Key: MAPREDUCE-1449
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1449
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/sqoop
Affects Versions: 0.20.1
Reporter: mingran wang
http://archive.cloudera.com/docs/sqoo...
The document above states: "To guarantee correctness of your input, you must select an ordering column for which each row has a unique value. If duplicate values appear in the ordering column, the results of the import are undefined, and Sqoop will not be able to detect the error."
I read the Sqoop source code, and it seems that the column to split by does not have to be a unique key. Moreover, when the primary key is a composite key, the Sqoop code takes only the first column of the composite key, which in most cases is not a unique key anyway.
I also checked the output when a non-unique key is used to split, and there was nothing wrong with the result.
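As a rough illustration of why a non-unique split column need not corrupt an import, here is a minimal Python sketch. It assumes the splits are contiguous, non-overlapping value ranges over the split column, which this report does not confirm about Sqoop itself. The function names (split_ranges, rows_for_split, import_table) and the sample table are hypothetical and are not Sqoop APIs.

```python
from collections import Counter


def split_ranges(lo, hi, num_splits):
    """Cut [lo, hi] into contiguous ranges that share their boundaries."""
    step = (hi - lo) / num_splits
    bounds = [lo + i * step for i in range(num_splits)] + [hi]
    return list(zip(bounds[:-1], bounds[1:]))


def rows_for_split(rows, column, lower, upper, is_last):
    """Emulate: WHERE column >= lower AND column < upper.

    The last split also includes rows equal to the upper bound,
    so the maximum value is not dropped.
    """
    return [
        r for r in rows
        if lower <= r[column]
        and (r[column] < upper or (is_last and r[column] == upper))
    ]


def import_table(rows, column, num_splits):
    """Read every split in turn and concatenate the results."""
    values = [r[column] for r in rows]
    ranges = split_ranges(min(values), max(values), num_splits)
    imported = []
    for i, (lower, upper) in enumerate(ranges):
        is_last = i == len(ranges) - 1
        imported.extend(rows_for_split(rows, column, lower, upper, is_last))
    return imported


# Hypothetical table: "dept" is deliberately non-unique (values 0, 1, 2).
rows = [{"id": i, "dept": i % 3} for i in range(10)]
imported = import_table(rows, "dept", num_splits=4)

# Each row shows up exactly once, despite duplicates in the split column.
counts = Counter(r["id"] for r in imported)
assert sorted(counts) == list(range(10))
assert all(n == 1 for n in counts.values())
```

Under that assumption every row lands in exactly one split, because a row's membership depends only on its own value and the ranges do not overlap. A scheme that instead paged through an ordered result by row position could behave differently when the ordering column has ties, since the relative order of tied rows is not guaranteed to be the same from one query to the next.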
I am wondering whether the document is wrong or there is some hidden trickiness that I am not aware of.
I am using sqoop 20.1.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Resolved: (MAPREDUCE-1449) Sqoop Documentation about --split-by column has to be unique key seems to be wrong
Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aaron Kimball resolved MAPREDUCE-1449.
--------------------------------------
Resolution: Won't Fix
Sqoop has been removed from MapReduce; issue moved to http://github.com/cloudera/sqoop/issues#issue/2